AdaBoost Algorithm in Machine Learning
AdaBoost (Adaptive Boosting) is a powerful ensemble learning algorithm that combines multiple "weak learners" into a single "strong learner". It is particularly effective for binary classification: each new learner is trained with more weight placed on the examples that previous learners misclassified. This article explores the AdaBoost algorithm, providing a code implementation, explanation, and complexity analysis.
Fundamental Concepts / Prerequisites
To fully understand AdaBoost, familiarity with the following concepts is helpful:
- Machine Learning Basics: Understanding the concepts of classification, training data, features, and labels.
- Weak Learners: Algorithms that perform slightly better than random guessing. Decision stumps (one-level decision trees) are commonly used as weak learners in AdaBoost.
- Ensemble Learning: Combining multiple models to achieve better performance than any single model.
- Decision Trees: Basic knowledge of how decision trees work, although AdaBoost often utilizes very simple decision trees.
Core Implementation
import numpy as np


class AdaBoost:
    """AdaBoost classifier built from decision stumps. Expects labels in {-1, +1}."""

    def __init__(self, n_estimators=50, learning_rate=1.0):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.estimators = []
        self.estimator_weights = []

    def fit(self, X, y):
        n_samples = X.shape[0]
        weights = np.full(n_samples, 1 / n_samples)  # Initialize weights equally

        for _ in range(self.n_estimators):
            # 1. Train a weak learner (decision stump, defined below)
            stump = DecisionStump()
            stump.fit(X, y, weights)

            # 2. Calculate the weighted error
            predictions = stump.predict(X)
            error = np.sum(weights * (predictions != y))

            # 3. Calculate the estimator weight (alpha)
            # A small constant avoids division by zero when the error is 0
            alpha = self.learning_rate * 0.5 * np.log((1 - error) / (error + 1e-10))

            # 4. Update the sample weights: misclassified samples (y * predictions == -1)
            #    are up-weighted, correctly classified samples are down-weighted
            weights *= np.exp(-alpha * y * predictions)
            weights /= np.sum(weights)  # Normalize weights to sum to 1

            # 5. Store the stump and its weight
            self.estimators.append(stump)
            self.estimator_weights.append(alpha)

    def predict(self, X):
        n_samples = X.shape[0]
        predictions = np.zeros(n_samples)
        for i in range(self.n_estimators):
            predictions += self.estimator_weights[i] * self.estimators[i].predict(X)
        return np.sign(predictions)  # Return the sign of the weighted sum


# A simple decision stump (one-level decision tree) used as the weak learner
class DecisionStump:
    def __init__(self):
        self.feature_idx = None
        self.threshold = None
        self.polarity = 1  # Direction of the inequality (1 for >, -1 for <)

    def fit(self, X, y, weights):
        n_samples, n_features = X.shape
        best_error = float('inf')

        for feature_idx in range(n_features):
            feature_values = np.unique(X[:, feature_idx])
            for threshold in feature_values:
                # Predict with polarity = 1 (label -1 where X > threshold)
                predictions = np.ones(n_samples)
                predictions[X[:, feature_idx] > threshold] = -1
                error = np.sum(weights * (predictions != y))
                if error < best_error:
                    best_error = error
                    self.feature_idx = feature_idx
                    self.threshold = threshold
                    self.polarity = 1

                # Predict with polarity = -1 (label -1 where X < threshold)
                predictions = np.ones(n_samples)
                predictions[X[:, feature_idx] < threshold] = -1
                error = np.sum(weights * (predictions != y))
                if error < best_error:
                    best_error = error
                    self.feature_idx = feature_idx
                    self.threshold = threshold
                    self.polarity = -1

    def predict(self, X):
        n_samples = X.shape[0]
        predictions = np.ones(n_samples)
        if self.polarity == 1:
            predictions[X[:, self.feature_idx] > self.threshold] = -1
        else:
            predictions[X[:, self.feature_idx] < self.threshold] = -1
        return predictions
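As a quick sanity check, here is a minimal usage sketch (not part of the implementation above). It assumes the labels are encoded as -1/+1, which is what the weight-update rule expects, and uses two made-up Gaussian blobs as toy data:

import numpy as np

rng = np.random.default_rng(42)

# Two Gaussian blobs: class -1 centered at (-1, -1), class +1 centered at (+1, +1)
X_neg = rng.normal(loc=-1.0, scale=1.0, size=(100, 2))
X_pos = rng.normal(loc=+1.0, scale=1.0, size=(100, 2))
X = np.vstack([X_neg, X_pos])
y = np.hstack([-np.ones(100), np.ones(100)])  # labels must be -1/+1

model = AdaBoost(n_estimators=25, learning_rate=1.0)
model.fit(X, y)
print("Training accuracy:", np.mean(model.predict(X) == y))

On this well-separated toy data the training accuracy should be high (roughly 0.9 or better), though the exact number depends on the random draw.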
Code Explanation
The code implements the AdaBoost algorithm with decision stumps as weak learners. Here's a breakdown:
`AdaBoost` Class:
- `__init__`: Initializes the number of estimators (`n_estimators`), the learning rate (`learning_rate`), and empty lists to store the weak learners (`estimators`) and their weights (`estimator_weights`).
- `fit(X, y)`: Trains the AdaBoost model.
  - It initializes the sample weights equally.
  - For each estimator:
    - A `DecisionStump` is trained on the data with the current sample weights.
    - The weighted error of the stump is calculated.
    - The estimator's weight (alpha) is calculated from the error: a lower error yields a higher weight. A small constant is added to the denominator to prevent division by zero.
    - The sample weights are updated: misclassified samples receive higher weights, focusing the next estimator on those difficult cases. The weights are then normalized to sum to 1 (a worked example of this update follows this list).
    - The trained stump and its weight are stored.
- `predict(X)`: Predicts the class labels for new data.
  - It calculates a weighted sum of the predictions from all the weak learners.
  - The sign of the weighted sum is returned as the final prediction.
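To make the weight calculation and update concrete, here is a small hand-checkable example (assuming `learning_rate = 1` and labels in {-1, +1}); the numbers are purely illustrative:

import numpy as np

y = np.array([1, 1, -1, -1])
predictions = np.array([1, -1, -1, -1])  # the second sample is misclassified
weights = np.full(4, 0.25)

error = np.sum(weights * (predictions != y))         # 0.25
alpha = 0.5 * np.log((1 - error) / (error + 1e-10))  # ~0.549

weights *= np.exp(-alpha * y * predictions)  # only the misclassified weight grows
weights /= np.sum(weights)
print(np.round(weights, 3))  # the misclassified sample's weight rises to 0.5

The misclassified sample's weight triples relative to each of the others (0.5 versus 0.167), so the next stump is pushed to get that sample right.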
`DecisionStump` Class:
- `__init__`: Initializes the feature index, threshold, and polarity (direction of inequality).
- `fit(X, y, weights)`: Trains the decision stump.
  - It iterates through each feature and each unique value of that feature as a candidate threshold.
  - For each feature and threshold, it tries both polarities (X > threshold and X < threshold) and keeps the one that minimizes the weighted error.
  - The feature index, threshold, and polarity with the lowest weighted error are stored (a small illustration follows this list).
- `predict(X)`: Predicts the class labels for new data based on the learned feature, threshold, and polarity.
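As a tiny illustration (using the `DecisionStump` class above on made-up one-dimensional data), the stump below should pick feature 0 with threshold 3.0 and polarity -1, which classifies this toy set perfectly:

import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([-1, -1, 1, 1])
w = np.full(4, 0.25)  # uniform sample weights

stump = DecisionStump()
stump.fit(X, y, w)
print(stump.feature_idx, stump.threshold, stump.polarity)  # 0 3.0 -1
print(stump.predict(X))                                    # [-1. -1.  1.  1.]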
Complexity Analysis
Time Complexity:
- `fit(X, y)`: The main loop iterates `n_estimators` times. Inside the loop, the `DecisionStump.fit()` method is called. The complexity of `DecisionStump.fit()` is O(n_samples * n_features * n_unique_values), where n_unique_values is the number of unique values per feature. The weight update also takes O(n_samples) time. Therefore, the overall time complexity of `fit()` is approximately O(n_estimators * n_samples * n_features * n_unique_values).
- `predict(X)`: The prediction loop iterates `n_estimators` times. Each `DecisionStump.predict()` takes O(n_samples) time. Thus, the overall time complexity of `predict()` is O(n_estimators * n_samples).
Space Complexity:
- The space complexity is dominated by storing the `n_estimators` `DecisionStump` objects and their corresponding weights. Each `DecisionStump` stores only a feature index, a threshold, and a polarity, so the overall space complexity is O(n_estimators).
Alternative Approaches
Gradient Boosting is another popular ensemble learning technique that, like AdaBoost, combines weak learners. However, instead of re-weighting the training data, Gradient Boosting fits new models to the *residuals* of the previous models. This can sometimes lead to better performance, but also requires more careful tuning of hyperparameters like the learning rate to prevent overfitting.
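In practice, both techniques are available in scikit-learn, which is usually preferable to a from-scratch implementation. The sketch below is illustrative only, with arbitrary hyperparameter values, and assumes scikit-learn is installed:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# A synthetic binary classification problem
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0).fit(X_train, y_train)
gbm = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1).fit(X_train, y_train)

print("AdaBoost test accuracy:         ", ada.score(X_test, y_test))
print("Gradient Boosting test accuracy:", gbm.score(X_test, y_test))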
Conclusion
AdaBoost is a powerful and versatile classification algorithm. Its ability to combine weak learners into a strong learner makes it effective on problems where a single simple model falls short, although its emphasis on misclassified samples can make it sensitive to noisy data and outliers. Understanding its implementation and complexity allows developers to apply it effectively in a wide range of machine learning applications.