AdaBoost Algorithm in Machine Learning
AdaBoost (Adaptive Boosting) is a powerful ensemble learning algorithm that combines multiple "weak learners" into a single "strong learner". It is particularly effective for binary classification: each new learner is trained with more weight placed on the examples that previous learners misclassified. This article explores the AdaBoost algorithm, providing a code implementation, explanation, and complexity analysis.
Fundamental Concepts / Prerequisites
To fully understand AdaBoost, familiarity with the following concepts is helpful:
- Machine Learning Basics: Understanding the concepts of classification, training data, features, and labels.
- Weak Learners: Algorithms that perform slightly better than random guessing. Decision stumps (one-level decision trees) are commonly used as weak learners in AdaBoost.
- Ensemble Learning: Combining multiple models to achieve better performance than any single model.
- Decision Trees: Basic knowledge of how decision trees work, although AdaBoost often utilizes very simple decision trees.
Core Implementation
import numpy as np


class AdaBoost:
    """AdaBoost classifier built from decision stumps. Expects labels in {-1, +1}."""

    def __init__(self, n_estimators=50, learning_rate=1.0):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.estimators = []
        self.estimator_weights = []

    def fit(self, X, y):
        n_samples = X.shape[0]
        weights = np.full(n_samples, 1 / n_samples)  # Initialize weights equally

        for _ in range(self.n_estimators):
            # 1. Train a weak learner (decision stump, defined below)
            stump = DecisionStump()
            stump.fit(X, y, weights)

            # 2. Calculate the weighted error
            predictions = stump.predict(X)
            error = np.sum(weights * (predictions != y))

            # 3. Calculate the estimator weight (alpha)
            # A small constant avoids division by zero when the error is 0
            alpha = self.learning_rate * 0.5 * np.log((1 - error) / (error + 1e-10))

            # 4. Update the sample weights: misclassified samples (y * predictions == -1)
            #    are up-weighted, correctly classified samples are down-weighted
            weights *= np.exp(-alpha * y * predictions)
            weights /= np.sum(weights)  # Normalize weights to sum to 1

            # 5. Store the stump and its weight
            self.estimators.append(stump)
            self.estimator_weights.append(alpha)

    def predict(self, X):
        n_samples = X.shape[0]
        predictions = np.zeros(n_samples)
        for i in range(self.n_estimators):
            predictions += self.estimator_weights[i] * self.estimators[i].predict(X)
        return np.sign(predictions)  # Return the sign of the weighted sum


# A simple decision stump (one-level decision tree) used as the weak learner
class DecisionStump:
    def __init__(self):
        self.feature_idx = None
        self.threshold = None
        self.polarity = 1  # Direction of the inequality (1 for >, -1 for <)

    def fit(self, X, y, weights):
        n_samples, n_features = X.shape
        best_error = float('inf')

        for feature_idx in range(n_features):
            feature_values = np.unique(X[:, feature_idx])
            for threshold in feature_values:
                # Predict with polarity = 1 (label -1 where X > threshold)
                predictions = np.ones(n_samples)
                predictions[X[:, feature_idx] > threshold] = -1
                error = np.sum(weights * (predictions != y))
                if error < best_error:
                    best_error = error
                    self.feature_idx = feature_idx
                    self.threshold = threshold
                    self.polarity = 1

                # Predict with polarity = -1 (label -1 where X < threshold)
                predictions = np.ones(n_samples)
                predictions[X[:, feature_idx] < threshold] = -1
                error = np.sum(weights * (predictions != y))
                if error < best_error:
                    best_error = error
                    self.feature_idx = feature_idx
                    self.threshold = threshold
                    self.polarity = -1

    def predict(self, X):
        n_samples = X.shape[0]
        predictions = np.ones(n_samples)
        if self.polarity == 1:
            predictions[X[:, self.feature_idx] > self.threshold] = -1
        else:
            predictions[X[:, self.feature_idx] < self.threshold] = -1
        return predictions
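As a quick sanity check, here is a minimal usage sketch (not part of the implementation above). It assumes the labels are encoded as -1/+1, which is what the weight-update rule expects, and uses two made-up Gaussian blobs as toy data:

import numpy as np

rng = np.random.default_rng(42)

# Two Gaussian blobs: class -1 centered at (-1, -1), class +1 centered at (+1, +1)
X_neg = rng.normal(loc=-1.0, scale=1.0, size=(100, 2))
X_pos = rng.normal(loc=+1.0, scale=1.0, size=(100, 2))
X = np.vstack([X_neg, X_pos])
y = np.hstack([-np.ones(100), np.ones(100)])  # labels must be -1/+1

model = AdaBoost(n_estimators=25, learning_rate=1.0)
model.fit(X, y)
print("Training accuracy:", np.mean(model.predict(X) == y))

On this well-separated toy data the training accuracy should be high (roughly 0.9 or better), though the exact number depends on the random draw.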
Code Explanation
The code implements the AdaBoost algorithm with decision stumps as weak learners. Here's a breakdown:
`AdaBoost` Class:
- `__init__`: Initializes the number of estimators (`n_estimators`), the learning rate (`learning_rate`), and empty lists to store the weak learners (`estimators`) and their weights (`estimator_weights`).
- `fit(X, y)`: Trains the AdaBoost model.
  - It initializes the sample weights equally.
  - For each estimator:
    - A `DecisionStump` is trained on the data with the current sample weights.
    - The weighted error of the stump is calculated.
    - The estimator's weight (alpha) is calculated from the error: a lower error yields a higher weight. A small constant is added to the denominator to prevent division by zero.
    - The sample weights are updated: misclassified samples receive higher weights, focusing the next estimator on those difficult cases. The weights are then normalized to sum to 1 (a worked example of this update follows this list).
    - The trained stump and its weight are stored.
- `predict(X)`: Predicts the class labels for new data.
  - It calculates a weighted sum of the predictions from all the weak learners.
  - The sign of the weighted sum is returned as the final prediction.
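To make the weight calculation and update concrete, here is a small hand-checkable example (assuming `learning_rate = 1` and labels in {-1, +1}); the numbers are purely illustrative:

import numpy as np

y = np.array([1, 1, -1, -1])
predictions = np.array([1, -1, -1, -1])  # the second sample is misclassified
weights = np.full(4, 0.25)

error = np.sum(weights * (predictions != y))         # 0.25
alpha = 0.5 * np.log((1 - error) / (error + 1e-10))  # ~0.549

weights *= np.exp(-alpha * y * predictions)  # only the misclassified weight grows
weights /= np.sum(weights)
print(np.round(weights, 3))  # the misclassified sample's weight rises to 0.5

The misclassified sample's weight triples relative to each of the others (0.5 versus 0.167), so the next stump is pushed to get that sample right.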
`DecisionStump` Class:
- `__init__`: Initializes the feature index, threshold, and polarity (direction of inequality).
- `fit(X, y, weights)`: Trains the decision stump.
  - It iterates through each feature and each unique value of that feature as a candidate threshold.
  - For each feature and threshold, it tries both polarities (X > threshold and X < threshold) and keeps the one that minimizes the weighted error.
  - The feature index, threshold, and polarity with the lowest weighted error are stored (a small illustration follows this list).
- `predict(X)`: Predicts the class labels for new data based on the learned feature, threshold, and polarity.
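As a tiny illustration (using the `DecisionStump` class above on made-up one-dimensional data), the stump below should pick feature 0 with threshold 3.0 and polarity -1, which classifies this toy set perfectly:

import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([-1, -1, 1, 1])
w = np.full(4, 0.25)  # uniform sample weights

stump = DecisionStump()
stump.fit(X, y, w)
print(stump.feature_idx, stump.threshold, stump.polarity)  # 0 3.0 -1
print(stump.predict(X))                                    # [-1. -1.  1.  1.]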
Complexity Analysis
Time Complexity:
- `fit(X, y)`: The main loop iterates `n_estimators` times. Inside the loop, the `DecisionStump.fit()` method is called. The complexity of `DecisionStump.fit()` is O(n_samples * n_features * n_unique_values), where n_unique_values is the number of unique values per feature. The weight update also takes O(n_samples) time. Therefore, the overall time complexity of `fit()` is approximately O(n_estimators * n_samples * n_features * n_unique_values).
- `predict(X)`: The prediction loop iterates `n_estimators` times. Each `DecisionStump.predict()` takes O(n_samples) time. Thus, the overall time complexity of `predict()` is O(n_estimators * n_samples).
Space Complexity:
- The space complexity is dominated by storing the `n_estimators` `DecisionStump` objects and their corresponding weights. Each `DecisionStump` stores only a feature index, a threshold, and a polarity, so the overall space complexity is O(n_estimators).
Alternative Approaches
Gradient Boosting is another popular ensemble learning technique that, like AdaBoost, combines weak learners. However, instead of re-weighting the training data, Gradient Boosting fits new models to the *residuals* of the previous models. This can sometimes lead to better performance, but also requires more careful tuning of hyperparameters like the learning rate to prevent overfitting.
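In practice, both techniques are available in scikit-learn, which is usually preferable to a from-scratch implementation. The sketch below is illustrative only, with arbitrary hyperparameter values, and assumes scikit-learn is installed:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# A synthetic binary classification problem
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0).fit(X_train, y_train)
gbm = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1).fit(X_train, y_train)

print("AdaBoost test accuracy:         ", ada.score(X_test, y_test))
print("Gradient Boosting test accuracy:", gbm.score(X_test, y_test))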
Conclusion
AdaBoost is a powerful and versatile classification algorithm. Its ability to combine weak learners into a strong learner makes it effective on problems where a single simple model falls short, although its emphasis on misclassified samples can make it sensitive to noisy data and outliers. Understanding its implementation and complexity allows developers to apply it effectively in a wide range of machine learning applications.