Published on: 04/08/2025

L1 and L2 Regularization: A Deep Dive

Regularization is a crucial technique in machine learning to prevent overfitting, where a model performs exceptionally well on training data but poorly on unseen data. This article provides a comprehensive overview of L1 and L2 regularization, two of the most commonly used regularization methods.

Fundamental Concepts / Prerequisites

To fully grasp L1 and L2 regularization, you should have a basic understanding of the following:

  • Linear Regression: how linear regression models combine weighted features to produce predictions.
  • Cost Functions: Familiarity with cost functions like Mean Squared Error (MSE).
  • Overfitting and Underfitting: Understanding the concepts of overfitting (high variance) and underfitting (high bias).

Core Implementation/Solution

This example demonstrates how L1 and L2 regularization can be applied to a linear regression model using Python and the NumPy library. We define functions that compute the MSE cost, the two penalty terms, and the resulting regularized costs.


import numpy as np

def mse_cost(y_true, y_predicted):
  """Calculates the Mean Squared Error (MSE) cost."""
  return np.mean((y_true - y_predicted)**2)

def l1_regularization(weights, lambda_val):
  """Calculates the L1 regularization term."""
  return lambda_val * np.sum(np.abs(weights))

def l2_regularization(weights, lambda_val):
  """Calculates the L2 regularization term."""
  return lambda_val * np.sum(weights**2)

def l1_regularized_cost(y_true, y_predicted, weights, lambda_val):
  """Calculates the L1 regularized cost function."""
  mse = mse_cost(y_true, y_predicted)
  l1 = l1_regularization(weights, lambda_val)
  return mse + l1

def l2_regularized_cost(y_true, y_predicted, weights, lambda_val):
  """Calculates the L2 regularized cost function."""
  mse = mse_cost(y_true, y_predicted)
  l2 = l2_regularization(weights, lambda_val)
  return mse + l2

# Example usage
y_true = np.array([1, 2, 3, 4, 5])
y_predicted = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
weights = np.array([0.5, 0.2, 0.7, 0.1, 0.3])
lambda_val = 0.1 # Regularization strength

l1_cost = l1_regularized_cost(y_true, y_predicted, weights, lambda_val)
l2_cost = l2_regularized_cost(y_true, y_predicted, weights, lambda_val)

print(f"L1 Regularized Cost: {l1_cost}")
print(f"L2 Regularized Cost: {l2_cost}")

Code Explanation

The Python code defines several functions related to L1 and L2 regularization within a linear regression context.

`mse_cost(y_true, y_predicted)`: This function calculates the Mean Squared Error (MSE) between the true values (`y_true`) and the predicted values (`y_predicted`). MSE is a common metric used to evaluate the performance of regression models.
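
In symbols, for n samples with predictions \hat{y}_i:

\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2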

`l1_regularization(weights, lambda_val)`: This function calculates the L1 regularization term. L1 regularization adds a penalty proportional to the absolute value of the weights of the model. `lambda_val` controls the strength of the regularization.
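
Written out, the L1-regularized (lasso) cost that the code implements is

J_{L1}(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{m} |w_j|

where m is the number of weights and \lambda (`lambda_val`) is non-negative.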

`l2_regularization(weights, lambda_val)`: This function calculates the L2 regularization term. L2 regularization adds a penalty proportional to the square of the weights of the model. `lambda_val` also controls the strength of the regularization here.
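
The corresponding L2-regularized (ridge) cost is

J_{L2}(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{m} w_j^2

Some texts scale this penalty by 1/2 so its gradient is exactly \lambda w; the code above uses the unscaled form.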

`l1_regularized_cost(y_true, y_predicted, weights, lambda_val)`: This function calculates the total cost, including both the MSE and the L1 regularization term. This is the overall cost that the model will try to minimize during training.

`l2_regularized_cost(y_true, y_predicted, weights, lambda_val)`: This function calculates the total cost, including both the MSE and the L2 regularization term, analogous to the L1 version.

The code then provides an example of how to use these functions with sample data, demonstrating how to calculate the L1 and L2 regularized costs.
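
To connect these cost functions to training, the following is a minimal sketch (not part of the original example) of batch gradient descent on the L2-regularized cost. It omits a bias term for brevity, and the names `fit_ridge_gd`, `learning_rate`, and `n_iters` are illustrative. The L1 penalty is not differentiable at zero, so lasso models are usually fit with coordinate descent or proximal methods instead.

import numpy as np

def fit_ridge_gd(X, y, lambda_val=0.1, learning_rate=0.01, n_iters=1000):
  """Batch gradient descent on the L2-regularized MSE cost."""
  n_samples, n_features = X.shape
  weights = np.zeros(n_features)
  for _ in range(n_iters):
    y_predicted = X @ weights
    # Gradient of the MSE term: (2/n) * X^T (y_predicted - y)
    grad_mse = (2.0 / n_samples) * (X.T @ (y_predicted - y))
    # Gradient of the penalty lambda * sum(w^2): 2 * lambda * w
    grad_penalty = 2.0 * lambda_val * weights
    weights -= learning_rate * (grad_mse + grad_penalty)
  return weights

# Fit a single-feature model on toy data
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
print(fit_ridge_gd(X, y))

The penalty's gradient, `2 * lambda_val * weights`, is what shrinks the weights toward zero on every update.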

Complexity Analysis

The provided code primarily consists of element-wise operations on NumPy arrays, along with the calculation of sums.

  • Time Complexity: Both `l1_regularization` and `l2_regularization` run in O(m), where m is the number of weights, since they make one vectorized pass over the weight vector to sum absolute values (L1) or squared values (L2). `mse_cost` runs in O(n), where n is the number of predictions, so `l1_regularized_cost` and `l2_regularized_cost` run in O(n + m).
  • Space Complexity: Each function allocates temporary arrays for its element-wise operations (the squared differences in `mse_cost`, `np.abs(weights)` and `weights**2` in the penalty functions), so auxiliary space is linear in the size of the corresponding input array.

Alternative Approaches

Instead of explicitly calculating the regularization terms and adding them to the cost function, many machine learning libraries (e.g., scikit-learn, TensorFlow, PyTorch) provide built-in support for L1 and L2 regularization directly within their model training routines. This often involves specifying a regularization parameter (like `alpha` in scikit-learn) during model initialization. This approach is often more efficient and easier to implement, as the library handles the details of incorporating the regularization into the optimization process.

For example, in scikit-learn, you can use the `Ridge` class for L2 regularization and the `Lasso` class for L1 regularization. These classes handle the regularization internally during the training process, so you don't need to explicitly calculate the regularization terms yourself.
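
A minimal sketch of that workflow, reusing the toy targets from the earlier example (the single-feature matrix `X` is made up for illustration, and scikit-learn's exact objective scalings differ slightly from the hand-written costs above):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # toy single-feature data
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# alpha plays the role of lambda_val in the earlier functions
ridge = Ridge(alpha=0.1).fit(X, y)  # L2 penalty
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty

print("Ridge coefficient:", ridge.coef_)
print("Lasso coefficient:", lasso.coef_)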

Conclusion

L1 and L2 regularization are powerful techniques for preventing overfitting in machine learning models. L1 regularization can lead to sparse models by driving some weights to zero, effectively performing feature selection. L2 regularization, on the other hand, tends to shrink all weights towards zero but rarely makes them exactly zero. Understanding the mathematical foundations and practical implementation of these techniques is essential for building robust and generalizable machine learning models.