
Maximum Likelihood Estimation


Published: 05/08/2025

Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation (MLE) is a statistical method for estimating the parameters of a probability distribution from observed data. The goal is to find the parameter values that maximize the likelihood function, which measures the probability (or probability density) of the observed data under the assumed distribution and a given choice of parameters. This article provides a practical understanding of MLE with a Python implementation.
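
Formally, for an independent and identically distributed (i.i.d.) sample x = (x₁, ..., xₙ), the maximum likelihood estimate is

θ̂ = argmax_θ L(θ | x) = argmax_θ P(x₁ | θ) · P(x₂ | θ) · ... · P(xₙ | θ),

that is, the parameter value under which the observed sample is most probable (or has the highest density).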

Fundamental Concepts / Prerequisites

Before diving into the implementation, it's important to understand the following concepts:

  • Probability Distribution: A function that describes the probability of different outcomes for a random variable. Examples include the normal distribution, binomial distribution, and Poisson distribution.
  • Likelihood Function: The likelihood function, L(θ | x), is the probability (or probability density) of the observed data (x) given the parameters (θ) of the assumed distribution. It is treated as a function of θ with x held fixed. Mathematically, L(θ | x) = P(x | θ).
  • Log-Likelihood: Since the product of many small probabilities can lead to numerical underflow, we often work with the log-likelihood, which is the logarithm of the likelihood function. Maximizing the log-likelihood is equivalent to maximizing the likelihood.
  • Optimization: Finding the parameter values that maximize the (log-)likelihood function. This may be done with calculus (setting derivatives to zero) when a closed form exists, or with numerical optimization techniques otherwise; a closed-form sketch for the normal distribution follows this list.
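
As a sketch of the closed-form route for the normal distribution (the same model used in the implementation below): for n i.i.d. observations x₁, ..., xₙ, the log-likelihood is

log L(μ, σ | x) = -n·log(σ) - (n/2)·log(2π) - Σᵢ (xᵢ - μ)² / (2σ²),

and setting its partial derivatives with respect to μ and σ to zero gives μ̂ = (1/n)·Σᵢ xᵢ (the sample mean) and σ̂² = (1/n)·Σᵢ (xᵢ - μ̂)² (the variance with denominator n, i.e. the square of `np.std(data, ddof=0)`). The numerical optimizer used in the code below should recover essentially these values.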

Implementation in Python

This example demonstrates MLE for estimating the mean (μ) and standard deviation (σ) of a normal distribution, given a set of observed data.


import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def negative_log_likelihood(params, data):
    """
    Calculates the negative log-likelihood of the normal distribution.

    Args:
        params (tuple): A tuple containing the mean (mu) and standard deviation (sigma).
        data (np.ndarray): The observed data.

    Returns:
        float: The negative log-likelihood.
    """
    mu, sigma = params
    if sigma <= 0:  # Constraint: standard deviation must be positive
        return float('inf')  # Return a large value to discourage non-positive sigma

    log_likelihood = np.sum(norm.logpdf(data, loc=mu, scale=sigma))
    return -log_likelihood  # We return the negative to minimize it using optimization algorithms


def estimate_gaussian_parameters(data):
    """
    Estimates the mean and standard deviation of a normal distribution using MLE.

    Args:
        data (np.ndarray): The observed data.

    Returns:
        tuple: A tuple containing the estimated mean and standard deviation.
    """

    # Initial guess: sample mean and np.std (ddof=0), which for a normal
    # distribution are already the closed-form MLE
    initial_guess = (np.mean(data), np.std(data))

    # Optimization using the Nelder-Mead method (or another suitable method)
    result = minimize(negative_log_likelihood, initial_guess, args=(data,), method='Nelder-Mead')

    # Extract the estimated parameters
    estimated_mu, estimated_sigma = result.x

    return estimated_mu, estimated_sigma


if __name__ == '__main__':
    # Example usage:
    data = np.array([2.5, 3.1, 1.8, 2.2, 2.7, 3.5, 2.9, 2.0, 2.4, 3.0])

    estimated_mu, estimated_sigma = estimate_gaussian_parameters(data)

    print(f"Estimated Mean (mu): {estimated_mu:.4f}")
    print(f"Estimated Standard Deviation (sigma): {estimated_sigma:.4f}")

Code Explanation

`negative_log_likelihood(params, data)`: This function calculates the negative log-likelihood of the normal distribution given the parameters (mean and standard deviation) and the observed data. The `norm.logpdf` function from `scipy.stats` computes the log of the probability density function (PDF) of the normal distribution for each data point. The sum of these log probabilities represents the log-likelihood. We return the negative log-likelihood because optimization algorithms typically minimize functions.
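
As a quick standalone illustration (a minimal sketch, not part of the article's script), summing `norm.logpdf` values gives exactly the explicit normal log-likelihood formula:

import numpy as np
from scipy.stats import norm

data = np.array([2.5, 3.1, 1.8, 2.2, 2.7])
mu, sigma = 2.5, 0.5

# Log-likelihood as computed in negative_log_likelihood: sum of per-point log densities
ll_scipy = np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# The same quantity written out explicitly
n = data.size
ll_manual = -n * np.log(sigma) - 0.5 * n * np.log(2 * np.pi) - np.sum((data - mu) ** 2) / (2 * sigma ** 2)

print(np.isclose(ll_scipy, ll_manual))  # True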

`estimate_gaussian_parameters(data)`: This function performs the estimation numerically. It first forms an initial guess for the parameters using the sample mean and sample standard deviation of the data. It then uses the `minimize` function from `scipy.optimize` to find the parameters that minimize the negative log-likelihood; the `Nelder-Mead` method is a derivative-free optimizer, so no gradient information is needed. The `args` argument passes the data through to `negative_log_likelihood` during optimization. Finally, it extracts the estimated mean and standard deviation from the optimization result.

`if __name__ == '__main__':` block: This section demonstrates how to use the `estimate_gaussian_parameters` function. It creates sample data, calls the function to estimate the parameters, and then prints the estimated values.
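
Because the MLE for a normal distribution also has a closed form (the sample mean and the 1/n standard deviation), a natural sanity check, sketched below as a hypothetical extension of the example, is to compare the optimizer's output against those values using the `estimate_gaussian_parameters` function defined above:

import numpy as np

data = np.array([2.5, 3.1, 1.8, 2.2, 2.7, 3.5, 2.9, 2.0, 2.4, 3.0])

# Numerical MLE from the article's function
estimated_mu, estimated_sigma = estimate_gaussian_parameters(data)

# Closed-form MLE for a normal distribution: sample mean and the ddof=0 standard deviation
closed_form_mu = np.mean(data)
closed_form_sigma = np.std(data, ddof=0)

print(f"Closed form:   mu = {closed_form_mu:.4f}, sigma = {closed_form_sigma:.4f}")
print(f"Numerical MLE: mu = {estimated_mu:.4f}, sigma = {estimated_sigma:.4f}")
# The two should agree to within the optimizer's convergence tolerance.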

Complexity Analysis

Time Complexity: The cost of MLE is dominated by the numerical optimization. Each call to `negative_log_likelihood` is O(n), where n is the number of data points, because it sums the log-density over every observation. Nelder-Mead performs a handful of such evaluations per iteration (up to O(d) on a simplex shrink, with d = 2 parameters here), so the overall cost is roughly O(k · n), where k is the number of iterations required for convergence; k depends on the starting point and the convergence tolerances rather than on n.

Space Complexity: The space complexity is O(n) for storing the input data array; `negative_log_likelihood` also allocates a temporary O(n) array of per-point log densities before summing. The Nelder-Mead simplex holds d + 1 candidate points in d dimensions, a small constant for d = 2. The overall space complexity is therefore O(n).

Alternative Approaches

An alternative approach is to use gradient-based optimization methods like gradient descent. These methods require calculating the gradient of the log-likelihood function. This can be done analytically (if the derivative is known) or numerically. Gradient-based methods can converge faster than Nelder-Mead, especially for high-dimensional parameter spaces, but they require careful selection of the learning rate and can get stuck in local minima. Another alternative is to use specialized optimization packages for MLE, which may offer better performance and stability for specific distributions.
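
For illustration, here is a minimal sketch of that gradient-based route for the same normal model, supplying an analytic gradient to `scipy.optimize.minimize` with the L-BFGS-B method and a positivity bound on sigma (the function and variable names here are illustrative, not part of the article's code):

import numpy as np
from scipy.optimize import minimize

def nll_and_grad(params, data):
    """Negative log-likelihood of a normal distribution and its analytic gradient."""
    mu, sigma = params
    n = data.size
    resid = data - mu
    nll = n * np.log(sigma) + 0.5 * n * np.log(2 * np.pi) + np.sum(resid ** 2) / (2 * sigma ** 2)
    d_mu = -np.sum(resid) / sigma ** 2                      # d(NLL)/d(mu)
    d_sigma = n / sigma - np.sum(resid ** 2) / sigma ** 3   # d(NLL)/d(sigma)
    return nll, np.array([d_mu, d_sigma])

data = np.array([2.5, 3.1, 1.8, 2.2, 2.7, 3.5, 2.9, 2.0, 2.4, 3.0])
result = minimize(nll_and_grad, x0=(np.mean(data), np.std(data)), args=(data,),
                  jac=True, method='L-BFGS-B', bounds=[(None, None), (1e-6, None)])
print(result.x)  # estimated (mu, sigma)

The lower bound on sigma plays the same role as the `float('inf')` guard in the Nelder-Mead version, but lets the optimizer handle the constraint directly.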

Conclusion

Maximum Likelihood Estimation (MLE) is a powerful technique for estimating the parameters of a probability distribution. This article provided a hands-on example of implementing MLE for a normal distribution in Python. While the example focused on the normal distribution, the same principles can be applied to other distributions by modifying the likelihood function. Understanding the underlying assumptions and potential limitations of MLE is crucial for its effective application in machine learning and other fields.