Stationarity Tests in Time Series

Published on: 04/08/2025

Stationarity is a crucial concept in time series analysis. A stationary time series has statistical properties, such as mean and variance, that remain constant over time. This article explores how to test for stationarity, focusing on the Augmented Dickey-Fuller (ADF) test, a widely used statistical test that checks whether a time series has a unit root.

Fundamental Concepts / Prerequisites

Before diving into stationarity tests, it's essential to understand these concepts:

* **Time Series:** A sequence of data points indexed in time order.
* **Stationarity:** A time series is stationary if its statistical properties (mean, variance, autocorrelation) do not change over time. Weak stationarity requires a constant mean, a finite constant variance, and an autocovariance that depends only on the lag. Strong (strict) stationarity requires the joint distribution of any set of values to be unchanged by shifts in time.
* **Autocorrelation:** The correlation between a time series and a lagged version of itself.
* **Unit Root:** A characteristic of some non-stationary time series, indicating that a shock has a permanent effect on the series. Differencing a series with a unit root can make it stationary.
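To make the concepts above concrete, the following minimal sketch simulates white noise (stationary) and a random walk (which has a unit root), then differences the random walk. The helper `half_variances` and the choice of 500 points are illustrative assumptions, not part of any library. It uses only the Python standard library.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 500
# White noise: independent shocks with constant mean and variance (stationary).
shocks = [random.gauss(0.0, 1.0) for _ in range(n)]

# Random walk: each value is the previous value plus a shock (has a unit root).
walk = []
level = 0.0
for shock in shocks:
    level += shock
    walk.append(level)


def half_variances(values):
    """Return the sample variance of the first and second half of a series."""
    mid = len(values) // 2
    return statistics.variance(values[:mid]), statistics.variance(values[mid:])


# First differences of the random walk: walk[t] - walk[t-1].
diffed = [walk[t] - walk[t - 1] for t in range(1, n)]

print("white noise variances:     ", half_variances(shocks))
print("random walk variances:     ", half_variances(walk))
print("differenced walk variances:", half_variances(diffed))
```

Differencing the random walk recovers the original shocks (up to floating-point rounding), which is why differencing is the usual remedy for a unit root.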

Implementation: Augmented Dickey-Fuller (ADF) Test in Python


import pandas as pd
from statsmodels.tsa.stattools import adfuller

def adf_test(series, significance_level=0.05):
    """
    Performs the Augmented Dickey-Fuller test on a time series.

    Args:
        series (pd.Series): The time series data.
        significance_level (float, optional): The significance level for the test. Defaults to 0.05.

    Returns:
        tuple: A tuple containing:
            - adf_statistic (float): The ADF test statistic.
            - p_value (float): The p-value of the test.
            - critical_values (dict): The critical values for different significance levels.
            - is_stationary (bool): True if the unit-root null hypothesis is rejected at
              the given significance level (series is likely stationary), False otherwise.
    """
    result = adfuller(series, autolag='AIC')  # Using AIC to determine optimal lag

    adf_statistic = result[0]
    p_value = result[1]
    critical_values = result[4]

    is_stationary = p_value <= significance_level

    return adf_statistic, p_value, critical_values, is_stationary


if __name__ == '__main__':
    # Example usage
    data = [10, 12, 11, 13, 12, 14, 15, 16, 15, 17, 18, 19, 20, 19, 18, 21, 22, 23, 24, 25]
    series = pd.Series(data)

    adf_statistic, p_value, critical_values, is_stationary = adf_test(series)

    print("ADF Statistic:", adf_statistic)
    print("p-value:", p_value)
    print("Critical Values:", critical_values)
    print("Is Stationary:", is_stationary) # Print the stationarity result

    if is_stationary:
        print("The time series is likely stationary.")
    else:
        print("The time series is likely non-stationary.")

Code Explanation

The code implements the ADF test using the `statsmodels` library in Python. Here's a breakdown:

1. **Import Libraries:** Imports `pandas` for handling time series data and `statsmodels.tsa.stattools.adfuller` for the ADF test.

2. **`adf_test(series, significance_level)` function:** This function encapsulates the ADF test. It takes the time series data (`series`) and an optional significance level as input.

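3. **`adfuller(series, autolag='AIC')`:** This line performs the ADF test. The `autolag='AIC'` argument selects the lag order automatically by choosing, among the candidate lag orders up to the maximum lag, the one that minimizes the Akaike Information Criterion (AIC). Including enough lagged differences absorbs serial correlation in the residuals, which would otherwise distort the test. With these arguments the result is a tuple containing the test statistic, the p-value, the number of lags used, the number of observations used, a dictionary of critical values, and the best information criterion value.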

4. **Extract Results:** The code extracts the ADF test statistic, p-value, and critical values from the `result` tuple.

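5. **Determine Stationarity:** The `is_stationary` variable is set to `True` if the p-value is less than or equal to the significance level (0.05 by default). In that case we reject the null hypothesis that the time series has a unit root (is non-stationary) and conclude that it is likely stationary. A larger p-value means only that the test failed to reject a unit root, which is weaker than positive evidence of non-stationarity, especially for short series.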

6. **Return Results:** The function returns the ADF statistic, p-value, critical values, and the `is_stationary` boolean.

7. **Example Usage:** The `if __name__ == '__main__':` block demonstrates how to use the `adf_test` function. It creates a sample time series using a list, converts it to a `pandas` Series, calls the `adf_test` function, and prints the results, including a statement about whether the time series is stationary or not based on the p-value.
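The critical values returned by the function offer a second way to read the result besides the p-value. The sketch below is a hypothetical helper, `rejected_levels`, that is not part of `statsmodels`; it lists the significance levels at which the unit-root null is rejected. The statistics passed to it are made-up inputs for illustration, and the critical values are approximate large-sample values for the constant-only case, not output from a real run.

```python
def rejected_levels(adf_statistic, critical_values):
    """List the significance levels at which the unit-root null is rejected.

    The ADF test is left-tailed: the null is rejected when the statistic is
    more negative than the critical value for that level.
    """
    return [
        level
        for level, critical_value in critical_values.items()
        if adf_statistic < critical_value
    ]


# Illustrative inputs only (assumed, not produced by adfuller in this example).
example_critical_values = {"1%": -3.43, "5%": -2.86, "10%": -2.57}

print(rejected_levels(-3.10, example_critical_values))  # rejects at 5% and 10%
print(rejected_levels(-1.20, example_critical_values))  # rejects at no level
```

In practice the dictionary would be the `critical_values` returned by `adf_test`, whose entries depend on the sample size.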

Complexity Analysis

The Augmented Dickey-Fuller (ADF) test fits a least squares regression of the differenced series on its lagged level and on lagged differences, which is a reparameterized autoregressive (AR) model. The cost is therefore driven by the length of the time series and the lag order of that regression.

* **Time Complexity:** The dominant cost is the least squares fit. If `n` is the length of the time series and `k` is the number of lags, a single fit costs roughly O(n*k^2) because of the matrix operations in the least squares estimation. When the lag order is chosen with a criterion such as AIC, the regression is refit for each candidate lag order up to the maximum lag, so the total cost grows by roughly the number of candidates.
* **Space Complexity:** Storage is dominated by the regressor matrix, which has about `n` rows and `k` columns plus a few for the constant and the lagged level, so it is approximately O(n*k). The series itself and the coefficients add O(n + k).
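The sizes in the estimates above can be seen by laying out the regressors explicitly. The sketch below is a simplified illustration, not the `statsmodels` implementation: `adf_design_matrix` and `default_max_lag` are hypothetical helper names. The lag bound uses the rule of thumb 12*(nobs/100)^(1/4), which `statsmodels` documents as its default maximum lag; the library may cap it further for short series.

```python
import math


def default_max_lag(nobs):
    """Rule-of-thumb upper bound on the lag order: 12 * (nobs / 100) ** (1/4)."""
    return math.ceil(12.0 * (nobs / 100.0) ** 0.25)


def adf_design_matrix(y, k):
    """Build regressors for dy[t] = a + g*y[t-1] + sum_i b_i*dy[t-i] + e[t].

    Returns (rows, target). Each row is [1, y[t-1], dy[t-1], ..., dy[t-k]]
    and target holds the matching dy[t].
    """
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # dy[j] = y[j+1] - y[j]
    rows, target = [], []
    for j in range(k, len(dy)):
        lagged_diffs = [dy[j - i] for i in range(1, k + 1)]
        rows.append([1.0, y[j]] + lagged_diffs)
        target.append(dy[j])
    return rows, target


# Same sample data as in the article's example.
data = [10, 12, 11, 13, 12, 14, 15, 16, 15, 17, 18, 19, 20, 19, 18, 21, 22, 23, 24, 25]
rows, target = adf_design_matrix(data, k=2)

print("rows:", len(rows), "columns:", len(rows[0]))  # (n - k - 1) rows, (k + 2) columns
print("rule-of-thumb max lag for 100 observations:", default_max_lag(100))
```

The matrix has n - k - 1 rows and k + 2 columns, which is where the O(n*k) storage and the O(n*k^2) cost of one least squares fit come from.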

Alternative Approaches

Besides the ADF test, other methods can assess stationarity:

* **Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test:** Unlike the ADF test, which tests the null hypothesis that a time series has a unit root (is non-stationary), the KPSS test tests the null hypothesis that the time series is stationary. This provides a complementary perspective. The two tests can disagree. A common reading is that a series for which neither null is rejected may be trend-stationary, while a series for which both nulls are rejected may be difference-stationary, so disagreement calls for a closer look rather than a verdict. The trade-off is that the KPSS result depends on the number of lags used to estimate the long-run variance, so it is worth checking how sensitive the conclusion is to that choice.
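One way to use the two tests together is to combine their p-values into a single reading. The sketch below assumes the p-values come from `adfuller` and `kpss` in `statsmodels.tsa.stattools`; the function `combine_adf_kpss` and its wording are hypothetical, and the four-way interpretation is a commonly cited heuristic rather than a formal rule.

```python
def combine_adf_kpss(adf_p_value, kpss_p_value, alpha=0.05):
    """Combine ADF and KPSS p-values into a heuristic reading.

    ADF null: the series has a unit root. KPSS null: the series is stationary.
    """
    adf_rejects_unit_root = adf_p_value <= alpha
    kpss_rejects_stationarity = kpss_p_value <= alpha

    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary (unit root)"
    if not adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "possibly trend-stationary; consider detrending"
    return "possibly difference-stationary; consider differencing"


# Illustrative p-values only (assumed, not produced by real test runs).
print(combine_adf_kpss(adf_p_value=0.01, kpss_p_value=0.10))
print(combine_adf_kpss(adf_p_value=0.60, kpss_p_value=0.01))
```

Both tests have low power on short series, so a combined reading like this is a starting point for inspection, not a final answer.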

Conclusion

Testing for stationarity is a vital step in time series analysis. The Augmented Dickey-Fuller (ADF) test is a common method for this purpose. Understanding its implementation, interpretation, and complexity helps developers build robust time series models. Remember that many time series forecasting techniques assume a stationary input.