Difference Between a Batch and an Epoch in a Neural Network
Published on: 04/08/2025
Understanding Batches and Epochs in Neural Networks
In machine learning, particularly when training neural networks, understanding the concepts of 'batch' and 'epoch' is crucial. This article clarifies the differences between these terms, which are fundamental for controlling the training process and optimizing model performance.
Fundamental Concepts / Prerequisites
Before diving into batches and epochs, it's important to understand a few foundational concepts:
- Neural Network: A computational model inspired by the structure and function of biological neural networks.
- Training Data: A dataset used to train the neural network. It consists of input features and corresponding target values (labels).
- Iteration: A single update of the model's parameters using a batch of data.
- Loss Function: A function that quantifies the difference between the model's predictions and the actual target values. The goal of training is to minimize this loss.
- Gradient Descent: An optimization algorithm used to update the model's parameters in the direction that reduces the loss function (a minimal worked update is sketched just after this list).
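To tie the last two entries together, here is a minimal worked sketch (not from the article itself) of a single gradient descent update on a one-parameter model with a squared-error loss; the values of `w`, `lr`, `x`, and `y_true` are purely illustrative.
# Minimal sketch: one gradient descent step for the model y = w * x with a
# squared-error loss. All names and numbers are illustrative.
w = 0.5                              # current parameter value
lr = 0.1                             # learning rate (step size)
x, y_true = 2.0, 3.0                 # one training example

y_pred = w * x                       # model prediction: 1.0
loss = (y_pred - y_true) ** 2        # squared-error loss: (1.0 - 3.0)^2 = 4.0
grad = 2 * (y_pred - y_true) * x     # d(loss)/dw = -8.0
w = w - lr * grad                    # step against the gradient: w becomes 1.3
print(loss, w)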
Batches and Epochs Defined
In the context of neural networks, these are related but distinct concepts.
- Batch: A subset of the training dataset used in one iteration of the training process. The model processes the batch, calculates the loss, and updates its weights based on the gradients computed from that batch.
- Epoch: One complete pass through the entire training dataset. In each epoch, the training data is divided into batches, and the model is trained on each batch sequentially (the short sketch after these definitions shows how both settings appear in practice).
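For reference, most deep learning libraries expose both quantities directly as training arguments. The snippet below is an illustrative sketch rather than part of the article's own example, and assumes TensorFlow/Keras is installed; `Model.fit` accepts `batch_size` and `epochs` arguments that correspond exactly to the definitions above.
import numpy as np
import tensorflow as tf  # assumption: TensorFlow/Keras is available

X = np.random.rand(1000, 10).astype("float32")   # 1000 samples, 10 features
y = np.random.randint(0, 2, 1000)                # binary labels

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="binary_crossentropy")

# 1000 samples with batch_size=100 -> 10 weight updates (iterations) per epoch,
# and the whole schedule is repeated for 5 epochs.
model.fit(X, y, batch_size=100, epochs=5)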
Why Use Batches?
Processing the entire dataset at once (batch gradient descent) can be computationally expensive and may not fit into memory for large datasets. Stochastic gradient descent (SGD) updates the model after each data point, which can be noisy and lead to unstable convergence. Mini-batch gradient descent (using batches) strikes a balance between these two extremes, providing a smoother learning process and computational efficiency.
Example Scenario
Consider a dataset with 1000 training examples. If you set the batch size to 100, then each epoch will consist of 10 iterations. After each batch of 100 examples, the model's weights are updated. After 10 iterations (processing all 1000 examples), one epoch is completed.
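The arithmetic of this scenario, and the contrast with the two extremes described in the previous section, can be checked with a throwaway sketch (not part of the training code); `N` and `B` are simply the numbers from the scenario above.
import math

N = 1000   # training examples in the scenario above
B = 100    # batch size in the scenario above

full_batch_updates = 1                  # batch gradient descent: one update per epoch
sgd_updates = N                         # stochastic gradient descent: one update per example
mini_batch_updates = math.ceil(N / B)   # mini-batch gradient descent: one update per batch

print(full_batch_updates, sgd_updates, mini_batch_updates)  # 1 1000 10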
Code Example
import numpy as np

# Simulate a training dataset
X = np.random.rand(1000, 10)       # 1000 samples, 10 features each
y = np.random.randint(0, 2, 1000)  # Binary classification (0 or 1)

# Hyperparameters
batch_size = 32
epochs = 10
learning_rate = 0.01

# Simple model (linear model for demonstration)
weights = np.random.rand(10)  # Initialize weights randomly
bias = 0.0

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Training loop
for epoch in range(epochs):
    for i in range(0, len(X), batch_size):
        # Get the batch
        X_batch = X[i:i+batch_size]
        y_batch = y[i:i+batch_size]

        # Calculate predictions
        z = np.dot(X_batch, weights) + bias
        predictions = sigmoid(z)

        # Calculate the loss (simplified example)
        loss = np.mean((predictions - y_batch)**2)

        # Calculate gradients (simplified example)
        dw = np.dot(X_batch.T, (predictions - y_batch)) / len(X_batch)
        db = np.mean(predictions - y_batch)

        # Update weights and bias
        weights -= learning_rate * dw
        bias -= learning_rate * db

    print(f"Epoch {epoch+1}, Loss: {loss}")
Code Explanation
The Python code above demonstrates a simplified training loop using batches and epochs.
- Data Simulation: `X = np.random.rand(1000, 10)` creates a synthetic dataset with 1000 samples and 10 features each. `y = np.random.randint(0, 2, 1000)` creates corresponding binary labels (0 or 1).
- Hyperparameter Definition: `batch_size = 32` sets the number of samples in each batch. `epochs = 10` sets the number of times the training loop will iterate over the entire dataset. `learning_rate = 0.01` controls the step size for updating the model's weights.
- Model Initialization: `weights = np.random.rand(10)` initializes the model's weights randomly, and `bias = 0.0` initializes the bias to zero.
- Training Loop (Epochs): The outer loop `for epoch in range(epochs):` iterates over each epoch.
- Batch Iteration: The inner loop `for i in range(0, len(X), batch_size):` iterates through the dataset in batches.
- Batch Extraction: `X_batch = X[i:i+batch_size]` and `y_batch = y[i:i+batch_size]` extract the current batch of data and labels.
- Forward Pass: `z = np.dot(X_batch, weights) + bias` calculates the linear output of the model. `predictions = sigmoid(z)` applies the sigmoid activation function to get the predicted probabilities.
- Loss Calculation: `loss = np.mean((predictions - y_batch)**2)` calculates the mean squared error between the predictions and the actual labels. This is a simplified loss function for demonstration.
- Gradient Calculation: `dw = np.dot(X_batch.T, (predictions - y_batch)) / len(X_batch)` and `db = np.mean(predictions - y_batch)` calculate the gradients of the loss function with respect to the weights and bias (using a simplified gradient calculation).
- Parameter Update: `weights -= learning_rate * dw` and `bias -= learning_rate * db` update the model's weights and bias using gradient descent.
- Loss Reporting: `print(f"Epoch {epoch+1}, Loss: {loss}")` prints the loss of the final batch processed in each epoch, giving a rough signal of training progress (a variant that shuffles the data and averages the loss over all batches is sketched below).
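Two refinements are common in practice but omitted above for simplicity: shuffling the samples at the start of every epoch so the batches differ from one epoch to the next, and averaging the loss over all batches instead of reporting only the last one. The sketch below is one possible way to add both; it reuses `X`, `y`, `weights`, `bias`, `sigmoid`, and the hyperparameters defined in the example above.
# Assumes X, y, weights, bias, sigmoid, batch_size, epochs and learning_rate
# from the example above are already defined.
for epoch in range(epochs):
    # Shuffle the sample order so each epoch sees different batches
    perm = np.random.permutation(len(X))
    X_shuf, y_shuf = X[perm], y[perm]

    epoch_losses = []
    for i in range(0, len(X_shuf), batch_size):
        X_batch = X_shuf[i:i+batch_size]
        y_batch = y_shuf[i:i+batch_size]

        predictions = sigmoid(np.dot(X_batch, weights) + bias)
        epoch_losses.append(np.mean((predictions - y_batch) ** 2))

        dw = np.dot(X_batch.T, (predictions - y_batch)) / len(X_batch)
        db = np.mean(predictions - y_batch)
        weights -= learning_rate * dw
        bias -= learning_rate * db

    # Average loss over every batch in the epoch, not just the final batch
    print(f"Epoch {epoch+1}, Mean Loss: {np.mean(epoch_losses):.4f}")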
Complexity Analysis
Let's analyze the complexity of the training process, assuming `N` is the total number of training samples, `B` is the batch size, and `E` is the number of epochs.
- Time Complexity: The outer loop runs for `E` epochs, and the inner loop runs for roughly `N/B` iterations per epoch. Each iteration (forward pass, loss calculation, gradient calculation, parameter update) costs some amount `C` that depends on the model architecture and the batch size, so the total is O(E * N/B * C). Because `C` typically grows linearly with the batch size (each batch processes `B` samples), the work per epoch is proportional to `N`, and the overall time complexity is roughly O(E * N) times the per-sample cost of the model: proportional to the number of epochs and the number of examples.
- Space Complexity: The dominant costs are storing the training data, O(N), and the model parameters, O(M). The batch size `B` determines the memory needed for the intermediate values of each batch, but typically B << N, so the space complexity is usually taken to be O(N) or O(M), whichever is larger.
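The time-complexity claim can be sanity-checked numerically: the number of parameter updates depends on the batch size, while the total number of samples processed does not. The batch sizes below are arbitrary example values.
import math

N, E = 1000, 10                      # samples and epochs (assumed example values)

for B in (10, 100, 1000):            # a few example batch sizes
    updates = E * math.ceil(N / B)   # parameter updates: shrinks as B grows
    samples = E * N                  # samples processed: independent of B
    print(f"B={B}: {updates} updates, {samples} samples processed")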
Alternative Approaches
Instead of using a fixed batch size, one alternative is to use *adaptive batch sizing*. This involves dynamically adjusting the batch size during training based on factors such as the magnitude of the gradients or the training progress. A smaller batch size can lead to faster initial progress but might be noisier; a larger batch size can provide a more stable gradient estimate but might slow down convergence. Adaptive batch sizing aims to balance these trade-offs, though implementing it adds complexity to the training process.
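The article does not prescribe a particular scheme, but one simple, purely illustrative form of adaptive batch sizing is a schedule that grows the batch size as training progresses, trading early noisy updates for more stable gradients later. The helper below is hypothetical.
def scheduled_batch_size(epoch, initial=32, growth_every=3, factor=2, max_size=512):
    # Hypothetical schedule: start small for fast, noisy progress, then grow
    # the batch size every `growth_every` epochs for steadier gradient estimates.
    return min(initial * factor ** (epoch // growth_every), max_size)

for epoch in range(10):
    print(epoch, scheduled_batch_size(epoch))  # 32, 32, 32, 64, 64, 64, 128, 128, 128, 256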
Conclusion
Understanding the difference between batches and epochs is crucial for effectively training neural networks. A batch represents a subset of the training data used in one iteration, while an epoch represents one complete pass through the entire training dataset. By carefully selecting the batch size and the number of epochs, you can optimize the training process for better model performance and faster convergence.