Supermarket Sales Forecasting Using Time Series Analysis

This repository contains a complete end-to-end workflow for forecasting supermarket beverage sales, including beer, wine, and liquor. The project evaluates multiple classical time series models alongside deep learning architectures to determine the most effective approach for retail demand forecasting.

The entire analysis is implemented in a single notebook (project.ipynb) and includes data preprocessing, model training, evaluation, and comparative analysis.

Project Overview

This project builds and compares several forecasting models, including:

Classical Statistical Models:

ARIMA
SARIMAX
Double Exponential Smoothing (DES)
Triple Exponential Smoothing (TES)

Deep Learning Models:

Vanilla LSTM
Bidirectional LSTM
Stacked LSTM

Each model is evaluated against real retail beverage sales data to assess predictive performance and practical applicability.
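The three LSTM variants differ mainly in how their recurrent layers are arranged. The sketch below shows one common way to build them with tf.keras for one-step-ahead forecasting from a sliding window of past months. The helper names make_windows and build_lstm, the window length, and the layer sizes are illustrative assumptions. They are not the configuration used in project.ipynb.

```python
import numpy as np
import tensorflow as tf


def make_windows(series, n_steps):
    """Turn a 1-D series into (samples, n_steps, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype="float32")
    # Window i covers series[i : i + n_steps]; its target is series[i + n_steps].
    X = np.stack([series[i:i + n_steps] for i in range(len(series) - n_steps)])
    y = series[n_steps:]
    return X[..., np.newaxis], y


def build_lstm(kind: str, n_steps: int, units: int = 50) -> tf.keras.Model:
    """Build one of the three LSTM variants (hypothetical helper, illustrative sizes)."""
    layers = tf.keras.layers
    if kind == "vanilla":
        # A single LSTM layer that returns only its final hidden state.
        body = [layers.LSTM(units)]
    elif kind == "bidirectional":
        # Reads each window forwards and backwards and concatenates both states.
        body = [layers.Bidirectional(layers.LSTM(units))]
    elif kind == "stacked":
        # The first layer must return the full sequence to feed the second LSTM.
        body = [layers.LSTM(units, return_sequences=True), layers.LSTM(units)]
    else:
        raise ValueError(f"unknown kind: {kind}")
    model = tf.keras.Sequential([layers.Input(shape=(n_steps, 1)), *body, layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    return model
```

In practice the series is usually scaled (for example with scikit-learn's MinMaxScaler) before windowing, and each model is then trained with model.fit on the training windows.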

Dataset

The dataset used in this project is sourced from:

Warehouse and Retail Sales – Data.gov
https://catalog.data.gov/dataset/warehouse-and-retail-sales/resource/b1ecfb86-7b0a-4619-a263-ffbc37f4e3e0

The dataset provides monthly sales values for various product categories.
This project focuses specifically on beverage categories and converts the dataset into a clean, time-indexed format suitable for forecasting.

Features of the Project

Data cleaning and preprocessing
Exploratory time series analysis
Building and training statistical forecasting models
Designing and training deep learning LSTM-based models
Performance evaluation using MSE, RMSE, and R²
Visual comparison of forecasts versus actual sales
Results interpretation and insights

Installation and Requirements

All required packages can be installed with pip, for example: pip install numpy pandas matplotlib scikit-learn statsmodels tensorflow keras notebook

Required Libraries:

numpy
pandas
matplotlib
scikit-learn
statsmodels
tensorflow
keras
notebook

Recommended environment:

Google Colab
VS Code with Python virtual environment
Any machine with GPU support for faster LSTM training

How to Run the Notebook

1. Download the dataset from the link above.
2. Place the CSV file in the project directory.
3. Open project.ipynb in Jupyter Notebook, VS Code, or Google Colab.
4. Run all cells in order to:
   - Load and clean the dataset
   - Train the ARIMA, SARIMAX, DES, and TES models
   - Train the LSTM, Bidirectional LSTM, and Stacked LSTM models
   - Evaluate forecasting performance

Methodology

1. Data Preprocessing

Selection of relevant beverage categories
Combining month and year fields into a single DateTime column
Creating a proper time-series index
Splitting data into training and testing sets
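The preprocessing steps above can be sketched with pandas as follows. The column names YEAR, MONTH, ITEM TYPE, and RETAIL SALES, the category labels, and the six-month test window are assumptions about the Data.gov CSV and should be checked against the downloaded file. The helper to_monthly_series is hypothetical and is not a function from the notebook.

```python
import pandas as pd

# Assumed column names and labels; verify them against the downloaded Data.gov CSV.
DATE_COLS = {"YEAR": "year", "MONTH": "month"}
CATEGORY_COL = "ITEM TYPE"
SALES_COL = "RETAIL SALES"
BEVERAGES = ["BEER", "WINE", "LIQUOR"]


def to_monthly_series(df: pd.DataFrame, test_size: int = 6):
    """Return chronological (train, test) frames with one column per beverage category."""
    # 1. Keep only the beverage categories of interest.
    df = df[df[CATEGORY_COL].isin(BEVERAGES)].copy()
    # 2. Combine the year and month fields into one datetime (first day of the month).
    df["DATE"] = pd.to_datetime(
        df[list(DATE_COLS)].rename(columns=DATE_COLS).assign(day=1)
    )
    # 3. Build a monthly time index, summing all items within each category.
    monthly = (
        df.pivot_table(index="DATE", columns=CATEGORY_COL, values=SALES_COL, aggfunc="sum")
        .sort_index()
        .asfreq("MS")  # missing months become NaN rows instead of silently vanishing
    )
    # 4. Chronological split: the last `test_size` months are held out.
    return monthly.iloc[:-test_size], monthly.iloc[-test_size:]
```

The split is chronological rather than random, so every model is evaluated only on months it never saw during training.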

2. Model Training

Each model is trained using the preprocessed dataset.
Hyperparameters are chosen based on time series diagnostics and experimentation.

3. Evaluation Metrics

Models are evaluated using:

Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
R² Score
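For reference, the three metrics can be computed directly with NumPy. For a non-constant test series, scikit-learn's mean_squared_error and r2_score return the same MSE and R² values. The evaluate helper is a hypothetical name used only for this sketch.

```python
import numpy as np


def evaluate(y_true, y_pred) -> dict:
    """Compute MSE, RMSE and R² for one forecast against the actual values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    # MSE = mean of squared errors; RMSE is its square root, in the original sales units.
    mse = np.mean(residuals ** 2)
    # R² = 1 - SS_res / SS_tot; it is negative when a model does worse than predicting the mean.
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {"MSE": mse, "RMSE": np.sqrt(mse), "R2": 1.0 - ss_res / ss_tot}
```

For example, evaluate([1, 2, 3, 4], [1, 2, 3, 5]) gives an MSE of 0.25, an RMSE of 0.5, and an R² of 0.8.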

4. Comparative Analysis

Plots and metrics allow straightforward comparison of:

Classical vs. Deep Learning methods
Speed vs. accuracy
Short-term vs. long-term forecasting ability
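One simple way to put these comparisons side by side is a table of RMSE over a short horizon and over the full test window, optionally with the time each model took to fit. The sketch below assumes that forecasts are already aligned with the test period. The compare_forecasts helper, the three-month short horizon, and the fit_seconds mapping are illustrative assumptions, not the notebook's code.

```python
import numpy as np
import pandas as pd


def compare_forecasts(y_true, forecasts, fit_seconds=None, short_horizon=3):
    """Tabulate short- and full-horizon RMSE (and optional fit time) per model.

    `forecasts` maps a model name to its predictions for the test window.
    """
    y_true = np.asarray(y_true, dtype=float)
    rows = []
    for name, y_pred in forecasts.items():
        err = y_true - np.asarray(y_pred, dtype=float)
        row = {
            "model": name,
            # Accuracy over the first few months only (short-term ability).
            "RMSE_short": float(np.sqrt(np.mean(err[:short_horizon] ** 2))),
            # Accuracy over the whole test window (longer-term ability).
            "RMSE_full": float(np.sqrt(np.mean(err ** 2))),
        }
        if fit_seconds is not None:
            # Speed side of the speed vs. accuracy comparison.
            row["fit_seconds"] = fit_seconds.get(name, float("nan"))
        rows.append(row)
    # Most accurate model (lowest full-window RMSE) first.
    return pd.DataFrame(rows).set_index("model").sort_values("RMSE_full")
```

Fit times can be collected by wrapping each model's training call with time.perf_counter().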

Results Summary

Best Model:

SARIMAX, showing the most stable and accurate performance across all categories

General Observations:

DES and TES performed poorly due to data characteristics
Stacked LSTM and SARIMAX produced strong forecasts
Deep learning models required significantly longer training time
Limited dataset size restricts long-term prediction accuracy

Future Improvements

Use larger, more diverse supermarket datasets
Add external variables such as promotions, holidays, or inflation
Build hybrid statistical + neural network models
Apply automated hyperparameter tuning (Optuna, grid search, Bayesian optimization)
Improve generalization with more data preprocessing and augmentation

Limitations

The dataset has limited historical depth (relatively few monthly observations)
Sales patterns vary widely across categories
Some models are sensitive to strong trends and limited observations
Real-world deployment would require additional feature engineering

Repository: OmkarSawant23/Major_Project
