
A time series forecasting project comparing ARIMA, SARIMAX, LSTM, DES, and TES models to predict supermarket beverage sales (beer, wine, liquor). Includes data preprocessing, model training, evaluation, and performance comparison, with SARIMAX achieving the best results.


OmkarSawant23/Major_Project


Supermarket Sales Forecasting Using Time Series Analysis

This repository contains a complete end-to-end workflow for forecasting supermarket beverage sales, including beer, wine, and liquor. The project evaluates multiple classical time series models alongside deep learning architectures to determine the most effective approach for retail demand forecasting.

The entire analysis is implemented in a single notebook (project.ipynb) and includes data preprocessing, model training, evaluation, and comparative analysis.


Project Overview

This project builds and compares several forecasting models, including:

Classical Statistical Models:

  • ARIMA
  • SARIMAX
  • Double Exponential Smoothing (DES)
  • Triple Exponential Smoothing (TES)

Deep Learning Models:

  • Vanilla LSTM
  • Bidirectional LSTM
  • Stacked LSTM

Each model is evaluated against real retail beverage sales data to assess predictive performance and practical applicability.


Dataset

The dataset used in this project is sourced from:

Warehouse and Retail Sales – Data.gov
https://catalog.data.gov/dataset/warehouse-and-retail-sales/resource/b1ecfb86-7b0a-4619-a263-ffbc37f4e3e0

The dataset provides monthly sales values for various product categories.
This project focuses specifically on beverage categories and converts the dataset into a clean, time-indexed format suitable for forecasting.


Features of the Project

  • Data cleaning and preprocessing
  • Exploratory time series analysis
  • Building and training statistical forecasting models
  • Designing and training deep learning LSTM-based models
  • Performance evaluation using MSE, RMSE, and R²
  • Visual comparison of forecasts versus actual sales
  • Results interpretation and insights

Installation and Requirements

All of the packages listed below are required and can be installed with pip.

Required Libraries:

  • numpy
  • pandas
  • matplotlib
  • scikit-learn
  • statsmodels
  • tensorflow
  • keras
  • notebook
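
Before running the notebook, it can help to confirm that the libraries above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the import name `sklearn` for scikit-learn is the one detail that differs from the package list.

```python
from importlib.util import find_spec

# Import names for the libraries listed above. scikit-learn is imported
# as "sklearn"; the other import names match the package names.
REQUIRED_MODULES = [
    "numpy", "pandas", "matplotlib", "sklearn",
    "statsmodels", "tensorflow", "keras", "notebook",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    # find_spec only looks the module up; it does not import it.
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

Any name reported as missing needs to be installed before the corresponding cells will run.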

Recommended environment:

  • Google Colab
  • VS Code with Python virtual environment
  • Any machine with GPU support for faster LSTM training

How to Run the Notebook

  1. Download the dataset from the link above.

  2. Place the CSV file in the project directory.

  3. Open the notebook (project.ipynb) in Jupyter, Google Colab, or VS Code.

  4. Run all cells in order to:

    • Load and clean the dataset
    • Train the ARIMA, SARIMAX, DES, and TES models
    • Train the Vanilla LSTM, Bidirectional LSTM, and Stacked LSTM models
    • Evaluate forecasting performance

Methodology

1. Data Preprocessing

  • Selection of relevant beverage categories
  • Combining month and year fields into a single DateTime column
  • Creating a proper time-series index
  • Splitting data into training and testing sets
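
The preprocessing steps above can be sketched with pandas roughly as follows. This is an illustration, not the notebook's actual code: the column names (`YEAR`, `MONTH`, `ITEM TYPE`, `RETAIL SALES`), the category labels, the 80/20 split, and the tiny in-memory table are all assumptions and should be adjusted to match the downloaded CSV.

```python
import pandas as pd


def build_monthly_series(df, categories=("BEER", "WINE", "LIQUOR")):
    """Aggregate raw rows into one monthly sales column per category."""
    # Keep only the beverage categories of interest.
    subset = df[df["ITEM TYPE"].isin(categories)]
    # Combine year and month into one timestamp (first day of the month).
    parts = subset[["YEAR", "MONTH"]].rename(
        columns={"YEAR": "year", "MONTH": "month"}
    ).assign(day=1)
    subset = subset.assign(DATE=pd.to_datetime(parts))
    # Sum sales per month and category, then pivot categories into columns.
    return (
        subset.groupby(["DATE", "ITEM TYPE"])["RETAIL SALES"]
        .sum()
        .unstack("ITEM TYPE")
        .sort_index()
    )


def chronological_split(data, train_fraction=0.8):
    """Split without shuffling so the test set is strictly later in time."""
    cut = int(len(data) * train_fraction)
    return data.iloc[:cut], data.iloc[cut:]


# Made-up rows that stand in for the real CSV (assumed layout).
raw = pd.DataFrame({
    "YEAR": [2020, 2020, 2020, 2020, 2020],
    "MONTH": [1, 1, 2, 2, 2],
    "ITEM TYPE": ["BEER", "WINE", "BEER", "BEER", "OTHER"],
    "RETAIL SALES": [10.0, 5.0, 7.0, 3.0, 99.0],
})
monthly = build_monthly_series(raw)
train, test = chronological_split(monthly)
print(monthly)
```

Splitting by position rather than at random matters here: shuffling would leak future observations into the training set.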

2. Model Training

Each model is trained using the preprocessed dataset.
Hyperparameters are chosen based on time series diagnostics and experimentation.
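
Unlike the statistical models, which consume the series directly, LSTM layers in Keras expect three-dimensional input of shape (samples, timesteps, features). A common way to get there is to slice the series into fixed-length windows, each paired with the value that follows it. The sketch below shows that framing with NumPy; the function name `make_windows` and the window length are hypothetical and not taken from the notebook.

```python
import numpy as np


def make_windows(values, window):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    values = np.asarray(values, dtype=float)
    if len(values) <= window:
        raise ValueError("series must be longer than the window")
    # Each input row holds `window` consecutive observations.
    X = np.stack([values[i:i + window] for i in range(len(values) - window)])
    # Each target is the observation immediately after its window.
    y = values[window:]
    # Add a trailing feature axis for a univariate series.
    return X[..., np.newaxis], y


series = np.arange(10, dtype=float)
X, y = make_windows(series, window=3)
print(X.shape, y.shape)
```

For monthly data, a window of 12 is a natural starting point because it spans one seasonal cycle, though the best value is an empirical question.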

3. Evaluation Metrics

Models are evaluated using:

  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)
  • R² Score
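
For reference, the three metrics can be computed directly from their definitions. The sketch below uses NumPy and made-up numbers; the helper name `regression_metrics` is hypothetical, and the notebook may instead rely on scikit-learn's equivalents.

```python
import numpy as np


def regression_metrics(actual, predicted):
    """Compute MSE, RMSE, and R² for a forecast against held-out values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mse = float(np.mean(errors ** 2))
    rmse = float(np.sqrt(mse))
    # R² compares the model's squared error with that of a constant
    # forecast equal to the mean of the actual values. It is undefined
    # when the actual values are all identical (ss_tot == 0).
    ss_res = float(np.sum(errors ** 2))
    ss_tot = float(np.sum((actual - actual.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "r2": r2}


# Illustrative values only, not project results.
metrics = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
print(metrics)
```

RMSE is in the same units as the sales figures, which makes it easier to interpret than MSE. R² can be negative on a test set when a model does worse than predicting the mean.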

4. Comparative Analysis

Plots and metrics allow straightforward comparison of:

  • Classical vs. Deep Learning methods
  • Speed vs. accuracy
  • Short-term vs. long-term forecasting ability

Results Summary

Best Model:

  • SARIMAX, showing the most stable and accurate performance across all categories

General Observations:

  • DES and TES performed poorly due to data characteristics
  • Stacked LSTM and SARIMAX produced strong forecasts
  • Deep learning models required significantly longer training time
  • Limited dataset size restricts long-term prediction accuracy

Future Improvements

  • Use larger, more diverse supermarket datasets
  • Add external variables such as promotions, holidays, or inflation
  • Build hybrid statistical + neural network models
  • Apply automated hyperparameter tuning (Optuna, grid search, Bayesian optimization)
  • Improve generalization with more data preprocessing and augmentation
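
As a rough idea of what automated tuning could look like, the sketch below runs an exhaustive grid search over the two smoothing parameters of double exponential smoothing (Holt's linear trend method) and keeps the pair with the lowest validation RMSE. Everything here is an assumption for illustration: the hand-written `holt_forecast`, the parameter grid, and the synthetic series. A real setup would more likely tune the statsmodels or Keras models used in the notebook.

```python
import itertools

import numpy as np


def holt_forecast(train, alpha, beta, horizon):
    """Holt's linear (double exponential) smoothing with additive trend."""
    # Simple initialization: first value as level, first difference as trend.
    level, trend = train[0], train[1] - train[0]
    for value in train[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Extrapolate the final level along the final trend.
    return np.array([level + (h + 1) * trend for h in range(horizon)])


def grid_search(train, valid, alphas, betas):
    """Return (rmse, alpha, beta) for the best pair on the validation set."""
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        forecast = holt_forecast(train, alpha, beta, len(valid))
        rmse = float(np.sqrt(np.mean((valid - forecast) ** 2)))
        if best is None or rmse < best[0]:
            best = (rmse, alpha, beta)
    return best


# Synthetic series with a perfectly linear trend, for demonstration only.
series = np.arange(13, dtype=float)
train, valid = series[:10], series[10:]
best_rmse, best_alpha, best_beta = grid_search(
    train, valid, alphas=[0.2, 0.5, 0.8], betas=[0.1, 0.5]
)
print(best_rmse, best_alpha, best_beta)
```

Libraries such as Optuna replace the exhaustive loop with a guided search, which becomes worthwhile once the grid grows beyond a handful of parameters.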

Limitations

  • The dataset covers a limited historical period
  • Sales patterns vary widely across categories
  • Some models are sensitive to strong trends and limited observations
  • Real-world deployment would require additional feature engineering
