This repository contains supplementary materials for the FSE Demonstrations 2025 paper, "CloudHeatMap: Heatmap-Based Monitoring for Large-Scale Cloud Systems."

AMiR-Research/CloudHeatMap

CloudHeatMap: Visualizing Large-Scale Cloud System Health

CloudHeatMap is a heatmap-based visualization tool designed to monitor the health and performance of large-scale cloud systems. It visualizes key metrics like call volumes, response times, and HTTP response codes across microservices and data centers. By providing proactive monitoring capabilities, CloudHeatMap helps cloud operators identify and resolve issues before they impact system reliability and user experience.

This work is presented in the FSE 2025 Demonstration Track paper, “CloudHeatMap: Heatmap-Based Monitoring for Large-Scale Cloud Systems.” The preprint is available on arXiv (arXiv:2410.21092), and a video demonstration is available on YouTube. For more details, see the M.Sc. thesis by Sarah Sohana (2022).

Features

  • Interactive Heatmaps: Visualize cloud system health and performance metrics in real time.
  • Metrics Filters: Filter data by key metrics such as HTTP status codes, response times, and call volumes.
  • Temporal Analysis: Play heatmap animations over time to track trends and episodic system issues.
  • Scalability: Handle large datasets generated by cloud services.
  • Statistical Aggregation: Incorporate combined mean and standard deviation calculations for aggregated performance insights.

Prerequisites

Ensure you have the following installed:

  • Python 3.7 or higher
  • pip for installing dependencies
  • A virtual environment (optional but recommended)

Project Structure

CloudHeatMap/
├── app.py                        # Main application file
├── Dockerfile                    # For containerization
├── requirements.txt              # Project dependencies
├── data/                         # Directory containing raw data (e.g., .json.gzip files)
├── lib/                          # Helper modules for data processing
│   ├── data_loader.py            # Loads and parses data
│   └── data_processing.py        # Aggregates and processes data for visualizations
└── README.md                     # This README file
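
To inspect a raw file from data/ outside the application, a minimal sketch along the following lines may help. It assumes each file is a single gzip-compressed JSON document; the actual schema and the behavior of lib/data_loader.py are not described here, so the function names `read_gzip_json` and `iter_data_files` are hypothetical and not part of the repository.

```python
import gzip
import json
from pathlib import Path


def read_gzip_json(path):
    """Read one gzip-compressed JSON file and return the parsed object.

    Assumption: the file holds a single JSON document (not JSON Lines).
    """
    # Opening in text mode ("rt") decompresses and decodes in one step.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def iter_data_files(data_dir="data", pattern="*.json.gzip"):
    """Yield matching files from data_dir in a stable, sorted order."""
    yield from sorted(Path(data_dir).glob(pattern))
```

For example, `for path in iter_data_files(): payload = read_gzip_json(path)` walks every raw file in data/ and parses it.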

Setup Instructions

Cloning the Repository

Clone this repository to your local machine:

git clone https://github.com/sohanasarah/CloudHeatMap.git
cd CloudHeatMap

Running the Application

You can start the app using a virtual environment or a Docker container.

Using a Virtual Environment

  1. Create and activate a virtual environment to avoid installing dependencies globally. For example, using Conda:

    conda create -n test_env python=3.11
    conda activate test_env
  2. Install the required packages using pip:

    pip install -r requirements.txt
  3. Ensure that the raw data files (e.g., .json.gzip) are available in the data/ directory. The dataset covers 24 hours, was captured from an actual system, and has been anonymized. The application will process these files for heatmap visualization.

  4. Run the application:

    python app.py

Using Docker

Alternatively, you can build and run the application using Docker.

  1. Build the Docker image:

    docker build -t cloud-heat-map .
  2. Run the Docker container:

    docker run -it --rm -p 8050:8050 cloud-heat-map

Execution

When you run the application, it will prompt you for the following inputs:

Enter the start hour (0-23):
Enter the end hour (0-23):
Enter time interval (in minutes):

For demonstration purposes, we have included one day of synthetic data (January 1, 2025) in the data folder.

Example input:

Start hour: 10
End hour: 12
Interval: 30

This configuration analyzes data from 10:00 AM to 12:00 PM (noon), with consecutive heatmap frames spaced 30 minutes apart.
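
As an illustration of how the three inputs could translate into animation frames, the sketch below splits the selected period into consecutive windows. It is not taken from app.py: the function name `frame_windows` is hypothetical, and it assumes that the end hour is an exclusive boundary and that each frame covers exactly one window.

```python
from datetime import datetime, timedelta


def frame_windows(start_hour, end_hour, interval_minutes, day=datetime(2025, 1, 1)):
    """Split [start_hour, end_hour) on `day` into consecutive time windows.

    Assumptions: the end hour is exclusive and each window maps to one frame.
    """
    if not (0 <= start_hour <= 23 and 0 <= end_hour <= 23):
        raise ValueError("hours must be in the range 0-23")
    if end_hour <= start_hour:
        raise ValueError("end hour must be later than start hour")
    if interval_minutes <= 0:
        raise ValueError("interval must be a positive number of minutes")

    start = day.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    end = day.replace(hour=end_hour, minute=0, second=0, microsecond=0)
    step = timedelta(minutes=interval_minutes)

    windows = []
    current = start
    while current < end:
        # Clip the last window so it never runs past the end boundary.
        windows.append((current, min(current + step, end)))
        current += step
    return windows
```

Under these assumptions, `frame_windows(10, 12, 30)` yields four windows: 10:00-10:30, 10:30-11:00, 11:00-11:30, and 11:30-12:00.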

After providing the inputs, the Dash server will start. You can view the heatmap visualization by navigating to http://127.0.0.1:8050/ in your web browser.

Usage

  1. Graph Type: Choose between Data Center vs. Services or Caller-Callee Pairs for visualization.
  2. Metrics: Select a metric to visualize (e.g., call volume, response time). The default metric is call volume.
  3. Status Code Filter: Optionally filter by one or more HTTP status codes. By default, all status codes are selected.
  4. Value Range Filter: Set a numeric range to filter the displayed values. By default, no range is applied.
  5. Value Type: Choose between absolute values or percentages (percentages are only available for status codes).
  6. Analyze Heatmaps: Identify hotspots or performance anomalies via the color intensity in the heatmaps.
  7. Animation: Play the animation to observe color changes over time. The first frame shows the total aggregated view.
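
The status code filter, value range filter, and percentage value type described above can be sketched as follows. This is an illustrative sketch rather than the application's code: the record fields `status` and `calls` and both function names are hypothetical, and it assumes percentages are computed against all calls rather than only the selected codes.

```python
from collections import Counter


def status_code_share(records, selected_codes=None):
    """Return {status_code: percentage of all calls} for the selected codes.

    Assumption: each record is a dict with hypothetical keys "status" and "calls".
    """
    counts = Counter()
    for record in records:
        counts[record["status"]] += record["calls"]

    total = sum(counts.values())
    if total == 0:
        return {}

    # With no explicit selection, every observed status code is kept.
    codes = [c for c in counts if selected_codes is None or c in selected_codes]
    return {code: 100.0 * counts[code] / total for code in codes}


def in_value_range(value, low=None, high=None):
    """Return True if value lies within the optional inclusive bounds."""
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True
```

Computing the share against all calls keeps a filtered view comparable to the unfiltered one; normalizing over only the selected codes would be an equally plausible design.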

Data Aggregation and Statistical Calculations

CloudHeatMap calculates a combined mean and a combined standard deviation to aggregate performance metrics such as response times across multiple microservices. Because many microservice instances contribute to the aggregate performance, combining their individual statistics gives a single summary of overall system health.

For more detailed explanations of these calculations and their application in CloudHeatMap, refer to the M.Sc. thesis by Sarah Sohana (2022).
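
For readers unfamiliar with the technique, the standard formulas for a combined mean and combined standard deviation can be sketched as below. This is a generic sketch, not the code in lib/data_processing.py: the function name `combine_stats` is hypothetical, and it assumes each group supplies a population (not sample) standard deviation. The combined mean is the count-weighted average of the group means, and the combined variance adds each group's own variance to the squared distance of its mean from the combined mean.

```python
import math


def combine_stats(groups):
    """Combine (count, mean, std) triples into one (count, mean, std).

    Assumption: each std is a population standard deviation.
    """
    # Groups with no observations carry no information and are skipped.
    groups = [g for g in groups if g[0] > 0]
    total = sum(n for n, _, _ in groups)
    if total == 0:
        raise ValueError("no observations to combine")

    # Combined mean: count-weighted average of the group means.
    mean = sum(n * m for n, m, _ in groups) / total

    # Combined variance: within-group variance plus between-group spread.
    variance = sum(n * (s * s + (m - mean) ** 2) for n, m, s in groups) / total
    return total, mean, math.sqrt(variance)
```

The benefit of this form is that only three numbers per group need to be stored, so aggregates can be built without revisiting the raw observations.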

Citation

If you use or study the code, please cite it as follows.

@article{sohana2024cloudheatmap,
  title={CloudHeatMap: Heatmap-Based Monitoring for Large-Scale Cloud Systems},
  author={Sarah Sohana and William Pourmajidi and John Steinbacher and Andriy Miranskyy},
  journal={arXiv preprint arXiv:2410.21092},
  year={2024},
  doi={10.48550/arXiv.2410.21092}
}

If you encounter any issues, please feel free to reach out for support by opening an issue.
