
A lightweight Python tool that converts PyTorch Profiler traces (.json) into clean, structured CSV datasets. It intelligently filters composite operations to isolate "leaf" compute kernels, supports custom whitelisting/blacklisting of operators, and injects experiment metadata for easy dataset merging.


HicrestLaboratory/PyTorch_Trace_Parser


PyTorch Trace Parser & CSV Exporter

A specialized tool for analyzing PyTorch execution traces. This script parses raw .json trace files generated by the PyTorch Profiler, extracts the actual "leaf" operations, and exports them to a clean CSV file ready for analysis.

🚀 Why use this?

This tool:

  1. Isolates Leaf Operations: Uses a stack-based algorithm to remove "parent" wrappers and keep only the operations that did the work.
  2. Filters Noise: Automatically removes operations with empty input shapes (e.g., scalars or empty lists).
  3. Custom Control: Allows you to Blacklist unwanted ops or Whitelist custom kernels (forcing them to be treated as leaves).
  4. Dataset Ready: Injects model metadata (Name, Task, Training status) into every row, making the CSVs mergeable for large-scale analysis.
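Because every row carries the same metadata columns, CSVs from different runs can be concatenated directly. The following is a minimal sketch using only the standard library. The function name `merge_csvs` and the directory layout are hypothetical (not part of this tool), and it assumes all input files share the same header.

```python
import csv
import glob
import os


def merge_csvs(input_dir, merged_path):
    """Concatenate every *_ops.csv in input_dir into one CSV with a single header.

    Hypothetical helper: assumes all files have identical columns.
    Returns the number of data rows written.
    """
    paths = sorted(glob.glob(os.path.join(input_dir, "*_ops.csv")))
    rows_written = 0
    with open(merged_path, "w", newline="") as out:
        writer = None
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.DictReader(f)
                if reader.fieldnames is None:
                    continue  # skip completely empty files
                if writer is None:
                    # The first non-empty file defines the header.
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

After merging, the Model Name, ML Task, and Is Training columns identify which run each row came from.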

⚡ How to Generate a Trace

To use this tool, you first need a .json trace file from your PyTorch model. Set record_shapes=True when profiling; without it, the trace contains no input dimensions for the tool to extract.

Add the following context manager around your training step or inference call:

import torch
from torch.profiler import profile, record_function, ProfilerActivity

# ... Load your model and data ...

# 1. Start the Profiler
with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,                 # REQUIRED: captures Input Dims
    with_stack=True                     # REQUIRED: captures hierarchy for leaf detection
) as prof:
    
    # 2. Run your model
    with record_function("model_inference"):
        output = model(inputs)

# 3. Export the Trace
prof.export_chrome_trace("trace_output.json")

Once generated, trace_output.json is ready to be parsed by this tool.
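Before parsing, it can help to confirm that the trace actually contains shape information. The sketch below assumes the usual Chrome trace layout written by the profiler: a top-level "traceEvents" list whose events carry an "args" dictionary with an "Input Dims" key. The function name `count_events_with_shapes` is hypothetical.

```python
import json


def count_events_with_shapes(trace_path):
    """Return (events_with_input_dims, total_events) for a trace file.

    Assumes events store shapes under args["Input Dims"].
    """
    with open(trace_path) as f:
        trace = json.load(f)
    # Chrome traces are either an object with "traceEvents" or a bare list.
    events = trace if isinstance(trace, list) else trace.get("traceEvents", [])
    with_shapes = sum(
        1
        for event in events
        if isinstance(event, dict) and "Input Dims" in event.get("args", {})
    )
    return with_shapes, len(events)
```

If the first number is zero, the trace was most likely recorded without record_shapes=True.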

📋 Requirements

  • Python 3.6+
  • No external dependencies (uses standard libraries: json, argparse, csv, os, collections).

🛠️ Usage

Run the script from the command line, passing the trace file and the mandatory metadata arguments.

python parse_trace.py <trace_file.json> \
  --model-name "ResNet50" \
  --ml-task "Image Classification" \
  --is-training "True"

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| file_path | Yes | Path to the input PyTorch trace .json file. |
| --model-name | Yes | Name of the model (e.g., "BERT", "ResNet"). |
| --ml-task | Yes | The machine learning task (e.g., "NLP", "Detection"). |
| --is-training | Yes | True if the trace is from training, False for inference. |
| --output-dir | No | Directory to save the output CSV. Defaults to the current directory. |
| --blacklist | No | Path to a .txt file containing operations to exclude. |
| --whitelist | No | Path to a .txt file containing operations to force include (opaque leaves). |
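For reference, a command-line interface with these arguments could be declared with argparse roughly as follows. This is a sketch and not the script's actual source: the helper names `str_to_bool` and `build_parser` are hypothetical, and converting --is-training to a boolean is an assumption about how the value is handled.

```python
import argparse


def str_to_bool(value):
    """Convert "True"/"False" strings to a bool (assumed handling of --is-training)."""
    text = value.strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected True or False, got %r" % value)


def build_parser():
    """Build a parser mirroring the documented arguments (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Convert a PyTorch Profiler trace into a CSV of leaf operations."
    )
    parser.add_argument("file_path", help="input PyTorch trace .json file")
    parser.add_argument("--model-name", required=True, help="name of the model")
    parser.add_argument("--ml-task", required=True, help="machine learning task")
    parser.add_argument("--is-training", required=True, type=str_to_bool,
                        help="True for training traces, False for inference")
    parser.add_argument("--output-dir", default=".", help="where to save the CSV")
    parser.add_argument("--blacklist", default=None, help=".txt file of ops to exclude")
    parser.add_argument("--whitelist", default=None, help=".txt file of opaque leaves")
    return parser
```

argparse exposes the dashed flags as attributes with underscores, for example `args.model_name` and `args.is_training`.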

Examples

1. Basic Usage

python parse_trace.py trace_resnet.json \
  --model-name "ResNet18" \
  --ml-task "Classification" \
  --is-training "False"

2. Using Filters (Blacklist & Whitelist) - Recommended

python parse_trace.py trace_custom.json \
  --model-name "MyCustomModel" \
  --ml-task "Recommendation" \
  --is-training "True" \
  --output-dir ./processed_data \
  --blacklist filters/ignore_ops.txt \
  --whitelist filters/custom_kernels.txt

🔍 How Filtering Works

1. Leaf Detection (Standard)

By default, if Op A calls Op B, the script discards Op A (the parent) and records Op B (the leaf). This ensures you measure the actual computation, not the Python wrapper overhead.
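One way to implement this kind of stack-based leaf detection is sketched below. It is an illustration and not the script's actual code. It assumes each event is a dictionary with a start time "ts" and a duration "dur" (as in Chrome trace events), that all events come from the same thread, and that a child's time interval lies inside its parent's. The name `find_leaves` is hypothetical.

```python
def find_leaves(events):
    """Return the events that contain no other event (the "leaves").

    Assumes single-thread events with numeric "ts" and "dur" fields.
    """
    # Parents start earlier; on ties, the longer event is the parent.
    ordered = sorted(events, key=lambda e: (e["ts"], -e["dur"]))
    stack = []   # open events as [event, has_child] pairs
    leaves = []
    for event in ordered:
        # Close every open event that ended before this one starts.
        while stack and stack[-1][0]["ts"] + stack[-1][0]["dur"] <= event["ts"]:
            finished, has_child = stack.pop()
            if not has_child:
                leaves.append(finished)
        if stack:
            stack[-1][1] = True  # the innermost open event is this event's parent
        stack.append([event, False])
    # Close whatever is still open at the end of the trace.
    while stack:
        finished, has_child = stack.pop()
        if not has_child:
            leaves.append(finished)
    return leaves
```

For example, if aten::linear spans an aten::addmm call, only aten::addmm is returned.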

2. The Whitelist (Opaque Leaves)

If you have a custom operation (e.g., MyCustomLayer) that calls standard PyTorch ops internally, you might want to report the custom layer as a single unit rather than breaking it down.

  • Action: Add MyCustomLayer to your whitelist file.
  • Result: The script treats it as an "opaque leaf." It records MyCustomLayer and ignores any operations called strictly inside it.
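The opaque-leaf behavior can be pictured as a pre-filter that drops every event nested inside a whitelisted operation, so that the whitelisted operation has no children left. The sketch below is hypothetical (the function name `drop_children_of_whitelisted` is not from the script) and assumes single-thread events with "name", "ts", and "dur" fields.

```python
def drop_children_of_whitelisted(events, whitelist):
    """Remove events that run inside a whitelisted op, keeping the op itself.

    Assumes single-thread events with "name", "ts" and "dur" fields.
    """
    ordered = sorted(events, key=lambda e: (e["ts"], -e["dur"]))
    kept = []
    opaque_end = None  # end time of the whitelisted op currently in progress
    for event in ordered:
        if opaque_end is not None and event["ts"] < opaque_end:
            continue  # nested inside a whitelisted op: ignore it
        opaque_end = None
        if event["name"] in whitelist:
            opaque_end = event["ts"] + event["dur"]
        kept.append(event)
    return kept
```

With its children removed, the whitelisted operation is then reported as a leaf by ordinary leaf detection.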

3. The Blacklist

Any operation name found in the blacklist file is completely stripped from the final CSV output.
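As an illustration, loading a filter file and applying a blacklist might look like the sketch below. The file format is an assumption (one operation name per line, blank lines ignored); check the script for the exact format it expects. Both function names are hypothetical.

```python
def load_op_names(path):
    """Read operation names from a text file, assuming one name per line."""
    if path is None:
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def drop_blacklisted(events, blacklist):
    """Keep only the events whose name is not in the blacklist."""
    return [event for event in events if event["name"] not in blacklist]
```

Under this assumption, a blacklist file containing the single line `aten::empty` would remove every aten::empty row from the output.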


📂 Output Format

The script generates a file named <trace_filename>_ops.csv with the following columns:

| Column | Description |
| --- | --- |
| Model Name | Metadata provided via arguments. |
| ML Task | Metadata provided via arguments. |
| Is Training | Metadata provided via arguments. |
| Operation | The name of the executed operation (e.g., aten::add). |
| Input Dims | The shape of input tensors (e.g., [[32, 1024], [1024, 1024]]). |
| Input Type | The data types of inputs (e.g., ['float', 'float']). |

Note: Operations with empty input shapes (e.g., [] or [[], []]) are automatically excluded to remove scalar operations.
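The export step, including the empty-shape rule, can be sketched as follows. This is an illustration under assumptions, not the script's source: it assumes each event stores its shapes under args["Input Dims"] and its data types under args["Input type"], and the names `has_empty_shapes` and `write_ops_csv` are hypothetical.

```python
import csv

COLUMNS = ["Model Name", "ML Task", "Is Training",
           "Operation", "Input Dims", "Input Type"]


def has_empty_shapes(input_dims):
    """True for [] or for a list whose entries are all empty, such as [[], []]."""
    return all(len(dim) == 0 for dim in input_dims)


def write_ops_csv(events, csv_path, model_name, ml_task, is_training):
    """Write one row per event with non-empty shapes; return the row count."""
    written = 0
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for event in events:
            args = event.get("args", {})
            dims = args.get("Input Dims", [])
            if has_empty_shapes(dims):
                continue  # scalar-only or shapeless op: excluded
            # Lists are written using their Python string form.
            writer.writerow([model_name, ml_task, is_training,
                             event["name"], dims, args.get("Input type", [])])
            written += 1
    return written
```

The same metadata values are repeated on every row, which is what allows CSVs from different runs to be combined later.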
