PyTorch Trace Parser & CSV Exporter

A specialized tool for analyzing PyTorch execution traces. This script parses the raw .json trace files generated by the PyTorch Profiler, extracts the "leaf" operations (those that call no further operations), and exports them to a CSV file.

🚀 Why use this?

This tool:

Isolates Leaf Operations: Uses a stack-based algorithm to remove "parent" wrappers and keep only the operations that did the work.
Filters Noise: Automatically removes operations with empty input shapes (e.g., scalars or empty lists).
Custom Control: Allows you to Blacklist unwanted ops or Whitelist custom kernels (forcing them to be treated as leaves).
Dataset Ready: Injects model metadata (Name, Task, Training status) into every row, making the CSVs mergeable for large-scale analysis.
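Because every row carries its own model metadata, CSVs from different runs can be concatenated without losing track of where each row came from. The following is a minimal sketch of that idea using only the standard library. The helper name `merge_csv_texts` and the sample rows are illustrative assumptions; the column names are the ones listed under Output Format.

```python
import csv
import io
from collections import Counter

def merge_csv_texts(csv_texts):
    """Concatenate the rows of several CSV documents that share a header."""
    rows = []
    for text in csv_texts:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows

# Made-up sample exports standing in for two <trace_filename>_ops.csv files.
resnet_csv = '''Model Name,ML Task,Is Training,Operation,Input Dims,Input Type
ResNet18,Classification,False,aten::conv2d,"[[1, 3, 224, 224], [64, 3, 7, 7]]","['float', 'float']"
ResNet18,Classification,False,aten::relu,"[[1, 64, 112, 112]]","['float']"
'''
bert_csv = '''Model Name,ML Task,Is Training,Operation,Input Dims,Input Type
BERT,NLP,True,aten::addmm,"[[768], [32, 768], [768, 768]]","['float', 'float', 'float']"
'''

merged = merge_csv_texts([resnet_csv, bert_csv])
# Count rows per model to confirm the metadata survived the merge.
rows_per_model = Counter(row["Model Name"] for row in merged)
print(dict(rows_per_model))
```

With real files, the same function can be fed the contents of each CSV read from disk.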

⚡ How to Generate a Trace

To use this tool, you need to generate a .json trace file from your PyTorch model. You must enable record_shapes=True so the tool can extract input dimensions.

Add the following context manager around your training step or inference call:

import torch
from torch.profiler import profile, record_function, ProfilerActivity

# ... Load your model and data ...

# 1. Start the Profiler
with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,                 # REQUIRED: captures Input Dims
    with_stack=True                     # REQUIRED: captures hierarchy for leaf detection
) as prof:
    
    # 2. Run your model
    with record_function("model_inference"):
        output = model(inputs)

# 3. Export the Trace
prof.export_chrome_trace("trace_output.json")

Once generated, trace_output.json is ready to be parsed by this tool.
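Before parsing, it can be worth confirming that the trace actually contains input shapes. The sketch below is not part of this tool. It assumes the exported file is a JSON object with a `traceEvents` list, and that operator events are complete events (`"ph": "X"`) whose `args` hold an `Input Dims` entry when record_shapes=True was set. Key names can differ between PyTorch versions, so treat them as assumptions to check against your own trace.

```python
import json

def load_trace(path):
    """Read a Chrome-trace style JSON file from disk."""
    with open(path) as handle:
        return json.load(handle)

def summarize_trace(trace):
    """Return (complete events, complete events that carry 'Input Dims')."""
    events = trace.get("traceEvents", [])
    complete = [e for e in events if e.get("ph") == "X"]
    shaped = [e for e in complete if "Input Dims" in e.get("args", {})]
    return len(complete), len(shaped)

# Tiny in-memory stand-in for a real trace (values are made up).
sample_trace = {
    "traceEvents": [
        {"ph": "X", "name": "aten::linear", "ts": 0, "dur": 100,
         "args": {"Input Dims": [[32, 1024], [1024, 1024]]}},
        {"ph": "X", "name": "model_inference", "ts": 0, "dur": 200, "args": {}},
        {"ph": "M", "name": "process_name", "args": {"name": "python"}},
    ]
}

total, shaped = summarize_trace(sample_trace)
print(f"{shaped} of {total} complete events carry input shapes")
```

For a real file, call `summarize_trace(load_trace("trace_output.json"))`. If no event carries shapes, the trace was probably recorded without record_shapes=True.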

📋 Requirements

Python 3.6+
No external dependencies (uses standard libraries: json, argparse, csv, os, collections).

🛠️ Usage

Run the script from the command line, passing the trace file and the mandatory metadata arguments.

python parse_trace.py <trace_file.json> \
  --model-name "ResNet50" \
  --ml-task "Image Classification" \
  --is-training "True"

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| `file_path` | ✅ | Path to the input PyTorch trace `.json` file. |
| `--model-name` | ✅ | Name of the model (e.g., "BERT", "ResNet"). |
| `--ml-task` | ✅ | The machine learning task (e.g., "NLP", "Detection"). |
| `--is-training` | ✅ | `True` if the trace is from training, `False` for inference. |
| `--output-dir` | ❌ | Directory to save the output CSV. Defaults to the current directory. |
| `--blacklist` | ❌ | Path to a `.txt` file containing operations to exclude. |
| `--whitelist` | ❌ | Path to a `.txt` file containing operations to force include (opaque leaves). |
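For readers who want to see how such an interface maps onto argparse, here is a hypothetical reconstruction built only from the table above. It is not the script's actual code. The helper `str_to_bool`, the function `build_parser`, and the choice to accept `True`/`False` case-insensitively are assumptions.

```python
import argparse

def str_to_bool(value):
    """Convert the strings 'True'/'False' (any casing) to a bool."""
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")

def build_parser():
    """Build a parser mirroring the documented arguments (assumed layout)."""
    parser = argparse.ArgumentParser(description="Parse a PyTorch trace into a CSV.")
    parser.add_argument("file_path", help="Path to the input trace .json file")
    parser.add_argument("--model-name", required=True)
    parser.add_argument("--ml-task", required=True)
    parser.add_argument("--is-training", required=True, type=str_to_bool)
    parser.add_argument("--output-dir", default=".")
    parser.add_argument("--blacklist", default=None)
    parser.add_argument("--whitelist", default=None)
    return parser

# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args([
    "trace_resnet.json",
    "--model-name", "ResNet18",
    "--ml-task", "Classification",
    "--is-training", "False",
])
print(args.model_name, args.is_training, args.output_dir)
```

Parsing `--is-training` through a converter avoids the common pitfall where the non-empty string "False" is treated as truthy.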

Examples

1. Basic Usage

python parse_trace.py trace_resnet.json \
  --model-name "ResNet18" \
  --ml-task "Classification" \
  --is-training "False"

2. Using Filters (Blacklist & Whitelist) - Recommended

python parse_trace.py trace_custom.json \
  --model-name "MyCustomModel" \
  --ml-task "Recommendation" \
  --is-training "True" \
  --output-dir ./processed_data \
  --blacklist filters/ignore_ops.txt \
  --whitelist filters/custom_kernels.txt

🔍 How Filtering Works

1. Leaf Detection (Standard)

By default, if Op A calls Op B, the script discards Op A (the parent) and records Op B (the leaf). The CSV therefore lists the operations that performed the computation rather than the wrappers that dispatched them.
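One way to picture stack-based leaf detection is as interval nesting: an event is a parent if another event starts before it ends. The sketch below illustrates that idea under stated assumptions and is not the script's actual implementation. It assumes each event is a dict with `name`, `ts` (start time) and `dur` (duration), that all events come from a single thread, and that events are properly nested.

```python
def find_leaves(events):
    """Return the names of events that contain no nested event, in start order."""
    # Sort by start time; for equal starts, the longer (outer) event comes first.
    ordered = sorted(events, key=lambda e: (e["ts"], -e["dur"]))
    stack = []   # entries are [event, has_child] for events still open
    leaves = []
    for event in ordered:
        # Close every open event that ended before this one starts.
        while stack and stack[-1][0]["ts"] + stack[-1][0]["dur"] <= event["ts"]:
            closed, has_child = stack.pop()
            if not has_child:
                leaves.append(closed)
        # Whatever is still open encloses this event, so it is a parent.
        if stack:
            stack[-1][1] = True
        stack.append([event, False])
    # Flush the events that were still open at the end of the trace.
    while stack:
        closed, has_child = stack.pop()
        if not has_child:
            leaves.append(closed)
    leaves.sort(key=lambda e: e["ts"])
    return [e["name"] for e in leaves]

# aten::linear wraps aten::addmm; aten::relu runs afterwards on its own.
sample_events = [
    {"name": "aten::linear", "ts": 0, "dur": 100},
    {"name": "aten::addmm", "ts": 10, "dur": 80},
    {"name": "aten::relu", "ts": 120, "dur": 20},
]
print(find_leaves(sample_events))
```

Here `aten::linear` is dropped because `aten::addmm` runs inside it, while `aten::addmm` and `aten::relu` are kept.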

2. The Whitelist (Opaque Leaves)

If you have a custom operation (e.g., MyCustomLayer) that calls standard PyTorch ops internally, you might want to report the custom layer as a single unit rather than breaking it down.

Action: Add MyCustomLayer to your whitelist file.
Result: The script treats it as an "opaque leaf." It records MyCustomLayer and ignores any operations called strictly inside it.
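The opaque-leaf behaviour can be sketched as a small variation on interval nesting: once a whitelisted event is seen, it is recorded immediately and everything that starts before it ends is skipped. This is an illustration, not the script's code. The event layout (`name`, `ts`, `dur`), the single-thread assumption, and the function name `leaves_with_whitelist` are all assumptions.

```python
def leaves_with_whitelist(events, whitelist):
    """Return leaf names in start order, treating whitelisted ops as opaque leaves."""
    ordered = sorted(events, key=lambda e: (e["ts"], -e["dur"]))
    stack = []          # entries are [event, has_child] for events still open
    leaves = []
    opaque_end = None   # end time of the whitelisted op currently being skipped over
    for event in ordered:
        if opaque_end is not None and event["ts"] < opaque_end:
            continue    # nested inside a whitelisted op: ignore it
        opaque_end = None
        while stack and stack[-1][0]["ts"] + stack[-1][0]["dur"] <= event["ts"]:
            closed, has_child = stack.pop()
            if not has_child:
                leaves.append(closed)
        if stack:
            stack[-1][1] = True
        if event["name"] in whitelist:
            leaves.append(event)                      # record as a single unit
            opaque_end = event["ts"] + event["dur"]   # and hide its children
        else:
            stack.append([event, False])
    while stack:
        closed, has_child = stack.pop()
        if not has_child:
            leaves.append(closed)
    leaves.sort(key=lambda e: e["ts"])
    return [e["name"] for e in leaves]

custom_events = [
    {"name": "MyCustomLayer", "ts": 0, "dur": 100},
    {"name": "aten::mm", "ts": 10, "dur": 30},
    {"name": "aten::add", "ts": 50, "dur": 20},
    {"name": "aten::relu", "ts": 120, "dur": 10},
]
print(leaves_with_whitelist(custom_events, {"MyCustomLayer"}))
print(leaves_with_whitelist(custom_events, set()))
```

With the whitelist, `MyCustomLayer` is reported and its internal `aten::mm` and `aten::add` calls disappear. With an empty whitelist, the two internal ops are reported instead.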

3. The Blacklist

Any operation name found in the blacklist file is completely stripped from the final CSV output.
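As a rough illustration of blacklist filtering, the sketch below reads operation names from text lines and drops matching rows. The file layout (one operation name per line, blank lines ignored) is an assumption, as are the helper names `load_op_names` and `drop_blacklisted`. Rows are represented as dicts keyed by the CSV column names.

```python
def load_op_names(lines):
    """Collect non-empty, stripped operation names from an iterable of lines."""
    return {line.strip() for line in lines if line.strip()}

def drop_blacklisted(rows, blacklist):
    """Keep only the rows whose 'Operation' is not in the blacklist."""
    return [row for row in rows if row["Operation"] not in blacklist]

# Stand-in for the lines of a blacklist .txt file (contents are made up).
blacklist_lines = ["aten::empty\n", "\n", "aten::to\n"]
blacklist = load_op_names(blacklist_lines)

rows = [
    {"Operation": "aten::addmm"},
    {"Operation": "aten::empty"},
    {"Operation": "aten::to"},
    {"Operation": "aten::relu"},
]
kept = drop_blacklisted(rows, blacklist)
print([row["Operation"] for row in kept])
```

To read names from an actual file, pass an open file object to `load_op_names`, since iterating a file yields its lines.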

📂 Output Format

The script generates a file named <trace_filename>_ops.csv with the following columns:

| Column | Description |
| --- | --- |
| Model Name | Metadata provided via arguments. |
| ML Task | Metadata provided via arguments. |
| Is Training | Metadata provided via arguments. |
| Operation | The name of the executed operation (e.g., `aten::add`). |
| Input Dims | The shape of input tensors (e.g., `[[32, 1024], [1024, 1024]]`). |
| Input Type | The data types of inputs (e.g., `['float', 'float']`). |
Note: Operations with empty input shapes (e.g., [] or [[], []]) are automatically excluded to remove scalar operations.
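The empty-shape rule can be expressed as a short predicate. The sketch below assumes that a shape list counts as empty when it has no entries or when every entry is itself an empty list, which matches the two examples in the note. How the script handles mixed cases such as `[[32, 10], []]` is not documented here, so keeping them is an assumption of this sketch.

```python
def is_empty_shape(input_dims):
    """True for [] and for lists whose entries are all empty lists."""
    return all(isinstance(dim, list) and len(dim) == 0 for dim in input_dims)

candidates = [
    [],                            # no inputs recorded
    [[], []],                      # two scalar inputs
    [[32, 1024], [1024, 1024]],    # two tensors
    [[32, 10], []],                # tensor plus scalar (kept in this sketch)
]
kept = [dims for dims in candidates if not is_empty_shape(dims)]
print(kept)
```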
