Add a visualization tool for DeepXTrace heatmap generation #10
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,75 @@ | ||
| # DeepXTrace Heatmap Visualization Tool | ||
|
|
||
| ## Overview | ||
|
|
||
| A Python tool for generating professional heatmap visualizations from matrix data in log files. The tool transforms numerical matrices into color-coded heatmaps with optimized Red-Yellow-Green colormaps, designed specifically for analyzing `DeepXTrace` performance data. | ||
|
|
||
| ## Features | ||
|
|
||
| - **Automatic Data Parsing**: Extracts matrix data from log files with bracketed number sequences | ||
| - **Customizable Heatmaps**: | ||
| - Adjustable cell sizes and figure dimensions | ||
| - Log-scale value transformation | ||
| - Optimized color gradients | ||
| - **Multiple Output Formats**: PNG, SVG, and PDF support | ||
| - **Command Line Interface**: Easy parameter configuration | ||
|
|
||
| ## Installation | ||
|
|
||
| 1. Install required packages: | ||
| ```bash | ||
| pip install numpy matplotlib seaborn | ||
| ``` | ||
|
|
||
| ## Usage | ||
|
|
||
| ### Basic Command | ||
| ```bash | ||
| python deepxtrace_heatmap.py diagnose_dispatch.log | ||
| ``` | ||
|
|
||
| ### Advanced Options | ||
| ```bash | ||
| python deepxtrace_heatmap.py diagnose_dispatch.log \ | ||
| --title "DeepXTrace Heatmap" \ | ||
| --format png \ | ||
| --dpi 100 \ | ||
| --cell_ratio 1.5 \ | ||
| --figsize 15 5 | ||
| ``` | ||
|
|
||
| ### Parameters | ||
| | Option | Description | Default | | ||
| |--------|-------------|---------| | ||
| | `input_file` | Path to log file containing matrix data | (required) | | ||
| | `--title` | Chart title | "DeepXTrace Heatmap" | | ||
| | `--figsize` | Base figure dimensions (width height) | 15 5 | | ||
| | `--dpi` | Output resolution | 100 | | ||
| | `--format` | Output format (png/svg/pdf) | png | | ||
| | `--cell_ratio` | Cell size scaling factor | 1.5 | | ||
|
|
||
| ## Input Format Requirements | ||
|
|
||
| The tool expects log files containing `Dispatch` or `Combine` matrix data in the format: | ||
| ``` | ||
| [2025-09-15T16:28:53.686] rank=0 [41531961 110335835 86107580 66367570 7394516487 98864659 77445251 83158442 661173744 770990417 970651331 1169696858 1110225921 1032652403 783432874 1155882777] | ||
| [2025-09-15T16:28:53.686] rank=1 [238740255 44996389 59882080 55664704 7165950075 76972836 46228924 65718587 571526423 492956401 801280801 1005671232 890904524 856512359 645929682 1019591766] | ||
| [2025-09-15T16:28:53.686] rank=2 [334435436 167234535 48214685 64726500 7409136415 90471583 91946181 126393132 891398915 1039890608 505038755 732687145 861932621 800339974 758561470 1188180120] | ||
| [2025-09-15T16:28:53.686] rank=3 [333948126 193223167 76447652 42300825 7471848921 101749092 104491778 129574131 889977106 1112178809 635445546 802803027 1094816878 990583677 724634976 1277006859] | ||
| [2025-09-15T16:28:53.686] rank=4 [100322219 32604154 33821149 38549804 32906157 29526787 38683107 37435256 435736846 480513292 426603806 524358342 292624239 278059628 402173854 594915314] | ||
| [2025-09-15T16:28:53.686] rank=5 [357043256 181453132 80758999 85291294 7421443739 47244226 101571583 139173328 1117259847 1145205954 1057410524 1054152722 775299941 739947553 795537480 1331416100] | ||
| [2025-09-15T16:28:53.686] rank=6 [286975496 143404539 73738186 74395532 7345611673 80474587 50972258 90921435 983307617 1120489524 959253168 1021143468 1082499244 1037776426 500807147 731176052] | ||
| [2025-09-15T16:28:53.686] rank=7 [272805187 133017886 81167908 69928315 7385227225 91787438 51592459 46642463 1131431597 1215421329 1025774657 1161148723 1198908837 1111222567 567439642 820164593] | ||
| [2025-09-15T16:28:53.687] rank=8 [1799244931 1771427418 2212721381 2090795111 9843399347 1965307726 2060629937 2074112772 39992397 266559760 139867889 97612777 118900793 116220731 91414986 238706645] | ||
| [2025-09-15T16:28:53.687] rank=9 [1655414065 1563369384 2004001325 2046794620 9466103677 1794375577 1984338541 1878764904 49502256 38868860 59274616 87844533 58301901 61592595 51137090 106170129] | ||
| [2025-09-15T16:28:53.687] rank=10 [2336093851 2339507252 1636766194 1522743922 9353982466 1957680714 1988800417 2103018849 51456460 185387265 32788842 84941066 76972510 61175827 48398104 118321085] | ||
| [2025-09-15T16:28:53.687] rank=11 [2156833560 1995330881 1362533786 1335285470 9443024486 1685807693 1963716577 1805817033 52988364 262352327 152314429 38867029 103545888 107679854 90689823 200138055] | ||
| [2025-09-15T16:28:53.687] rank=12 [2368770810 2292785024 2096036713 2102136846 9080616395 1264586861 1966938351 2086225412 51896783 204759212 84471768 81382103 44806234 67152460 53543548 147589323] | ||
| [2025-09-15T16:28:53.687] rank=13 [2164995975 2264660786 1982838357 1770655686 9150062570 1295758258 2390980304 1981308765 55147228 220554850 91599022 80561623 77043566 34885826 54259055 145230411] | ||
| [2025-09-15T16:28:53.687] rank=14 [2456859557 2510109469 2142689581 2142342775 9878097124 2081236339 1819146021 1604522354 94422658 282214121 146647879 149147296 133236986 122706675 42262969 192291189] | ||
| [2025-09-15T16:28:53.687] rank=15 [2132370530 2132035034 1867821603 1745528175 9137695724 1703905141 1214731393 1334805808 32792273 155791509 72087392 66934753 47560512 72614190 55974055 47041973] | ||
| ``` | ||
|
|
||
| ## Output Example | ||
|
|
||
|  |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,181 @@ | ||||||
| import re | ||||||
| import numpy as np | ||||||
| import matplotlib.pyplot as plt | ||||||
| import seaborn as sns | ||||||
| import argparse | ||||||
| from matplotlib.colors import LinearSegmentedColormap | ||||||
| from matplotlib.ticker import ScalarFormatter | ||||||
| from datetime import datetime | ||||||
|
|
||||||
|
|
||||||
| def create_optimized_ryg_cmap(): | ||||||
| """Create an optimized Red-Yellow-Green colormap (6-color scale)""" | ||||||
| colors = [ | ||||||
| (0.0, "#63BE7B"), # Green | ||||||
| (0.3, "#A8D08D"), # Green to Yellow-Green | ||||||
| (0.5, "#FFEB84"), # Center Yellow | ||||||
| (0.7, "#F4B084"), # Yellow to Orange | ||||||
| (0.9, "#FF7C7C"), # Light Red | ||||||
| (1.0, "#C00000") # Red | ||||||
| ] | ||||||
| return LinearSegmentedColormap.from_list("optimized_ryg", colors) | ||||||
|
|
||||||
|
|
||||||
| def parse_matrix_data(log_data): | ||||||
| """Parse matrix data from log string containing bracketed number sequences""" | ||||||
| number_sequences = re.findall(r'\[([\d\s]+)\]', log_data) | ||||||
| return np.array([list(map(int, seq.split())) for seq in number_sequences]) | ||||||
|
Contributor
The list comprehension for parsing numbers can be made more robust. If a log line contains brackets with only whitespace (e.g.,
Suggested change
|
||||||
|
|
||||||
|
|
||||||
| def read_log_file(file_path): | ||||||
| """Read content from specified log file""" | ||||||
| try: | ||||||
| with open(file_path, 'r', encoding='utf-8') as f: | ||||||
| return f.read() | ||||||
| except FileNotFoundError: | ||||||
| raise SystemExit(f"Error: File not found - {file_path}") | ||||||
| except Exception as e: | ||||||
| raise SystemExit(f"Error reading file: {str(e)}") | ||||||
|
|
||||||
|
|
||||||
| def plot_deepxtrace_heatmap( | ||||||
| matrix, | ||||||
| title="DeepXTrace Heatmap", | ||||||
| figsize=( | ||||||
| 15, | ||||||
| 5), | ||||||
| dpi=100, | ||||||
| output_format='png', | ||||||
| cell_ratio=1.5): | ||||||
| """ | ||||||
| Generate a deepxtrace heatmap | ||||||
|
|
||||||
| Args: | ||||||
| matrix: Input 2D numpy array | ||||||
| title: Chart title (default: "DeepXTrace Heatmap") | ||||||
| figsize: Base figure size (will be scaled by cell_ratio) | ||||||
| dpi: Output resolution in dots per inch (default: 100) | ||||||
| output_format: Output file format (default: 'png') | ||||||
| cell_ratio: Cell size scaling factor (default: 1.5) | ||||||
| """ | ||||||
| # Calculate adjusted figure size based on matrix dimensions | ||||||
| rows, cols = matrix.shape | ||||||
| adjusted_figsize = (figsize[0] * cell_ratio * (cols / 10), | ||||||
|
Contributor comment on lines +62 to +63:
suggestion (performance): Consider clamping the auto-scaled figure size to avoid excessively large outputs for big matrices. Because
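A minimal sketch of the clamping idea, not the PR author's code. The helper name `clamp_figsize` and the 40-inch cap `max_inches` are hypothetical, chosen only for illustration; the scaling formula mirrors the one used in `plot_deepxtrace_heatmap`.

```python
def clamp_figsize(rows, cols, figsize=(15, 5), cell_ratio=1.5,
                  max_inches=(40.0, 40.0)):
    """Scale the base figure size by matrix shape, then cap each dimension.

    max_inches is an assumed upper bound (width, height); tune it as needed.
    """
    # Same auto-scaling as in the PR: grows linearly with cols and rows
    width = figsize[0] * cell_ratio * (cols / 10)
    height = figsize[1] * cell_ratio * (rows / 10)
    # Cap each dimension so large matrices cannot produce huge figures
    return (min(width, max_inches[0]), min(height, max_inches[1]))
```

With the defaults, a 16x16 matrix stays under the cap and is unchanged, while a 256x256 matrix would be limited to 40 by 40 inches. The result could be passed to `plt.figure(figsize=...)` in place of `adjusted_figsize`.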
||||||
| figsize[1] * cell_ratio * (rows / 10)) | ||||||
|
|
||||||
| # Configure vector output settings | ||||||
| plt.figure(figsize=adjusted_figsize) | ||||||
| plt.rcParams.update({ | ||||||
| 'svg.fonttype': 'none', | ||||||
| 'pdf.fonttype': 42 | ||||||
| }) | ||||||
|
|
||||||
| # Create colormap and normalize data | ||||||
| cmap = create_optimized_ryg_cmap() | ||||||
| log_matrix = np.log1p(matrix) | ||||||
| norm = plt.Normalize(vmin=log_matrix.min(), vmax=log_matrix.max()) | ||||||
|
|
||||||
| # Dynamic annotation size based on cell size | ||||||
| annot_size = max(8, min(20, 10 * cell_ratio)) | ||||||
|
|
||||||
| # Generate heatmap | ||||||
| ax = sns.heatmap( | ||||||
| log_matrix, | ||||||
| cmap=cmap, | ||||||
| norm=norm, | ||||||
| annot=matrix, | ||||||
| fmt='.2e', | ||||||
| linewidths=0.5, | ||||||
| linecolor="white", | ||||||
| annot_kws={ | ||||||
|
Contributor comment on lines +82 to +90:
suggestion (performance): Automatically disabling annotations for large matrices would improve performance and readability.
Suggested implementation:

    # Create colormap and normalize data
    cmap = create_optimized_ryg_cmap()
    log_matrix = np.log1p(matrix)
    norm = plt.Normalize(vmin=log_matrix.min(), vmax=log_matrix.max())

    # Dynamic annotation size based on cell size
    annot_size = max(8, min(20, 10 * cell_ratio))

    # Automatically disable annotations for large matrices (performance & readability)
    # Threshold can be tuned; 400 cells (~20x20) keeps small traces detailed and larger ones clean.
    show_annotations = matrix.size <= 400
    if show_annotations:
        annot_data = matrix
        annot_kws = {
            "size": annot_size,
            "color": "black",
        }
    else:
        annot_data = None
        annot_kws = None

    # Generate heatmap
    ax = sns.heatmap(
        log_matrix,
        cmap=cmap,
        norm=norm,
        annot=annot_data,
        fmt='.2e',
        linewidths=0.5,
        linecolor="white",
        annot_kws=annot_kws,
        cbar_kws={
            "label": "Log(Value + 1) Scale",
            "format": ScalarFormatter(),
            "shrink": 0.8
        }
    )

    # Configure labels and title
    ax.set_title(title, fontsize=16 * cell_ratio, pad=20, fontweight='bold')

If you want this behavior to be configurable (e.g., via a parameter or CLI flag), you can:
|
||||||
| "size": annot_size, | ||||||
| "color": "black" | ||||||
| }, | ||||||
| cbar_kws={ | ||||||
| "label": "Log(Value + 1) Scale", | ||||||
| "format": ScalarFormatter(), | ||||||
| "shrink": 0.8 | ||||||
| } | ||||||
| ) | ||||||
|
|
||||||
| # Configure labels and title | ||||||
| ax.set_title(title, fontsize=16 * cell_ratio, pad=20, fontweight='bold') | ||||||
| plt.xticks(fontsize=10 * cell_ratio, rotation=45) | ||||||
| plt.yticks(fontsize=10 * cell_ratio) | ||||||
|
|
||||||
| # Colorbar customization | ||||||
| cbar = ax.collections[0].colorbar | ||||||
| cbar.ax.tick_params(labelsize=10 * cell_ratio) | ||||||
| cbar.ax.set_ylabel( | ||||||
| "Color Scale (Token Wait Time)", | ||||||
| fontsize=12 * cell_ratio, | ||||||
| fontweight='bold' | ||||||
| ) | ||||||
|
|
||||||
| # Save output | ||||||
| output_file = f"deepxtrace.{output_format}" | ||||||
wangfakang marked this conversation as resolved.
|
||||||
| print( | ||||||
|
Contributor comment on lines +116 to +117:
suggestion (bug_risk): Derive the output filename from the input file (or allow configuration) instead of always using a fixed name. Using a fixed
Suggested implementation:

    from datetime import datetime
    from pathlib import Path

    # Save output
    # Derive output filename from input file to avoid overwriting and improve traceability
    input_path = Path(input_file)  # assumes input_file is available in this scope
    timestamp = datetime.now().strftime('%Y%m%d-%H%M%S')
    output_file = f"{input_path.stem}.deepxtrace.{timestamp}.{output_format}"
    print(
        f"Saving to {output_file} started at {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")
|
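The suggestion above assumes `input_file` is in scope, but `plot_deepxtrace_heatmap` does not currently receive it. One possible way to close that gap is a small standalone helper that `main()` could call before plotting. This is only a sketch: `derive_output_path` and its `now` parameter are hypothetical names, not part of the PR.

```python
from datetime import datetime
from pathlib import Path


def derive_output_path(input_file, output_format, now=None):
    """Build '<input stem>.deepxtrace.<timestamp>.<format>' from the input path.

    `now` is an optional datetime (assumption: added here so the result is
    reproducible in tests); it defaults to the current time.
    """
    stamp = (now or datetime.now()).strftime('%Y%m%d-%H%M%S')
    # Path.stem drops the directory and the final suffix of the input file
    return Path(f"{Path(input_file).stem}.deepxtrace.{stamp}.{output_format}")
```

The returned path could then be passed into the plotting function as an extra argument, which would also require extending its signature.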
||||||
| f"Saving started at {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}") | ||||||
| plt.savefig( | ||||||
| output_file, | ||||||
| format=output_format, | ||||||
| bbox_inches='tight', | ||||||
| dpi=dpi) | ||||||
| print( | ||||||
| f"Saving completed at {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}") | ||||||
| plt.close() | ||||||
|
|
||||||
|
|
||||||
| def main(): | ||||||
| """Command line interface for DeepXTrace heatmap""" | ||||||
| parser = argparse.ArgumentParser( | ||||||
| description='Generate DeepXTrace heatmap visualization from log file', | ||||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | ||||||
|
|
||||||
| parser.add_argument( | ||||||
| 'input_file', | ||||||
| help='Path to log file containing matrix data') | ||||||
| parser.add_argument( | ||||||
| '--title', | ||||||
| default='DeepXTrace Heatmap', | ||||||
| help='Chart title') | ||||||
| parser.add_argument('--figsize', nargs=2, type=float, default=[15, 5], | ||||||
| help='Base figure dimensions (width height)') | ||||||
| parser.add_argument( | ||||||
| '--dpi', | ||||||
| type=int, | ||||||
| default=100, | ||||||
| help='Output resolution') | ||||||
| parser.add_argument( | ||||||
| '--format', | ||||||
| default='png', | ||||||
| choices=[ | ||||||
| 'png', | ||||||
| 'svg', | ||||||
| 'pdf'], | ||||||
| help='Output file format') | ||||||
| parser.add_argument('--cell_ratio', type=float, default=1.5, | ||||||
| help='Cell size scaling factor') | ||||||
|
|
||||||
| args = parser.parse_args() | ||||||
|
|
||||||
| # Read and process data | ||||||
| log_content = read_log_file(args.input_file) | ||||||
| data_matrix = parse_matrix_data(log_content) | ||||||
wangfakang marked this conversation as resolved.
|
||||||
| if data_matrix.size == 0: | ||||||
| raise SystemExit( | ||||||
| f"Error: No matrix data found in '{args.input_file}'.") | ||||||
|
|
||||||
| # Generate visualization | ||||||
| plot_deepxtrace_heatmap( | ||||||
| matrix=data_matrix, | ||||||
| title=args.title, | ||||||
| figsize=args.figsize, | ||||||
| dpi=args.dpi, | ||||||
| output_format=args.format, | ||||||
| cell_ratio=args.cell_ratio | ||||||
| ) | ||||||
|
|
||||||
|
|
||||||
| if __name__ == "__main__": | ||||||
| main() | ||||||
issue: Handle empty or irregular matrix sequences more defensively before constructing the numpy array.
`re.findall` can return an empty list, and inconsistent row lengths will produce a ragged array. That can lead to unexpected shapes or downstream errors when accessing `matrix.shape`. Consider validating `number_sequences` (non-empty, consistent lengths) and raising a clear error on malformed input instead of relying on later failures.
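A minimal sketch of that validation, assuming the same bracket regex as `parse_matrix_data`. The function name `parse_matrix_rows` and the choice of `ValueError` are illustrative assumptions, not the author's implementation. It returns plain lists so the checks run before any array is built.

```python
import re


def parse_matrix_rows(log_data):
    """Extract bracketed integer rows and validate them before array creation."""
    rows = []
    for seq in re.findall(r'\[([\d\s]+)\]', log_data):
        values = seq.split()
        if values:  # skip brackets that contain only whitespace
            rows.append([int(v) for v in values])
    if not rows:
        raise ValueError("no bracketed number sequences found")
    # Every row must have the same length to form a rectangular matrix
    lengths = {len(row) for row in rows}
    if len(lengths) != 1:
        raise ValueError(f"inconsistent row lengths: {sorted(lengths)}")
    return rows
```

The caller could wrap the result with `np.array(rows)` and convert a `ValueError` into the `SystemExit` message style already used in `main()`.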