Python Scripts
Documentation for all Python tools in the GraphBrew framework.
The scripts folder contains a modular library (lib/) and the main orchestration script:
scripts/
├── graphbrew_experiment.py # ⭐ MAIN: Orchestration script (~3100 lines)
│
├── lib/ # 📦 Modular library (~11000 lines total)
│ ├── __init__.py # Module exports
│ ├── types.py # Data classes (GraphInfo, BenchmarkResult, etc.)
│ ├── phases.py # Phase orchestration (run_reorder_phase, etc.)
│ ├── utils.py # Core utilities (ALGORITHMS, run_command, etc.)
│ ├── features.py # Graph feature computation & system utilities
│ ├── dependencies.py # System dependency detection & installation
│ ├── download.py # Graph downloading from SuiteSparse
│ ├── build.py # Binary compilation utilities
│ ├── reorder.py # Vertex reordering generation
│ ├── benchmark.py # Performance benchmark execution
│ ├── cache.py # Cache simulation analysis
│ ├── weights.py # Type-based weight management
│ ├── weight_merger.py # Cross-run weight consolidation
│ ├── training.py # ML weight training
│ ├── analysis.py # Adaptive order analysis
│ ├── progress.py # Progress tracking & reporting
│ └── results.py # Result file I/O
│
├── test/ # Test suite
│ ├── test_weight_flow.py # Weight generation/loading tests
│ ├── test_weight_merger.py # Merger consolidation tests
│ └── test_fill_adaptive.py # Fill-weights pipeline tests
│
├── weights/ # Type-based weight files
│ ├── active/ # C++ reads from here (working copy)
│ │ ├── type_registry.json # Maps graphs → types + centroids
│ │ ├── type_0.json # Cluster 0 weights
│ │ └── type_N.json # Additional clusters
│ ├── merged/ # Accumulated from all runs
│ └── runs/ # Historical snapshots
│
├── examples/ # Example scripts
│ ├── batch_process.py # Batch processing example
│ ├── compare_algorithms.py # Algorithm comparison example
│ ├── custom_pipeline.py # Custom phase-based pipeline example
│ └── quick_test.py # Quick testing example
└── requirements.txt # Python dependencies (optional)
The main script provides orchestration over the lib/ modules. It handles argument parsing and calls the appropriate phase functions.
# Full pipeline: download → build → experiment → weights
python3 scripts/graphbrew_experiment.py --full --download-size SMALL
# See all options
python3 scripts/graphbrew_experiment.py --help

| Feature | Description |
|---|---|
| Graph Download | Downloads from SuiteSparse collection (87 graphs available) |
| Auto Build | Compiles binaries if missing |
| Memory Management | Automatically skips graphs exceeding RAM limits |
| Label Maps | Pre-generates reordering maps for consistency |
| Reordering | Tests all 18 algorithms |
| Benchmarks | PR, BFS, CC, SSSP, BC, TC |
| Cache Simulation | L1/L2/L3 hit rate analysis |
| Perceptron Training | Generates weights for AdaptiveOrder |
| Brute-Force Validation | Compares adaptive vs all algorithms |
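The phases above identify reordering algorithms by ID; the ID-to-name mapping lives in the `ALGORITHMS` dict in `lib/utils.py` (documented below). A minimal sketch for listing what is available:

```python
from scripts.lib.utils import ALGORITHMS, BENCHMARKS

# Enumerate the 18 reordering algorithms and 6 benchmark kernels
for algo_id, name in sorted(ALGORITHMS.items()):
    print(f"{algo_id:2d}  {name}")
print("Benchmarks:", ", ".join(BENCHMARKS))
```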
| Option | Description |
|---|---|
| `--check-deps` | Check system dependencies (g++, boost, numa, etc.) |
| `--install-deps` | Install missing system dependencies (requires sudo) |
| `--install-boost` | Download, compile, and install Boost 1.58.0 to /opt/boost_1_58_0 |
| Option | Description |
|---|---|
| `--full` | Run the complete pipeline (download → build → experiment → weights) |
| `--download-only` | Only download graphs |
| `--download-size` | SMALL (16), MEDIUM (28), LARGE (37), XLARGE (6), ALL (87 graphs) |
| `--clean` | Clean results (keep graphs/weights) |
| `--clean-all` | Full reset for a fresh start |
| Option | Description |
|---|---|
| `--max-memory GB` | Maximum RAM (GB) for graph processing |
| `--auto-memory` | Automatically detect available RAM (uses 80% of total) |
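The same memory gate can be applied in custom scripts via the `lib/features` helpers documented below. A minimal sketch, assuming `estimate_graph_memory_gb` takes a graph path and reusing the 80% budget (both assumptions):

```python
from scripts.lib.features import get_available_memory_gb, estimate_graph_memory_gb

# Hypothetical gate mirroring --auto-memory's 80% rule
budget_gb = 0.8 * get_available_memory_gb()

for path in ["graphs/web-Stanford/web-Stanford.mtx"]:
    if estimate_graph_memory_gb(path) > budget_gb:
        print(f"Skipping {path}: estimated footprint exceeds {budget_gb:.1f} GB")
```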
| Option | Description |
|---|---|
| `--max-disk GB` | Maximum disk space (GB) for downloads |
| `--auto-disk` | Automatically limit downloads to available disk space |
| Option | Description |
|---|---|
| `--phase` | Run a specific phase: all, reorder, benchmark, cache, weights, adaptive |
| `--graphs` | Graph size: all, small, medium, large, custom |
| `--key-only` | Only test key algorithms (faster) |
| `--skip-cache` | Skip cache simulations |
| `--brute-force` | Run brute-force validation |
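Phase selection is also available programmatically through `lib/phases.py` (documented below); a short sketch based on the `run_full_pipeline` call shown later on this page:

```python
from scripts.lib.phases import PhaseConfig, run_full_pipeline
from scripts.lib.types import GraphInfo

graphs = [GraphInfo(name="web-Stanford", path="graphs/web-Stanford/web-Stanford.mtx",
                    size_mb=5.2, nodes=281903, edges=2312497)]
algorithms = [0, 7]  # ORIGINAL, HUBCLUSTERDBG

# Run only the benchmark phase, analogous to `--phase benchmark`
config = PhaseConfig(benchmarks=['pr'], trials=3)
results = run_full_pipeline(graphs, algorithms, config, phases=['benchmark'])
```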
| Option | Description |
|---|---|
| `--generate-maps` | Pre-generate .lo mapping files |
| `--use-maps` | Use pre-generated label maps |
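A generated `.lo` file can be inspected directly. The sketch below assumes a plain-text format with one integer label per vertex line; that format is an assumption, not a documented contract:

```python
from pathlib import Path

def load_label_order(lo_path):
    """Assumed format: line i holds the new label of vertex i."""
    return [int(tok) for tok in Path(lo_path).read_text().split()]

order = load_label_order("results/mappings/web-Stanford/HUBCLUSTERDBG.lo")
print(f"{len(order)} vertices remapped")
```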
| Option | Description |
|---|---|
| `--train-adaptive` | Run the iterative training feedback loop |
| `--train-large` | Run large-scale batched training |
| `--target-accuracy` | Target accuracy % (default: 80) |
| `--fill-weights` | Fill ALL weight fields with comprehensive analysis |
# One-click full experiment
python3 scripts/graphbrew_experiment.py --full --download-size SMALL
# Quick test with key algorithms
python3 scripts/graphbrew_experiment.py --graphs small --key-only
# Pre-generate label maps
python3 scripts/graphbrew_experiment.py --generate-maps --graphs small
# Fill ALL weight fields
python3 scripts/graphbrew_experiment.py --fill-weights --graphs small --max-graphs 5
# Clean and start fresh
python3 scripts/graphbrew_experiment.py --clean-all --full --download-size SMALL

The lib/ folder contains modular, reusable components. Each module can be used independently or via the phase orchestration system.
Central type definitions used across all modules:
from scripts.lib.types import GraphInfo, BenchmarkResult, CacheResult, ReorderResult
# GraphInfo - Graph metadata
GraphInfo(name="web-Stanford", path="graphs/web-Stanford/web-Stanford.mtx",
          size_mb=5.2, nodes=281903, edges=2312497)
# BenchmarkResult - Benchmark execution result
BenchmarkResult(graph="web-Stanford", algorithm_id=7, algorithm_name="HUBCLUSTERDBG",
                benchmark="pr", avg_time=0.234, speedup=1.45, success=True)
# CacheResult - Cache simulation result
CacheResult(graph="web-Stanford", algorithm_id=7, algorithm_name="HUBCLUSTERDBG",
            benchmark="pr", l1_miss_rate=0.12, l2_miss_rate=0.08, l3_miss_rate=0.02)
# ReorderResult - Reordering result
ReorderResult(graph="web-Stanford", algorithm_id=7, algorithm_name="HUBCLUSTERDBG",
              time_seconds=1.23, mapping_file="mappings/web-Stanford/HUBCLUSTERDBG.lo")

High-level phase functions for building custom pipelines:
from scripts.lib.phases import (
    PhaseConfig,
    run_reorder_phase,
    run_benchmark_phase,
    run_cache_phase,
    run_weights_phase,
    run_full_pipeline,
)
# Create configuration
config = PhaseConfig(
    benchmarks=['pr', 'bfs', 'cc'],
    trials=3,
    skip_slow=True
)
# Run individual phases
reorder_results, label_maps = run_reorder_phase(graphs, algorithms, config)
benchmark_results = run_benchmark_phase(graphs, algorithms, label_maps, config)
# Or run full pipeline
results = run_full_pipeline(graphs, algorithms, config, phases=['reorder', 'benchmark'])

Constants and shared utilities:
from scripts.lib.utils import (
    ALGORITHMS,     # {0: "ORIGINAL", 1: "RANDOM", ...}
    BENCHMARKS,     # ['pr', 'bfs', 'cc', 'sssp', 'bc', 'tc']
    run_command,    # Execute shell commands
    get_timestamp,  # Formatted timestamps
)

Graph feature computation and system utilities:
from scripts.lib.features import (
    # Graph type detection
    detect_graph_type,
    compute_extended_features,
    # System utilities
    get_available_memory_gb,
    get_num_threads,
    estimate_graph_memory_gb,
)
# Compute graph features
features = compute_extended_features("graph.mtx")
# Returns: {modularity, density, avg_degree, degree_variance, clustering_coefficient, ...}
# Detect graph type
graph_type = detect_graph_type(features)  # "social", "web", "road", etc.

Automatic system dependency detection and installation:
from scripts.lib.dependencies import (
    check_dependencies,    # Check all required dependencies
    install_dependencies,  # Install missing dependencies (needs sudo)
    install_boost_158,     # Download and compile Boost 1.58.0
    check_boost_158,       # Check if Boost 1.58.0 is installed
    detect_platform,       # Detect OS and package manager
    get_package_manager,   # Get system package manager commands
)
# Check dependencies
status = check_dependencies()
# Returns dict with: g++, boost, numa, tcmalloc, python versions and status
# Install missing dependencies (requires sudo)
install_dependencies()
# Install Boost 1.58.0 for RabbitOrder
install_boost_158() # Downloads, compiles with bootstrap/b2, installs to /opt/boost_1_58_0
# Check Boost 1.58 specifically
version = check_boost_158()  # Returns version string or None

Download graphs from SuiteSparse:
from scripts.lib.download import (
    DOWNLOAD_GRAPHS_SMALL,   # 16 small graphs
    DOWNLOAD_GRAPHS_MEDIUM,  # 28 medium graphs
    download_graphs,
    get_catalog_stats,
)
# Download small graphs
download_graphs(DOWNLOAD_GRAPHS_SMALL, output_dir="./graphs")
# Get catalog statistics
stats = get_catalog_stats()
print(f"Total graphs: {stats['total']}, Total size: {stats['total_size_gb']:.1f} GB")Generate vertex reorderings:
from scripts.lib.reorder import (
    generate_reorderings,
    generate_reorderings_with_variants,
    load_label_maps_index,
)
# Generate reorderings for all algorithms
results = generate_reorderings(graphs, algorithms, bin_dir="bench/bin")
# Load existing label maps
label_maps = load_label_maps_index("results")

Run performance benchmarks:
from scripts.lib.benchmark import (
    run_benchmark,
    run_benchmark_suite,
    parse_benchmark_output,
)
# Run single benchmark
result = run_benchmark(graph_path, algorithm_id, benchmark="pr", bin_dir="bench/bin")
# Run full suite
results = run_benchmark_suite(graphs, algorithms, benchmarks=['pr', 'bfs'])

Run cache simulations:
from scripts.lib.cache import (
    run_cache_simulations,
    get_cache_stats_summary,
)
# Run simulations
results = run_cache_simulations(graphs, algorithms, benchmarks=['pr'])
# Get summary statistics
summary = get_cache_stats_summary(results)

Type-based weight management for AdaptiveOrder:
from scripts.lib.weights import (
    assign_graph_type,
    update_type_weights_incremental,
    get_best_algorithm_for_type,
    load_type_registry,
)
# Assign graph to a type based on features
type_name, is_new = assign_graph_type("web-Stanford", features)
# Update weights incrementally
update_type_weights_incremental(type_name, algorithm_name, benchmark, speedup)
# Get best algorithm for a type
best_algo = get_best_algorithm_for_type(type_name, benchmark="pr")

Train adaptive weights:
from scripts.lib.training import (
    train_adaptive_weights_iterative,
    train_adaptive_weights_large_scale,
)
# Iterative training
result = train_adaptive_weights_iterative(
    graphs=graphs,
    bin_dir="bench/bin",
    target_accuracy=0.85,
    max_iterations=10
)
print(f"Final accuracy: {result.final_accuracy:.2%}")Analyze adaptive ordering:
from scripts.lib.analysis import (
    analyze_adaptive_order,
    compare_adaptive_vs_fixed,
    run_subcommunity_brute_force,
)
# Analyze adaptive ordering
results = analyze_adaptive_order(graphs, bin_dir="bench/bin")
# Compare adaptive vs fixed algorithms
comparison = compare_adaptive_vs_fixed(graphs, fixed_algorithms=[7, 15, 16])

Visual progress tracking:
from scripts.lib.progress import ProgressTracker
progress = ProgressTracker()
progress.banner("EXPERIMENT", "Running GraphBrew benchmarks")
progress.phase_start("REORDERING", "Generating vertex reorderings")
progress.info("Processing graph: web-Stanford")
progress.success("Completed 10/15 graphs")
progress.phase_end("Reordering complete")

Create custom experiment pipelines using lib/phases.py:
#!/usr/bin/env python3
"""Custom GraphBrew pipeline example."""
import sys
sys.path.insert(0, "scripts")
from lib.phases import PhaseConfig, run_reorder_phase, run_benchmark_phase
from lib.types import GraphInfo
from lib.progress import ProgressTracker
# Discover graphs
graphs = [
    GraphInfo(name="web-Stanford", path="graphs/web-Stanford/web-Stanford.mtx",
              size_mb=5.2, nodes=281903, edges=2312497)
]
# Select algorithms
algorithms = [0, 7, 15, 16] # ORIGINAL, HUBCLUSTERDBG, LeidenOrder, LeidenDendrogram
# Create configuration
config = PhaseConfig(
    benchmarks=['pr', 'bfs'],
    trials=3,
    progress=ProgressTracker()
)
# Run phases
reorder_results, label_maps = run_reorder_phase(graphs, algorithms, config)
benchmark_results = run_benchmark_phase(graphs, algorithms, label_maps, config)
# Print results
for r in benchmark_results:
    if r.success:
        print(f"{r.graph} / {r.algorithm_name} / {r.benchmark}: {r.avg_time:.4f}s")

See scripts/examples/custom_pipeline.py for a complete example.
results/
├── mappings/ # Pre-generated label mappings
│ ├── index.json # Mapping index
│ └── {graph_name}/ # Per-graph mappings
│ ├── HUBCLUSTERDBG.lo # Label order file
│ └── HUBCLUSTERDBG.time # Reorder timing
├── reorder_*.json # Reorder results
├── benchmark_*.json # Benchmark results
├── cache_*.json # Cache simulation results
└── logs/ # Execution logs
scripts/weights/ # Type-based weights
├── active/ # C++ reads from here
│ ├── type_registry.json # Graph → type mapping
│ ├── type_0.json # Cluster 0 weights
│ └── type_N.json # Additional clusters
├── merged/ # Accumulated from all runs
└── runs/ # Historical snapshots
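All result files are JSON, so they can be aggregated without extra tooling. A minimal sketch, assuming each benchmark_*.json holds a list of records with the BenchmarkResult fields shown earlier (the exact schema is an assumption):

```python
import json
from glob import glob

# Collect benchmark records across runs
records = []
for path in sorted(glob("results/benchmark_*.json")):
    with open(path) as f:
        records.extend(json.load(f))  # assumes each file holds a JSON list
print(f"Loaded {len(records)} benchmark records")
```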
cd scripts
pip install -r requirements.txt

# Core dependencies - NONE REQUIRED
# All benchmark scripts use only Python 3.8+ standard library
# Optional: For extended analysis and visualization (uncomment if needed)
# numpy>=1.20.0 # For statistical analysis
# pandas>=1.3.0 # For data manipulation
# matplotlib>=3.4.0 # For plotting results
# scipy>=1.7.0 # For correlation analysis
# networkx>=2.6 # For graph analysis
pip install -r scripts/requirements.txt
python3 --version  # Should be 3.8+

make all
make sim  # For cache simulation

chmod +x bench/bin/*
chmod +x bench/bin_sim/*

- AdaptiveOrder-ML - ML perceptron details
- Running-Benchmarks - Command-line usage
- Code-Architecture - Codebase structure