
# Python Scripts Guide

Documentation for all Python tools in the GraphBrew framework.

## Overview

The scripts folder contains a modular library (lib/) and the main orchestration script:

```
scripts/
├── graphbrew_experiment.py      # ⭐ MAIN: Orchestration script (~3100 lines)
├── requirements.txt             # Python dependencies (optional)
│
├── lib/                         # 📦 Modular library (~11000 lines total)
│   ├── __init__.py              # Module exports
│   ├── types.py                 # Data classes (GraphInfo, BenchmarkResult, etc.)
│   ├── phases.py                # Phase orchestration (run_reorder_phase, etc.)
│   ├── utils.py                 # Core utilities (ALGORITHMS, run_command, etc.)
│   ├── features.py              # Graph feature computation & system utilities
│   ├── dependencies.py          # System dependency detection & installation
│   ├── download.py              # Graph downloading from SuiteSparse
│   ├── build.py                 # Binary compilation utilities
│   ├── reorder.py               # Vertex reordering generation
│   ├── benchmark.py             # Performance benchmark execution
│   ├── cache.py                 # Cache simulation analysis
│   ├── weights.py               # Type-based weight management
│   ├── weight_merger.py         # Cross-run weight consolidation
│   ├── training.py              # ML weight training
│   ├── analysis.py              # Adaptive order analysis
│   ├── progress.py              # Progress tracking & reporting
│   └── results.py               # Result file I/O
│
├── test/                        # Test suite
│   ├── test_weight_flow.py      # Weight generation/loading tests
│   ├── test_weight_merger.py    # Merger consolidation tests
│   └── test_fill_adaptive.py    # Fill-weights pipeline tests
│
├── weights/                     # Type-based weight files
│   ├── active/                  # C++ reads from here (working copy)
│   │   ├── type_registry.json   # Maps graphs → types + centroids
│   │   ├── type_0.json          # Cluster 0 weights
│   │   └── type_N.json          # Additional clusters
│   ├── merged/                  # Accumulated from all runs
│   └── runs/                    # Historical snapshots
│
└── examples/                    # Example scripts
    ├── batch_process.py         # Batch processing example
    ├── compare_algorithms.py    # Algorithm comparison example
    ├── custom_pipeline.py       # Custom phase-based pipeline example
    └── quick_test.py            # Quick testing example
```

## ⭐ graphbrew_experiment.py - Main Orchestration

The main script orchestrates the `lib/` modules: it parses command-line arguments and calls the appropriate phase functions.

### Quick Start

```bash
# Full pipeline: download → build → experiment → weights
python3 scripts/graphbrew_experiment.py --full --download-size SMALL

# See all options
python3 scripts/graphbrew_experiment.py --help
```

### Key Features

| Feature | Description |
|---------|-------------|
| Graph Download | Downloads from the SuiteSparse collection (87 graphs available) |
| Auto Build | Compiles binaries if missing |
| Memory Management | Automatically skips graphs exceeding RAM limits |
| Label Maps | Pre-generates reordering maps for consistency |
| Reordering | Tests all 18 algorithms |
| Benchmarks | PR, BFS, CC, SSSP, BC, TC |
| Cache Simulation | L1/L2/L3 hit rate analysis |
| Perceptron Training | Generates weights for AdaptiveOrder |
| Brute-Force Validation | Compares adaptive vs. all algorithms |

### Command-Line Options

#### Dependency Management

| Option | Description |
|--------|-------------|
| `--check-deps` | Check system dependencies (g++, boost, numa, etc.) |
| `--install-deps` | Install missing system dependencies (requires sudo) |
| `--install-boost` | Download, compile, and install Boost 1.58.0 to `/opt/boost_1_58_0` |

#### Pipeline Control

| Option | Description |
|--------|-------------|
| `--full` | Run the complete pipeline (download → build → experiment → weights) |
| `--download-only` | Only download graphs |
| `--download-size` | `SMALL` (16), `MEDIUM` (28), `LARGE` (37), `XLARGE` (6), or `ALL` (87 graphs) |
| `--clean` | Clean results (keep graphs/weights) |
| `--clean-all` | Full reset for a fresh start |

#### Memory Management

| Option | Description |
|--------|-------------|
| `--max-memory GB` | Maximum RAM (GB) for graph processing |
| `--auto-memory` | Automatically detect available RAM (uses 80% of total) |

#### Disk Space Management

| Option | Description |
|--------|-------------|
| `--max-disk GB` | Maximum disk space (GB) for downloads |
| `--auto-disk` | Automatically limit downloads to the available disk space |

#### Experiment Options

| Option | Description |
|--------|-------------|
| `--phase` | Run a specific phase: `all`, `reorder`, `benchmark`, `cache`, `weights`, `adaptive` |
| `--graphs` | Graph size: `all`, `small`, `medium`, `large`, `custom` |
| `--key-only` | Only test key algorithms (faster) |
| `--skip-cache` | Skip cache simulations |
| `--brute-force` | Run brute-force validation |

#### Label Mapping (Consistent Reordering)

| Option | Description |
|--------|-------------|
| `--generate-maps` | Pre-generate `.lo` mapping files |
| `--use-maps` | Use pre-generated label maps (see Examples below) |

#### Training Options

| Option | Description |
|--------|-------------|
| `--train-adaptive` | Run the iterative training feedback loop |
| `--train-large` | Run large-scale batched training |
| `--target-accuracy` | Target accuracy in percent (default: 80) |
| `--fill-weights` | Fill ALL weight fields with comprehensive analysis |

### Examples

```bash
# One-click full experiment
python3 scripts/graphbrew_experiment.py --full --download-size SMALL

# Quick test with key algorithms
python3 scripts/graphbrew_experiment.py --graphs small --key-only

# Pre-generate label maps
python3 scripts/graphbrew_experiment.py --generate-maps --graphs small
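
# Benchmark using the pre-generated maps from the step above
python3 scripts/graphbrew_experiment.py --graphs small --use-maps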

# Fill ALL weight fields
python3 scripts/graphbrew_experiment.py --fill-weights --graphs small --max-graphs 5

# Clean and start fresh
python3 scripts/graphbrew_experiment.py --clean-all --full --download-size SMALL
```

## 📦 lib/ Module Reference

The lib/ folder contains modular, reusable components. Each module can be used independently or via the phase orchestration system.

### lib/types.py - Data Classes

Central type definitions used across all modules:

```python
from scripts.lib.types import GraphInfo, BenchmarkResult, CacheResult, ReorderResult

# GraphInfo - Graph metadata
GraphInfo(name="web-Stanford", path="graphs/web-Stanford/web-Stanford.mtx",
          size_mb=5.2, nodes=281903, edges=2312497)

# BenchmarkResult - Benchmark execution result
BenchmarkResult(graph="web-Stanford", algorithm_id=7, algorithm_name="HUBCLUSTERDBG",
                benchmark="pr", avg_time=0.234, speedup=1.45, success=True)

# CacheResult - Cache simulation result
CacheResult(graph="web-Stanford", algorithm_id=7, algorithm_name="HUBCLUSTERDBG",
            benchmark="pr", l1_miss_rate=0.12, l2_miss_rate=0.08, l3_miss_rate=0.02)

# ReorderResult - Reordering result
ReorderResult(graph="web-Stanford", algorithm_id=7, algorithm_name="HUBCLUSTERDBG",
              time_seconds=1.23, mapping_file="mappings/web-Stanford/HUBCLUSTERDBG.lo")
```
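
Since these result types are plain dataclasses, they serialize cleanly with the standard library. A minimal sketch for ad-hoc inspection (`lib/results.py` provides the framework's own result file I/O):

```python
import json
from dataclasses import asdict

from scripts.lib.types import BenchmarkResult

# Dump a single result as JSON; asdict() works because BenchmarkResult is a dataclass
result = BenchmarkResult(graph="web-Stanford", algorithm_id=7, algorithm_name="HUBCLUSTERDBG",
                         benchmark="pr", avg_time=0.234, speedup=1.45, success=True)
print(json.dumps(asdict(result), indent=2))
```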

### lib/phases.py - Phase Orchestration

High-level phase functions for building custom pipelines:

```python
from scripts.lib.phases import (
    PhaseConfig,
    run_reorder_phase,
    run_benchmark_phase,
    run_cache_phase,
    run_weights_phase,
    run_full_pipeline,
)

# Create configuration
config = PhaseConfig(
    benchmarks=['pr', 'bfs', 'cc'],
    trials=3,
    skip_slow=True
)

# Run individual phases
reorder_results, label_maps = run_reorder_phase(graphs, algorithms, config)
benchmark_results = run_benchmark_phase(graphs, algorithms, label_maps, config)

# Or run full pipeline
results = run_full_pipeline(graphs, algorithms, config, phases=['reorder', 'benchmark'])
```

### lib/utils.py - Core Utilities

Constants and shared utilities:

```python
from scripts.lib.utils import (
    ALGORITHMS,          # {0: "ORIGINAL", 1: "RANDOM", ...}
    BENCHMARKS,          # ['pr', 'bfs', 'cc', 'sssp', 'bc', 'tc']
    run_command,         # Execute shell commands
    get_timestamp,       # Formatted timestamps
)
```
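
Both constants are ordinary Python containers, so they can drive experiment loops directly:

```python
from scripts.lib.utils import ALGORITHMS, BENCHMARKS

# Enumerate every algorithm/benchmark combination
for algo_id, algo_name in ALGORITHMS.items():
    for bench in BENCHMARKS:
        print(f"algorithm {algo_id} ({algo_name}) on {bench}")
```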

### lib/features.py - Graph Features

Graph feature computation and system utilities:

```python
from scripts.lib.features import (
    # Graph type detection
    detect_graph_type,
    compute_extended_features,

    # System utilities
    get_available_memory_gb,
    get_num_threads,
    estimate_graph_memory_gb,
)

# Compute graph features
features = compute_extended_features("graph.mtx")
# Returns: {modularity, density, avg_degree, degree_variance, clustering_coefficient, ...}

# Detect graph type
graph_type = detect_graph_type(features)  # "social", "web", "road", etc.
```
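
These helpers are enough to sketch the skip-on-memory behavior listed under Key Features. A rough sketch, assuming `estimate_graph_memory_gb` takes a graph path and both memory helpers return GB as floats (the orchestrator's real logic may differ):

```python
from scripts.lib.features import estimate_graph_memory_gb, get_available_memory_gb

# Hypothetical memory guard: skip graphs whose estimated footprint exceeds
# 80% of available RAM (the margin --auto-memory describes)
budget_gb = 0.8 * get_available_memory_gb()
for path in ["graphs/web-Stanford/web-Stanford.mtx"]:
    needed_gb = estimate_graph_memory_gb(path)
    if needed_gb > budget_gb:
        print(f"skip {path}: ~{needed_gb:.1f} GB > {budget_gb:.1f} GB budget")
```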

### lib/dependencies.py - System Dependencies

Automatic system dependency detection and installation:

```python
from scripts.lib.dependencies import (
    check_dependencies,      # Check all required dependencies
    install_dependencies,    # Install missing dependencies (needs sudo)
    install_boost_158,       # Download and compile Boost 1.58.0
    check_boost_158,         # Check if Boost 1.58.0 is installed
    detect_platform,         # Detect OS and package manager
    get_package_manager,     # Get system package manager commands
)

# Check dependencies
status = check_dependencies()
# Returns dict with: g++, boost, numa, tcmalloc, python versions and status

# Install missing dependencies (requires sudo)
install_dependencies()

# Install Boost 1.58.0 for RabbitOrder
install_boost_158()  # Downloads, compiles with bootstrap/b2, installs to /opt/boost_1_58_0

# Check Boost 1.58 specifically
version = check_boost_158()  # Returns version string or None
```

### lib/download.py - Graph Downloading

Download graphs from SuiteSparse:

```python
from scripts.lib.download import (
    DOWNLOAD_GRAPHS_SMALL,   # 16 small graphs
    DOWNLOAD_GRAPHS_MEDIUM,  # 28 medium graphs
    download_graphs,
    get_catalog_stats,
)

# Download small graphs
download_graphs(DOWNLOAD_GRAPHS_SMALL, output_dir="./graphs")

# Get catalog statistics
stats = get_catalog_stats()
print(f"Total graphs: {stats['total']}, Total size: {stats['total_size_gb']:.1f} GB")
```

### lib/reorder.py - Reordering

Generate vertex reorderings:

```python
from scripts.lib.reorder import (
    generate_reorderings,
    generate_reorderings_with_variants,
    load_label_maps_index,
)

# Generate reorderings for all algorithms
results = generate_reorderings(graphs, algorithms, bin_dir="bench/bin")

# Load existing label maps
label_maps = load_label_maps_index("results")
```

### lib/benchmark.py - Benchmarking

Run performance benchmarks:

```python
from scripts.lib.benchmark import (
    run_benchmark,
    run_benchmark_suite,
    parse_benchmark_output,
)

# Run single benchmark
result = run_benchmark(graph_path, algorithm_id, benchmark="pr", bin_dir="bench/bin")

# Run full suite
results = run_benchmark_suite(graphs, algorithms, benchmarks=['pr', 'bfs'])
```

### lib/cache.py - Cache Simulation

Run cache simulations:

```python
from scripts.lib.cache import (
    run_cache_simulations,
    get_cache_stats_summary,
)

# Run simulations
results = run_cache_simulations(graphs, algorithms, benchmarks=['pr'])

# Get summary statistics
summary = get_cache_stats_summary(results)
```

### lib/weights.py - Weight Management

Type-based weight management for AdaptiveOrder:

```python
from scripts.lib.weights import (
    assign_graph_type,
    update_type_weights_incremental,
    get_best_algorithm_for_type,
    load_type_registry,
)

# Assign graph to a type based on features
type_name, is_new = assign_graph_type("web-Stanford", features)

# Update weights incrementally
update_type_weights_incremental(type_name, algorithm_name, benchmark, speedup)

# Get best algorithm for a type
best_algo = get_best_algorithm_for_type(type_name, benchmark="pr")
```
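
The on-disk registry can be inspected directly as well. A minimal sketch, assuming `load_type_registry` returns the parsed `type_registry.json` as a dict keyed by graph name (the exact schema may differ):

```python
from scripts.lib.weights import load_type_registry

# Hypothetical: print the graph → type assignments from weights/active/type_registry.json
registry = load_type_registry()
for graph_name, entry in registry.items():
    print(f"{graph_name}: {entry}")
```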

### lib/training.py - ML Training

Train adaptive weights:

```python
from scripts.lib.training import (
    train_adaptive_weights_iterative,
    train_adaptive_weights_large_scale,
)

# Iterative training
result = train_adaptive_weights_iterative(
    graphs=graphs,
    bin_dir="bench/bin",
    target_accuracy=0.85,
    max_iterations=10
)
print(f"Final accuracy: {result.final_accuracy:.2%}")
```

### lib/analysis.py - Adaptive Analysis

Analyze adaptive ordering:

```python
from scripts.lib.analysis import (
    analyze_adaptive_order,
    compare_adaptive_vs_fixed,
    run_subcommunity_brute_force,
)

# Analyze adaptive ordering
results = analyze_adaptive_order(graphs, bin_dir="bench/bin")

# Compare adaptive vs fixed algorithms
comparison = compare_adaptive_vs_fixed(graphs, fixed_algorithms=[7, 15, 16])
```

### lib/progress.py - Progress Tracking

Visual progress tracking:

```python
from scripts.lib.progress import ProgressTracker

progress = ProgressTracker()
progress.banner("EXPERIMENT", "Running GraphBrew benchmarks")
progress.phase_start("REORDERING", "Generating vertex reorderings")
progress.info("Processing graph: web-Stanford")
progress.success("Completed 10/15 graphs")
progress.phase_end("Reordering complete")
```

## Custom Pipeline Example

Create custom experiment pipelines using lib/phases.py:

```python
#!/usr/bin/env python3
"""Custom GraphBrew pipeline example."""

import sys
sys.path.insert(0, "scripts")

from lib.phases import PhaseConfig, run_reorder_phase, run_benchmark_phase
from lib.types import GraphInfo
from lib.progress import ProgressTracker

# Discover graphs
graphs = [
    GraphInfo(name="web-Stanford", path="graphs/web-Stanford/web-Stanford.mtx",
              size_mb=5.2, nodes=281903, edges=2312497)
]

# Select algorithms
algorithms = [0, 7, 15, 16]  # ORIGINAL, HUBCLUSTERDBG, LeidenOrder, LeidenDendrogram

# Create configuration
config = PhaseConfig(
    benchmarks=['pr', 'bfs'],
    trials=3,
    progress=ProgressTracker()
)

# Run phases
reorder_results, label_maps = run_reorder_phase(graphs, algorithms, config)
benchmark_results = run_benchmark_phase(graphs, algorithms, label_maps, config)

# Print results
for r in benchmark_results:
    if r.success:
        print(f"{r.graph} / {r.algorithm_name} / {r.benchmark}: {r.avg_time:.4f}s")
```

See `scripts/examples/custom_pipeline.py` for a complete example.


## Output Structure

```
results/
├── mappings/                 # Pre-generated label mappings
│   ├── index.json            # Mapping index
│   └── {graph_name}/         # Per-graph mappings
│       ├── HUBCLUSTERDBG.lo  # Label order file
│       └── HUBCLUSTERDBG.time # Reorder timing
├── reorder_*.json            # Reorder results
├── benchmark_*.json          # Benchmark results
├── cache_*.json              # Cache simulation results
└── logs/                     # Execution logs

scripts/weights/              # Type-based weights
├── active/                   # C++ reads from here
│   ├── type_registry.json    # Graph → type mapping
│   ├── type_0.json           # Cluster 0 weights
│   └── type_N.json           # Additional clusters
├── merged/                   # Accumulated from all runs
└── runs/                     # Historical snapshots
```
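
The timestamped result files are plain JSON, so they can be spot-checked without the framework. A minimal sketch, assuming each `benchmark_*.json` holds a list of result records (`lib/results.py` is the canonical reader/writer):

```python
import glob
import json
import os

# Peek at the newest benchmark result file (assumed to be a JSON list of records)
paths = sorted(glob.glob("results/benchmark_*.json"), key=os.path.getmtime)
if paths:
    with open(paths[-1]) as f:
        records = json.load(f)
    print(f"{paths[-1]}: {len(records)} records")
```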

## Installation

```bash
cd scripts
pip install -r requirements.txt
```

### requirements.txt

```text
# Core dependencies - NONE REQUIRED
# All benchmark scripts use only the Python 3.8+ standard library

# Optional: for extended analysis and visualization (uncomment if needed)
# numpy>=1.20.0        # For statistical analysis
# pandas>=1.3.0        # For data manipulation
# matplotlib>=3.4.0    # For plotting results
# scipy>=1.7.0         # For correlation analysis
# networkx>=2.6        # For graph analysis
```

## Troubleshooting

### Import Errors

```bash
pip install -r scripts/requirements.txt
python3 --version  # Should be 3.8+
```

### Binary Not Found

```bash
make all
make sim  # For cache simulation
```

### Permission Denied

```bash
chmod +x bench/bin/*
chmod +x bench/bin_sim/*
```

## Next Steps


← Back to Home | Code Architecture →
