
# Python Scripts Guide

Documentation for all Python tools in the GraphBrew framework.

## Overview

The scripts folder contains a modular library (lib/) and the main orchestration script:

```
scripts/
├── graphbrew_experiment.py      # ⭐ MAIN: Orchestration script (~3100 lines)
├── requirements.txt             # Python dependencies (optional)
│
├── lib/                         # 📦 Modular library (~11000 lines total)
│   ├── __init__.py              # Module exports
│   ├── types.py                 # Data classes (GraphInfo, BenchmarkResult, etc.)
│   ├── phases.py                # Phase orchestration (run_reorder_phase, etc.)
│   ├── utils.py                 # Core utilities (ALGORITHMS, run_command, etc.)
│   ├── features.py              # Graph feature computation & system utilities
│   ├── dependencies.py          # System dependency detection & installation
│   ├── download.py              # Graph downloading from SuiteSparse
│   ├── build.py                 # Binary compilation utilities
│   ├── reorder.py               # Vertex reordering generation
│   ├── benchmark.py             # Performance benchmark execution
│   ├── cache.py                 # Cache simulation analysis
│   ├── weights.py               # Type-based weight management
│   ├── weight_merger.py         # Cross-run weight consolidation
│   ├── training.py              # ML weight training
│   ├── analysis.py              # Adaptive order analysis
│   ├── progress.py              # Progress tracking & reporting
│   └── results.py               # Result file I/O
│
├── test/                        # Test suite
│   ├── test_weight_flow.py      # Weight generation/loading tests
│   ├── test_weight_merger.py    # Merger consolidation tests
│   └── test_fill_adaptive.py    # Fill-weights pipeline tests
│
├── weights/                     # Type-based weight files
│   ├── active/                  # C++ reads from here (working copy)
│   │   ├── type_registry.json   # Maps graphs → types + centroids
│   │   ├── type_0.json          # Cluster 0 weights
│   │   └── type_N.json          # Additional clusters
│   ├── merged/                  # Accumulated from all runs
│   └── runs/                    # Historical snapshots
│
└── examples/                    # Example scripts
    ├── batch_process.py         # Batch processing example
    ├── compare_algorithms.py    # Algorithm comparison example
    ├── custom_pipeline.py       # Custom phase-based pipeline example
    └── quick_test.py            # Quick testing example
```

## ⭐ graphbrew_experiment.py - Main Orchestration

The main script orchestrates the `lib/` modules: it parses command-line arguments and calls the appropriate phase functions.

### Quick Start

```bash
# Full pipeline: download → build → experiment → weights
python3 scripts/graphbrew_experiment.py --full --download-size SMALL

# See all options
python3 scripts/graphbrew_experiment.py --help
```

### Key Features

| Feature | Description |
|---------|-------------|
| Graph Download | Downloads from the SuiteSparse collection (87 graphs available) |
| Auto Build | Compiles binaries if missing |
| Memory Management | Automatically skips graphs exceeding RAM limits |
| Label Maps | Pre-generates reordering maps for consistency |
| Reordering | Tests all 18 algorithms |
| Benchmarks | PR, BFS, CC, SSSP, BC, TC |
| Cache Simulation | L1/L2/L3 hit rate analysis |
| Perceptron Training | Generates weights for AdaptiveOrder |
| Brute-Force Validation | Compares adaptive vs. all algorithms |

### Command-Line Options

#### Dependency Management

| Option | Description |
|--------|-------------|
| `--check-deps` | Check system dependencies (g++, boost, numa, etc.) |
| `--install-deps` | Install missing system dependencies (requires sudo) |
| `--install-boost` | Download, compile, and install Boost 1.58.0 to `/opt/boost_1_58_0` |

#### Pipeline Control

| Option | Description |
|--------|-------------|
| `--full` | Run the complete pipeline (download → build → experiment → weights) |
| `--download-only` | Only download graphs |
| `--download-size` | `SMALL` (16), `MEDIUM` (28), `LARGE` (37), `XLARGE` (6), or `ALL` (87 graphs) |
| `--clean` | Clean results (keep graphs/weights) |
| `--clean-all` | Full reset for a fresh start |

#### Memory Management

| Option | Description |
|--------|-------------|
| `--max-memory GB` | Maximum RAM (GB) for graph processing |
| `--auto-memory` | Automatically detect available RAM (uses 80% of total) |

#### Disk Space Management

| Option | Description |
|--------|-------------|
| `--max-disk GB` | Maximum disk space (GB) for downloads |
| `--auto-disk` | Automatically limit downloads to the available disk space |

#### Experiment Options

| Option | Description |
|--------|-------------|
| `--phase` | Run a specific phase: `all`, `reorder`, `benchmark`, `cache`, `weights`, `adaptive` |
| `--graphs` | Graph size: `all`, `small`, `medium`, `large`, `custom` |
| `--key-only` | Only test key algorithms (faster) |
| `--skip-cache` | Skip cache simulations |
| `--brute-force` | Run brute-force validation |

#### Label Mapping (Consistent Reordering)

| Option | Description |
|--------|-------------|
| `--generate-maps` | Pre-generate `.lo` mapping files |
| `--use-maps` | Use pre-generated label maps (see Examples below) |

#### Training Options

| Option | Description |
|--------|-------------|
| `--train-adaptive` | Run the iterative training feedback loop |
| `--train-large` | Run large-scale batched training |
| `--target-accuracy` | Target accuracy in percent (default: 80) |
| `--fill-weights` | Fill ALL weight fields with comprehensive analysis |

### Examples

```bash
# One-click full experiment
python3 scripts/graphbrew_experiment.py --full --download-size SMALL

# Quick test with key algorithms
python3 scripts/graphbrew_experiment.py --graphs small --key-only

# Pre-generate label maps
python3 scripts/graphbrew_experiment.py --generate-maps --graphs small
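
# Benchmark using the pre-generated maps from the step above
python3 scripts/graphbrew_experiment.py --graphs small --use-maps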

# Fill ALL weight fields
python3 scripts/graphbrew_experiment.py --fill-weights --graphs small --max-graphs 5

# Clean and start fresh
python3 scripts/graphbrew_experiment.py --clean-all --full --download-size SMALL
```

## 📦 lib/ Module Reference

The lib/ folder contains modular, reusable components. Each module can be used independently or via the phase orchestration system.

### lib/types.py - Data Classes

Central type definitions used across all modules:

```python
from scripts.lib.types import GraphInfo, BenchmarkResult, CacheResult, ReorderResult

# GraphInfo - Graph metadata
GraphInfo(name="web-Stanford", path="graphs/web-Stanford/web-Stanford.mtx",
          size_mb=5.2, nodes=281903, edges=2312497)

# BenchmarkResult - Benchmark execution result
BenchmarkResult(graph="web-Stanford", algorithm_id=7, algorithm_name="HUBCLUSTERDBG",
                benchmark="pr", avg_time=0.234, speedup=1.45, success=True)

# CacheResult - Cache simulation result
CacheResult(graph="web-Stanford", algorithm_id=7, algorithm_name="HUBCLUSTERDBG",
            benchmark="pr", l1_miss_rate=0.12, l2_miss_rate=0.08, l3_miss_rate=0.02)

# ReorderResult - Reordering result
ReorderResult(graph="web-Stanford", algorithm_id=7, algorithm_name="HUBCLUSTERDBG",
              time_seconds=1.23, mapping_file="mappings/web-Stanford/HUBCLUSTERDBG.lo")
```
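
Since these result types are plain dataclasses, they serialize cleanly with the standard library. A minimal sketch for ad-hoc inspection (`lib/results.py` provides the framework's own result file I/O):

```python
import json
from dataclasses import asdict

from scripts.lib.types import BenchmarkResult

# Dump a single result as JSON; asdict() works because BenchmarkResult is a dataclass
result = BenchmarkResult(graph="web-Stanford", algorithm_id=7, algorithm_name="HUBCLUSTERDBG",
                         benchmark="pr", avg_time=0.234, speedup=1.45, success=True)
print(json.dumps(asdict(result), indent=2))
```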

### lib/phases.py - Phase Orchestration

High-level phase functions for building custom pipelines:

```python
from scripts.lib.phases import (
    PhaseConfig,
    run_reorder_phase,
    run_benchmark_phase,
    run_cache_phase,
    run_weights_phase,
    run_full_pipeline,
)

# Create configuration
config = PhaseConfig(
    benchmarks=['pr', 'bfs', 'cc'],
    trials=3,
    skip_slow=True
)

# Run individual phases
reorder_results, label_maps = run_reorder_phase(graphs, algorithms, config)
benchmark_results = run_benchmark_phase(graphs, algorithms, label_maps, config)

# Or run full pipeline
results = run_full_pipeline(graphs, algorithms, config, phases=['reorder', 'benchmark'])
```

### lib/utils.py - Core Utilities

Constants and shared utilities:

```python
from scripts.lib.utils import (
    ALGORITHMS,          # {0: "ORIGINAL", 1: "RANDOM", ...}
    BENCHMARKS,          # ['pr', 'bfs', 'cc', 'sssp', 'bc', 'tc']
    run_command,         # Execute shell commands
    get_timestamp,       # Formatted timestamps
)
```
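
Both constants are ordinary Python containers, so they can drive experiment loops directly:

```python
from scripts.lib.utils import ALGORITHMS, BENCHMARKS

# Enumerate every algorithm/benchmark combination
for algo_id, algo_name in ALGORITHMS.items():
    for bench in BENCHMARKS:
        print(f"algorithm {algo_id} ({algo_name}) on {bench}")
```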

### lib/features.py - Graph Features

Graph feature computation and system utilities:

```python
from scripts.lib.features import (
    # Graph type detection
    detect_graph_type,
    compute_extended_features,

    # System utilities
    get_available_memory_gb,
    get_num_threads,
    estimate_graph_memory_gb,
)

# Compute graph features
features = compute_extended_features("graph.mtx")
# Returns: {modularity, density, avg_degree, degree_variance, clustering_coefficient, ...}

# Detect graph type
graph_type = detect_graph_type(features)  # "social", "web", "road", etc.
```
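
These helpers are enough to sketch the skip-on-memory behavior listed under Key Features. A rough sketch, assuming `estimate_graph_memory_gb` takes a graph path and both memory helpers return GB as floats (the orchestrator's real logic may differ):

```python
from scripts.lib.features import estimate_graph_memory_gb, get_available_memory_gb

# Hypothetical memory guard: skip graphs whose estimated footprint exceeds
# 80% of available RAM (the margin --auto-memory describes)
budget_gb = 0.8 * get_available_memory_gb()
for path in ["graphs/web-Stanford/web-Stanford.mtx"]:
    needed_gb = estimate_graph_memory_gb(path)
    if needed_gb > budget_gb:
        print(f"skip {path}: ~{needed_gb:.1f} GB > {budget_gb:.1f} GB budget")
```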

### lib/dependencies.py - System Dependencies

Automatic system dependency detection and installation:

```python
from scripts.lib.dependencies import (
    check_dependencies,      # Check all required dependencies
    install_dependencies,    # Install missing dependencies (needs sudo)
    install_boost_158,       # Download and compile Boost 1.58.0
    check_boost_158,         # Check if Boost 1.58.0 is installed
    detect_platform,         # Detect OS and package manager
    get_package_manager,     # Get system package manager commands
)

# Check dependencies
status = check_dependencies()
# Returns dict with: g++, boost, numa, tcmalloc, python versions and status

# Install missing dependencies (requires sudo)
install_dependencies()

# Install Boost 1.58.0 for RabbitOrder
install_boost_158()  # Downloads, compiles with bootstrap/b2, installs to /opt/boost_1_58_0

# Check Boost 1.58 specifically
version = check_boost_158()  # Returns version string or None
```

### lib/download.py - Graph Downloading

Download graphs from SuiteSparse:

```python
from scripts.lib.download import (
    DOWNLOAD_GRAPHS_SMALL,   # 16 small graphs
    DOWNLOAD_GRAPHS_MEDIUM,  # 28 medium graphs
    download_graphs,
    get_catalog_stats,
)

# Download small graphs
download_graphs(DOWNLOAD_GRAPHS_SMALL, output_dir="./graphs")

# Get catalog statistics
stats = get_catalog_stats()
print(f"Total graphs: {stats['total']}, Total size: {stats['total_size_gb']:.1f} GB")
```

### lib/reorder.py - Reordering

Generate vertex reorderings:

```python
from scripts.lib.reorder import (
    generate_reorderings,
    generate_reorderings_with_variants,
    load_label_maps_index,
)

# Generate reorderings for all algorithms
results = generate_reorderings(graphs, algorithms, bin_dir="bench/bin")

# Load existing label maps
label_maps = load_label_maps_index("results")
```

### lib/benchmark.py - Benchmarking

Run performance benchmarks:

```python
from scripts.lib.benchmark import (
    run_benchmark,
    run_benchmark_suite,
    parse_benchmark_output,
)

# Run single benchmark
result = run_benchmark(graph_path, algorithm_id, benchmark="pr", bin_dir="bench/bin")

# Run full suite
results = run_benchmark_suite(graphs, algorithms, benchmarks=['pr', 'bfs'])
```

### lib/cache.py - Cache Simulation

Run cache simulations:

```python
from scripts.lib.cache import (
    run_cache_simulations,
    get_cache_stats_summary,
)

# Run simulations
results = run_cache_simulations(graphs, algorithms, benchmarks=['pr'])

# Get summary statistics
summary = get_cache_stats_summary(results)
```

### lib/weights.py - Weight Management

Type-based weight management for AdaptiveOrder:

```python
from scripts.lib.weights import (
    assign_graph_type,
    update_type_weights_incremental,
    get_best_algorithm_for_type,
    load_type_registry,
)

# Assign graph to a type based on features
type_name, is_new = assign_graph_type("web-Stanford", features)

# Update weights incrementally
update_type_weights_incremental(type_name, algorithm_name, benchmark, speedup)

# Get best algorithm for a type
best_algo = get_best_algorithm_for_type(type_name, benchmark="pr")
```
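
The on-disk registry can be inspected directly as well. A minimal sketch, assuming `load_type_registry` returns the parsed `type_registry.json` as a dict keyed by graph name (the exact schema may differ):

```python
from scripts.lib.weights import load_type_registry

# Hypothetical: print the graph → type assignments from weights/active/type_registry.json
registry = load_type_registry()
for graph_name, entry in registry.items():
    print(f"{graph_name}: {entry}")
```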

### lib/training.py - ML Training

Train adaptive weights:

```python
from scripts.lib.training import (
    train_adaptive_weights_iterative,
    train_adaptive_weights_large_scale,
)

# Iterative training
result = train_adaptive_weights_iterative(
    graphs=graphs,
    bin_dir="bench/bin",
    target_accuracy=0.85,
    max_iterations=10
)
print(f"Final accuracy: {result.final_accuracy:.2%}")
```

### lib/analysis.py - Adaptive Analysis

Analyze adaptive ordering:

```python
from scripts.lib.analysis import (
    analyze_adaptive_order,
    compare_adaptive_vs_fixed,
    run_subcommunity_brute_force,
)

# Analyze adaptive ordering
results = analyze_adaptive_order(graphs, bin_dir="bench/bin")

# Compare adaptive vs fixed algorithms
comparison = compare_adaptive_vs_fixed(graphs, fixed_algorithms=[7, 15, 16])
```

### lib/progress.py - Progress Tracking

Visual progress tracking:

```python
from scripts.lib.progress import ProgressTracker

progress = ProgressTracker()
progress.banner("EXPERIMENT", "Running GraphBrew benchmarks")
progress.phase_start("REORDERING", "Generating vertex reorderings")
progress.info("Processing graph: web-Stanford")
progress.success("Completed 10/15 graphs")
progress.phase_end("Reordering complete")
```

## Custom Pipeline Example

Create custom experiment pipelines using lib/phases.py:

```python
#!/usr/bin/env python3
"""Custom GraphBrew pipeline example."""

import sys
sys.path.insert(0, "scripts")

from lib.phases import PhaseConfig, run_reorder_phase, run_benchmark_phase
from lib.types import GraphInfo
from lib.progress import ProgressTracker

# Discover graphs
graphs = [
    GraphInfo(name="web-Stanford", path="graphs/web-Stanford/web-Stanford.mtx",
              size_mb=5.2, nodes=281903, edges=2312497)
]

# Select algorithms
algorithms = [0, 7, 15, 16]  # ORIGINAL, HUBCLUSTERDBG, LeidenOrder, LeidenDendrogram

# Create configuration
config = PhaseConfig(
    benchmarks=['pr', 'bfs'],
    trials=3,
    progress=ProgressTracker()
)

# Run phases
reorder_results, label_maps = run_reorder_phase(graphs, algorithms, config)
benchmark_results = run_benchmark_phase(graphs, algorithms, label_maps, config)

# Print results
for r in benchmark_results:
    if r.success:
        print(f"{r.graph} / {r.algorithm_name} / {r.benchmark}: {r.avg_time:.4f}s")
```

See `scripts/examples/custom_pipeline.py` for a complete example.


## Output Structure

```
results/
├── mappings/                 # Pre-generated label mappings
│   ├── index.json            # Mapping index
│   └── {graph_name}/         # Per-graph mappings
│       ├── HUBCLUSTERDBG.lo  # Label order file
│       └── HUBCLUSTERDBG.time # Reorder timing
├── reorder_*.json            # Reorder results
├── benchmark_*.json          # Benchmark results
├── cache_*.json              # Cache simulation results
└── logs/                     # Execution logs

scripts/weights/              # Type-based weights
├── active/                   # C++ reads from here
│   ├── type_registry.json    # Graph → type mapping
│   ├── type_0.json           # Cluster 0 weights
│   └── type_N.json           # Additional clusters
├── merged/                   # Accumulated from all runs
└── runs/                     # Historical snapshots
```
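
The timestamped result files are plain JSON, so they can be spot-checked without the framework. A minimal sketch, assuming each `benchmark_*.json` holds a list of result records (`lib/results.py` is the canonical reader/writer):

```python
import glob
import json
import os

# Peek at the newest benchmark result file (assumed to be a JSON list of records)
paths = sorted(glob.glob("results/benchmark_*.json"), key=os.path.getmtime)
if paths:
    with open(paths[-1]) as f:
        records = json.load(f)
    print(f"{paths[-1]}: {len(records)} records")
```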

## Installation

```bash
cd scripts
pip install -r requirements.txt
```

### requirements.txt

```text
# Core dependencies - NONE REQUIRED
# All benchmark scripts use only the Python 3.8+ standard library

# Optional: for extended analysis and visualization (uncomment if needed)
# numpy>=1.20.0        # For statistical analysis
# pandas>=1.3.0        # For data manipulation
# matplotlib>=3.4.0    # For plotting results
# scipy>=1.7.0         # For correlation analysis
# networkx>=2.6        # For graph analysis
```

## Troubleshooting

### Import Errors

```bash
pip install -r scripts/requirements.txt
python3 --version  # Should be 3.8+
```

### Binary Not Found

```bash
make all
make sim  # For cache simulation
```

### Permission Denied

```bash
chmod +x bench/bin/*
chmod +x bench/bin_sim/*
```

## Next Steps


← Back to Home | Code Architecture →
