FAQ

Frequently Asked Questions (FAQ)

Common questions and answers about GraphBrew.

General Questions

What is GraphBrew?

GraphBrew is a graph processing benchmark framework that combines:

18 vertex reordering algorithms (IDs 0-17) for cache optimization
5 automated benchmarks (PageRank, BFS, CC, SSSP, BC), plus Triangle Counting available
ML-powered algorithm selection via AdaptiveOrder
Leiden community detection integration

Who should use GraphBrew?

Researchers studying graph algorithms and cache optimization
Engineers optimizing graph processing pipelines
Students learning about graph algorithms
Data scientists working with network data

What makes GraphBrew different?

Comprehensive: 18 reordering algorithms in one framework
ML-powered: AdaptiveOrder learns which algorithm works best
Modern: Leiden community detection integration
Practical: Based on GAP Benchmark Suite standards

Installation Questions

What are the requirements?

Linux or macOS
GCC 7+ with C++17 support
Make
Python 3.8+ (optional, for scripts - no pip dependencies required)
At least 4GB RAM (more for large graphs)

How do I install on Ubuntu?

sudo apt-get update
sudo apt-get install build-essential git
git clone https://github.com/UVA-LavaLab/GraphBrew.git
cd GraphBrew
make all

How do I install on macOS?

xcode-select --install
brew install gcc
git clone https://github.com/UVA-LavaLab/GraphBrew.git
cd GraphBrew
make all CXX=g++-13

The build fails. What should I check?

GCC version: g++ --version (need 7+)
C++17 support: g++ -std=c++17 --version
OpenMP: echo '#include <omp.h>' | g++ -fopenmp -x c++ - -c

See Installation for detailed troubleshooting.

Usage Questions

How do I run a simple benchmark?

./bench/bin/pr -f graph.el -s -n 3

What does each option mean?

Option	Meaning
`-f graph.el`	Input file
`-s`	Make undirected (symmetrize)
`-n 3`	Run 3 trials
`-o 7`	Use algorithm 7 (HUBCLUSTERDBG)

Which reordering algorithm should I use?

Situation	Recommendation
Don't know	`-o 14` (AdaptiveOrder)
Social network	`-o 15` (LeidenOrder)
General purpose	`-o 7` (HUBCLUSTERDBG)
Large graph	`-o 17` (LeidenCSR)
Baseline	`-o 0` (no reordering)

How do I know which algorithm is best for my graph?

Run multiple algorithms and compare:

for algo in 0 7 14 15 17; do
    echo "=== Algorithm $algo ==="
    ./bench/bin/pr -f graph.el -s -o $algo -n 3
done

Or use AdaptiveOrder (-o 14) to auto-select.

What graph formats are supported?

Format	Extension	Example
Edge list	`.el`	`0 1`
Weighted	`.wel`	`0 1 2.5`
Matrix Market	`.mtx`	Standard MTX
DIMACS	`.gr`	Road networks

See Supported-Graph-Formats for details.

Performance Questions

Why is reordering slow?

Reordering is preprocessing that pays off over multiple algorithm runs:

Operation	Time
Load graph	1x
Reorder	0.5-2x
Benchmark	0.7-0.9x (faster!)

For repeated analyses, reorder once, save, reuse.

How much speedup should I expect?

Typical improvements:

Graph Type	PageRank	BFS	TC
Social	1.2-1.5x	1.1-1.3x	1.5-3x
Web	1.3-1.8x	1.2-1.4x	2-4x
Road	1.0-1.1x	1.0-1.1x	1.1-1.3x

Why is my graph loading slowly?

Large file: Use binary format

./bench/bin/converter -f graph.el -s -b graph.sg
./bench/bin/pr -f graph.sg -n 3

Text parsing: MTX/EL requires parsing; binary is instant
Memory: Ensure sufficient RAM

How can I make benchmarks faster?

Use binary graphs for repeated runs
Tune thread count: export OMP_NUM_THREADS=8
Use NUMA binding: numactl --cpunodebind=0
Reduce trials during development: -n 1

Algorithm Questions

What is Leiden community detection?

Leiden is a community detection algorithm that finds densely connected groups of vertices. It improves on the popular Louvain algorithm with:

Guaranteed connected communities
Better quality partitions
Faster convergence

GraphBrew uses Leiden to guide reordering decisions.

How does AdaptiveOrder work?

Detects communities using Leiden
Computes features for each community
Uses ML perceptron to select best algorithm per community
Applies different reorderings to different parts of the graph

See AdaptiveOrder-ML for details.

Why are some perceptron weights 0 or 1.0?

Weight fields remain at 0 or default 1.0 when:

Cache simulation was skipped (--skip-cache) - no cache impact data
Graph features weren't computed - no topology metrics
No benchmark data - no per-benchmark weights

Fix: Run the comprehensive fill-weights mode:

python3 scripts/graphbrew_experiment.py --fill-weights --graphs small --max-graphs 5 --trials 2

This populates all fields including cache_l1/l2/l3_impact, w_clustering_coeff, w_diameter, and benchmark_weights.

Where are the trained weights saved?

Weights are saved using an auto-clustering system:

Auto-clustered type weights (primary):

scripts/weights/active/
├── type_registry.json    # Maps graph names → type IDs + cluster centroids
├── type_0.json           # Cluster 0 weights
├── type_1.json           # Cluster 1 weights
└── type_N.json           # Additional clusters as needed

Loading priority (C++ runtime):

Environment variable PERCEPTRON_WEIGHTS_FILE (if set)
Best matching type file via Euclidean distance to centroids (e.g., type_0.json)
Semantic type fallback (if type files don't exist)
Hardcoded defaults

How does graph type matching work?

The auto-clustering system:

Extracts 9 features per graph: modularity, log_nodes, log_edges, density, avg_degree, degree_variance, hub_concentration, clustering_coefficient, community_count
Clusters similar graphs using Euclidean distance to centroids
Stores centroids in type_registry.json
At runtime, finds best matching cluster based on feature similarity

Properties are computed during --fill-weights Phase 0 and cached in results/graph_properties_cache.json.

What's the difference between LeidenOrder and LeidenCSR?

Algorithm	Approach
LeidenOrder (15)	Basic Leiden → contiguous community ordering
LeidenDendrogram (16)	Leiden + dendrogram-based traversal variants
LeidenCSR (17)	Optimized Leiden with multiple variants (dfs/bfs/hubsort/fast/modularity)

LeidenCSR with "hubsort" or "fast" variant is recommended for large graphs.

When should I use DBG vs HUBCLUSTER?

Algorithm	Best For
DBG	Power-law graphs with clear hot/cold separation
HUBCLUSTER	Graphs where hubs connect to each other
HUBCLUSTERDBG	Combines both - good general choice

Troubleshooting

"Segmentation fault" when loading graph

Check file exists: ls -la graph.el
Check format: head -5 graph.el
Check indexing: Vertices should start at 0
Check memory: free -h

"Invalid graph format" error

# Check for issues
head -5 graph.el        # Should be: node1 node2
file graph.el           # Should be: ASCII text
dos2unix graph.el       # Fix Windows line endings

Results vary between runs

This is normal! Use more trials (-n 10) and report averages.

Reduce variance:

Disable CPU frequency scaling
Run on dedicated machine
Use numactl for memory binding

Algorithm X isn't giving speedup

Not all algorithms help all graphs:

Road networks often don't benefit much
Small graphs have minimal cache effects
Already well-ordered graphs may not improve

Try different algorithms or use AdaptiveOrder.

Python scripts not working

# Core scripts need only Python 3.8+ standard library (no pip install needed)
# Optional: Install for extended visualization
pip install numpy matplotlib pandas

# Check Python version
python3 --version  # Need 3.8+

# Check path and test script
python3 scripts/graphbrew_experiment.py --help

Development Questions

How do I add a new reordering algorithm?

See Adding-New-Algorithms for a complete guide:

Add enum value in builder.h
Implement reorder function
Add switch case
(Optional) Add perceptron weights

How do I add a new benchmark?

See Adding-New-Benchmarks for a complete guide:

Create bench/src/my_algo.cc
Implement algorithm
Add to Makefile
Test

How do I contribute?

Fork the repository
Create a feature branch
Make changes with tests
Submit a pull request

See CONTRIBUTING.md for guidelines.

What's the code license?

GraphBrew is released under the MIT License. See LICENSE.

Data Questions

Where can I download graphs?

Source	URL	Formats
SNAP	snap.stanford.edu/data	Edge list
SuiteSparse	sparse.tamu.edu	MTX
Network Repository	networkrepository.com	Various
KONECT	konect.cc	Various

How do I convert my graph format?

# CSV to edge list
cat graph.csv | tr ',' ' ' | tail -n +2 > graph.el

# MTX to edge list (1-indexed to 0-indexed)
grep -v "^%" graph.mtx | tail -n +2 | awk '{print $1-1, $2-1}' > graph.el

What's the maximum graph size?

Limited by RAM:

~16 bytes per edge
1B edges ≈ 16GB RAM
Larger graphs need out-of-core processing (not supported)

Still Have Questions?

Check Home for documentation overview
Review Troubleshooting for common issues
Open a GitHub issue for bugs
Start a discussion for questions

← Back to Home

FAQ

Frequently Asked Questions (FAQ)

General Questions

What is GraphBrew?

Who should use GraphBrew?

What makes GraphBrew different?

Installation Questions

What are the requirements?

How do I install on Ubuntu?

How do I install on macOS?

The build fails. What should I check?

Usage Questions

How do I run a simple benchmark?

What does each option mean?

Which reordering algorithm should I use?

How do I know which algorithm is best for my graph?

What graph formats are supported?

Performance Questions

Why is reordering slow?

How much speedup should I expect?

Why is my graph loading slowly?

How can I make benchmarks faster?

Algorithm Questions

What is Leiden community detection?

How does AdaptiveOrder work?

Why are some perceptron weights 0 or 1.0?

Where are the trained weights saved?

How does graph type matching work?

What's the difference between LeidenOrder and LeidenCSR?

When should I use DBG vs HUBCLUSTER?

Troubleshooting

"Segmentation fault" when loading graph

"Invalid graph format" error

Results vary between runs

Algorithm X isn't giving speedup

Python scripts not working

Development Questions

How do I add a new reordering algorithm?

How do I add a new benchmark?

How do I contribute?

What's the code license?

Data Questions

Where can I download graphs?

How do I convert my graph format?

What's the maximum graph size?

Still Have Questions?

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally