-
Notifications
You must be signed in to change notification settings - Fork 2
FAQ
Common questions and answers about GraphBrew.
GraphBrew is a graph processing benchmark framework that combines:
- 18 vertex reordering algorithms (IDs 0-17) for cache optimization
- 5 automated benchmarks (PageRank, BFS, CC, SSSP, BC), plus Triangle Counting available
- ML-powered algorithm selection via AdaptiveOrder
- Leiden community detection integration
- Researchers studying graph algorithms and cache optimization
- Engineers optimizing graph processing pipelines
- Students learning about graph algorithms
- Data scientists working with network data
- Comprehensive: 18 reordering algorithms in one framework
- ML-powered: AdaptiveOrder learns which algorithm works best
- Modern: Leiden community detection integration
- Practical: Based on GAP Benchmark Suite standards
- Linux or macOS
- GCC 7+ with C++17 support
- Make
- Python 3.8+ (optional, for scripts - no pip dependencies required)
- At least 4GB RAM (more for large graphs)
sudo apt-get update
sudo apt-get install build-essential git
git clone https://github.com/UVA-LavaLab/GraphBrew.git
cd GraphBrew
make allxcode-select --install
brew install gcc
git clone https://github.com/UVA-LavaLab/GraphBrew.git
cd GraphBrew
make all CXX=g++-13- GCC version:
g++ --version(need 7+) - C++17 support:
g++ -std=c++17 --version - OpenMP:
echo '#include <omp.h>' | g++ -fopenmp -x c++ - -c
See Installation for detailed troubleshooting.
./bench/bin/pr -f graph.el -s -n 3| Option | Meaning |
|---|---|
-f graph.el |
Input file |
-s |
Make undirected (symmetrize) |
-n 3 |
Run 3 trials |
-o 7 |
Use algorithm 7 (HUBCLUSTERDBG) |
| Situation | Recommendation |
|---|---|
| Don't know |
-o 14 (AdaptiveOrder) |
| Social network |
-o 15 (LeidenOrder) |
| General purpose |
-o 7 (HUBCLUSTERDBG) |
| Large graph |
-o 17 (LeidenCSR) |
| Baseline |
-o 0 (no reordering) |
Run multiple algorithms and compare:
for algo in 0 7 14 15 17; do
echo "=== Algorithm $algo ==="
./bench/bin/pr -f graph.el -s -o $algo -n 3
doneOr use AdaptiveOrder (-o 14) to auto-select.
| Format | Extension | Example |
|---|---|---|
| Edge list | .el |
0 1 |
| Weighted | .wel |
0 1 2.5 |
| Matrix Market | .mtx |
Standard MTX |
| DIMACS | .gr |
Road networks |
See Supported-Graph-Formats for details.
Reordering is preprocessing that pays off over multiple algorithm runs:
| Operation | Time |
|---|---|
| Load graph | 1x |
| Reorder | 0.5-2x |
| Benchmark | 0.7-0.9x (faster!) |
For repeated analyses, reorder once, save, reuse.
Typical improvements:
| Graph Type | PageRank | BFS | TC |
|---|---|---|---|
| Social | 1.2-1.5x | 1.1-1.3x | 1.5-3x |
| Web | 1.3-1.8x | 1.2-1.4x | 2-4x |
| Road | 1.0-1.1x | 1.0-1.1x | 1.1-1.3x |
-
Large file: Use binary format
./bench/bin/converter -f graph.el -s -b graph.sg ./bench/bin/pr -f graph.sg -n 3
-
Text parsing: MTX/EL requires parsing; binary is instant
-
Memory: Ensure sufficient RAM
- Use binary graphs for repeated runs
-
Tune thread count:
export OMP_NUM_THREADS=8 -
Use NUMA binding:
numactl --cpunodebind=0 -
Reduce trials during development:
-n 1
Leiden is a community detection algorithm that finds densely connected groups of vertices. It improves on the popular Louvain algorithm with:
- Guaranteed connected communities
- Better quality partitions
- Faster convergence
GraphBrew uses Leiden to guide reordering decisions.
- Detects communities using Leiden
- Computes features for each community
- Uses ML perceptron to select best algorithm per community
- Applies different reorderings to different parts of the graph
See AdaptiveOrder-ML for details.
Weight fields remain at 0 or default 1.0 when:
-
Cache simulation was skipped (
--skip-cache) - no cache impact data - Graph features weren't computed - no topology metrics
- No benchmark data - no per-benchmark weights
Fix: Run the comprehensive fill-weights mode:
python3 scripts/graphbrew_experiment.py --fill-weights --graphs small --max-graphs 5 --trials 2This populates all fields including cache_l1/l2/l3_impact, w_clustering_coeff, w_diameter, and benchmark_weights.
Weights are saved using an auto-clustering system:
Auto-clustered type weights (primary):
scripts/weights/active/
├── type_registry.json # Maps graph names → type IDs + cluster centroids
├── type_0.json # Cluster 0 weights
├── type_1.json # Cluster 1 weights
└── type_N.json # Additional clusters as needed
Loading priority (C++ runtime):
- Environment variable
PERCEPTRON_WEIGHTS_FILE(if set) - Best matching type file via Euclidean distance to centroids (e.g.,
type_0.json) - Semantic type fallback (if type files don't exist)
- Hardcoded defaults
The auto-clustering system:
- Extracts 9 features per graph: modularity, log_nodes, log_edges, density, avg_degree, degree_variance, hub_concentration, clustering_coefficient, community_count
- Clusters similar graphs using Euclidean distance to centroids
-
Stores centroids in
type_registry.json - At runtime, finds best matching cluster based on feature similarity
Properties are computed during --fill-weights Phase 0 and cached in results/graph_properties_cache.json.
| Algorithm | Approach |
|---|---|
| LeidenOrder (15) | Basic Leiden → contiguous community ordering |
| LeidenDendrogram (16) | Leiden + dendrogram-based traversal variants |
| LeidenCSR (17) | Optimized Leiden with multiple variants (dfs/bfs/hubsort/fast/modularity) |
LeidenCSR with "hubsort" or "fast" variant is recommended for large graphs.
| Algorithm | Best For |
|---|---|
| DBG | Power-law graphs with clear hot/cold separation |
| HUBCLUSTER | Graphs where hubs connect to each other |
| HUBCLUSTERDBG | Combines both - good general choice |
-
Check file exists:
ls -la graph.el -
Check format:
head -5 graph.el - Check indexing: Vertices should start at 0
-
Check memory:
free -h
# Check for issues
head -5 graph.el # Should be: node1 node2
file graph.el # Should be: ASCII text
dos2unix graph.el # Fix Windows line endingsThis is normal! Use more trials (-n 10) and report averages.
Reduce variance:
- Disable CPU frequency scaling
- Run on dedicated machine
- Use
numactlfor memory binding
Not all algorithms help all graphs:
- Road networks often don't benefit much
- Small graphs have minimal cache effects
- Already well-ordered graphs may not improve
Try different algorithms or use AdaptiveOrder.
# Core scripts need only Python 3.8+ standard library (no pip install needed)
# Optional: Install for extended visualization
pip install numpy matplotlib pandas
# Check Python version
python3 --version # Need 3.8+
# Check path and test script
python3 scripts/graphbrew_experiment.py --helpSee Adding-New-Algorithms for a complete guide:
- Add enum value in
builder.h - Implement reorder function
- Add switch case
- (Optional) Add perceptron weights
See Adding-New-Benchmarks for a complete guide:
- Create
bench/src/my_algo.cc - Implement algorithm
- Add to Makefile
- Test
- Fork the repository
- Create a feature branch
- Make changes with tests
- Submit a pull request
See CONTRIBUTING.md for guidelines.
GraphBrew is released under the MIT License. See LICENSE.
| Source | URL | Formats |
|---|---|---|
| SNAP | snap.stanford.edu/data | Edge list |
| SuiteSparse | sparse.tamu.edu | MTX |
| Network Repository | networkrepository.com | Various |
| KONECT | konect.cc | Various |
# CSV to edge list
cat graph.csv | tr ',' ' ' | tail -n +2 > graph.el
# MTX to edge list (1-indexed to 0-indexed)
grep -v "^%" graph.mtx | tail -n +2 | awk '{print $1-1, $2-1}' > graph.elLimited by RAM:
- ~16 bytes per edge
- 1B edges ≈ 16GB RAM
- Larger graphs need out-of-core processing (not supported)
- Check Home for documentation overview
- Review Troubleshooting for common issues
- Open a GitHub issue for bugs
- Start a discussion for questions