Unleashing The Potential of Datacenter SSDs by Taming Performance Variability (NSDI '26)
Gohar Irfan Chaudhry, Ankit Bhardwaj, Zhenyuan Ruan, Adam Belay
This document (and the references within) is intended to guide the artifact evaluation process.
Note
To make it easier to access the hardware and software environment needed to evaluate the artifact, we provide access to our own server testbed. Please email us at girfan@mit.edu and ankit@cs.tufts.edu when you are ready to begin the evaluation, and we will promptly provide credentials and login instructions for our servers. We would appreciate it if you could share your SSH public key with us so that we can grant access.
Note
Information can be found in QuickStart.
- Main Result (Figure 4): Sandook achieves significant I/O throughput improvement over existing systems that tackle only a single source of SSD performance variability, while maintaining sub-millisecond tail latency. This experiment compares Sandook against:
- Static routing (FDS-style)
- Read/write isolation (Rails-style)
- Different Read/Write Ratios (Figure 6): Demonstrates how Sandook improves IOPS and maintains low latency under different ratios of reads and writes in the storage cluster. This experiment compares Sandook against:
- Static routing (FDS-style)
Note
Information can be found in Experiments.
Sandook is a distributed storage system that aggregates multiple NVMe SSDs into a unified, high-performance block device. It features dynamic read/write workload isolation, SSD performance model-driven scheduling, and exposes storage via a standard Linux block device interface.
- Distributed Storage: Aggregates multiple SSD servers into a single virtual disk
- Read/Write Isolation: Dynamically partitions servers into read-only and write-only groups
- Replication: 2-way replication for writes across servers
- Performance Models: Uses profiled SSD latency-load curves to guide scheduling
- Block Device Interface: Exposes storage as /dev/ublkbX via ublksrv
┌──────────────┐ ┌────────────┐ ┌─────────────┐ ┌────────┐
│ Application │ │ Controller │ │ Disk Server │ │ NVMe │
│ ↓ │ │ (central) │ │ (per-SSD) │────►│ SSD │
│ Block Dev │◄───►│ Scheduling │◄───►│ SPDK/POSIX │ │ │
│ (ublksrv) │ │ Allocation │ │ Backend │ └────────┘
│ ↓ │ └────────────┘ └─────────────┘
│ Virtual Disk │──────────────────────────────────────►
└──────────────┘ RPC (Caladan TCP)
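Because the virtual disk is exposed as a standard Linux block device, unmodified applications can access it with ordinary system calls. The snippet below is a minimal illustrative sketch (not part of the artifact) that issues one direct, 4 KiB-aligned read against the device; the device path /dev/ublkb0 is an assumption, since the index assigned by ublksrv depends on the ublk devices already present on the machine.

```cpp
// Minimal standalone sketch: read one 4 KiB block from the Sandook virtual disk
// through the standard Linux block-device interface. The device path below is
// an assumption; the index assigned by ublksrv may differ on your machine.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

int main() {
  const char* dev = "/dev/ublkb0";           // hypothetical device index
  int fd = open(dev, O_RDONLY | O_DIRECT);   // O_DIRECT requires aligned buffers
  if (fd < 0) { perror("open"); return 1; }

  constexpr size_t kBlock = 4096;
  void* buf = nullptr;
  if (posix_memalign(&buf, kBlock, kBlock) != 0) { close(fd); return 1; }

  ssize_t n = pread(fd, buf, kBlock, 0);     // read the first logical block
  std::printf("read %zd bytes from %s\n", n, dev);

  free(buf);
  close(fd);
  return (n == static_cast<ssize_t>(kBlock)) ? 0 : 1;
}
```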
Sandook addresses multiple sources of SSD performance variability through a two-tier scheduling architecture:
- Control-Plane Scheduling (at Controller): Runs at slower timescales (~100µs) with a global view. Implements read/write isolation by dynamically assigning SSDs to handle predominantly reads or writes, and uses SSD performance models to distribute load according to each disk's capacity (see the first sketch after this list).
- Data-Plane Scheduling (at Client): Runs at faster timescales with local decisions. Selects which SSD replica to read from using weighted selection, and reacts to congestion by shifting load away from slow SSDs (see the second sketch after this list).
- Log-Structured Writes: Writes can be directed to any SSD regardless of the block's current location, with block mappings maintained at the client. This enables maximum flexibility in write steering (see the third sketch after this list).
- Congestion Control: Each disk server monitors its latency (p99) and signals congestion state to clients, which then reduce load to that server until it recovers.
- SSD Performance Models: Each SSD is profiled offline to build load-latency curves for different workload mixes (read-only, write-only, 25/50/75% writes). These models guide the controller's scheduling decisions.
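To make these mechanisms more concrete, three short sketches follow. They are illustrative only and are not taken from the artifact's source code. The first shows how a profiled load-latency curve can drive controller decisions: interpolating an SSD's expected p99 latency at a given load, and deriving the largest load the disk can absorb under a latency target. The class and field names, and the assumption of a single curve per disk, are ours.

```cpp
// Illustrative sketch of a profiled load -> latency model: linear interpolation
// over (IOPS, p99 latency) points measured offline. Class and field names are
// assumptions for illustration, not the artifact's actual model format.
#include <algorithm>
#include <utility>
#include <vector>

struct CurvePoint { double iops; double p99_us; };

class DiskModel {
 public:
  // `curve` must be sorted by strictly increasing IOPS.
  explicit DiskModel(std::vector<CurvePoint> curve) : curve_(std::move(curve)) {}

  // Expected p99 latency (in microseconds) at a given offered load.
  double LatencyAt(double iops) const {
    if (iops <= curve_.front().iops) return curve_.front().p99_us;
    if (iops >= curve_.back().iops) return curve_.back().p99_us;
    auto hi = std::upper_bound(curve_.begin(), curve_.end(), iops,
        [](double v, const CurvePoint& p) { return v < p.iops; });
    auto lo = hi - 1;
    double frac = (iops - lo->iops) / (hi->iops - lo->iops);
    return lo->p99_us + frac * (hi->p99_us - lo->p99_us);
  }

  // Largest profiled load this SSD can absorb while staying under a latency target.
  double CapacityAt(double target_p99_us) const {
    double cap = 0.0;
    for (const auto& p : curve_)
      if (p.p99_us <= target_p99_us) cap = p.iops;
    return cap;
  }

 private:
  std::vector<CurvePoint> curve_;
};
```

A controller following this idea can split offered load across SSDs in proportion to each disk's CapacityAt(target) rather than evenly, so slower or busier disks are not pushed past the knee of their latency curves.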
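The second sketch illustrates a data-plane decision: a client choosing which replica of a block to read from via weighted selection, skipping servers that currently signal congestion. All structure and function names here are illustrative assumptions.

```cpp
// Illustrative data-plane replica choice: weighted-random selection over the
// replicas holding a block, skipping servers that currently signal congestion.
// All names and fields here are assumptions for illustration.
#include <cstdint>
#include <random>
#include <vector>

struct ReplicaState {
  uint32_t server_id;
  double weight;    // e.g., proportional to model-derived spare capacity
  bool congested;   // set while the server's p99 latency exceeds its threshold
};

// Returns the server to read from; assumes `replicas` is non-empty.
uint32_t PickReadReplica(const std::vector<ReplicaState>& replicas,
                         std::mt19937& rng) {
  double total = 0.0;
  for (const auto& r : replicas)
    if (!r.congested) total += r.weight;
  if (total == 0.0) return replicas.front().server_id;  // all congested: fall back

  std::uniform_real_distribution<double> dist(0.0, total);
  double pick = dist(rng);
  for (const auto& r : replicas) {
    if (r.congested) continue;
    if (pick < r.weight) return r.server_id;
    pick -= r.weight;
  }
  return replicas.back().server_id;  // guard against floating-point rounding
}
```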
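The third sketch illustrates the client-side block mapping that log-structured writes rely on: each logical block address records wherever its most recent write landed, so a later read knows which server to contact. Replication (two locations per block) is elided for brevity, and all names are illustrative.

```cpp
// Illustrative client-side block map for log-structured writes: each logical
// block address maps to wherever the latest write landed. Names are assumptions.
#include <cstdint>
#include <unordered_map>

struct BlockLocation {
  uint32_t server_id;   // disk server holding the newest copy
  uint64_t offset;      // location within that server's log
};

class BlockMap {
 public:
  // Record where a completed write placed the block; later reads consult this.
  void RecordWrite(uint64_t lba, BlockLocation loc) { map_[lba] = loc; }

  // Look up the current location of a block; returns false if it was never written.
  bool Lookup(uint64_t lba, BlockLocation* out) const {
    auto it = map_.find(lba);
    if (it == map_.end()) return false;
    *out = it->second;
    return true;
  }

 private:
  std::unordered_map<uint64_t, BlockLocation> map_;
};
```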
sandook/
├── sandook/ # Core source code
│ ├── base/ # Common types, constants, I/O descriptors
│ ├── bindings/ # C++ bindings to Caladan runtime
│ ├── blk_dev/ # Linux block device agent (ublksrv integration)
│ ├── config/ # Runtime configuration parsing
│ ├── controller/ # Central controller: registration, allocation, scheduling
│ ├── disk_model/ # SSD performance models (load → latency curves)
│ ├── disk_server/ # Storage server: POSIX, memory, and SPDK backends
│ ├── mem/ # Memory management (slab allocator)
│ ├── rpc/ # TCP-based RPC layer
│ ├── samples/ # Example applications
│ ├── scheduler/ # Scheduling algorithms
│ │ ├── control_plane/ # Controller-side: R/W isolation, profile-guided
│ │ └── data_plane/ # Client-side: weighted/random server selection
│ ├── telemetry/ # Performance monitoring and metrics
│ ├── test/ # Unit tests and benchmarks
│ ├── utils/ # Utility programs (calibration, profiling)
│ └── virtual_disk/ # Client-side virtual disk abstraction
├── lib/ # Dependencies
│ ├── caladan/ # High-performance userspace networking runtime
│ ├── liburing/ # io_uring library
│ ├── ubdsrv/ # Userspace block device server
│ ├── tdigest/ # T-Digest for streaming percentiles
│ └── patches/ # Patches for dependencies
├── loadgen/ # Rust-based load generator for benchmarking
├── data/ # SSD profiles and models
│ ├── ssd_models/ # Pre-computed latency-load models
│ └── ssd_profiles/ # Raw SSD profiling data
└── scripts/ # Build, setup, and test scripts
# Install dependencies
./scripts/install_deps.sh
# Build
./scripts/build.sh clean
# Setup (network_interface_name e.g., 100gp1)
./scripts/setup.sh <network_interface_name>
Requirements:
- GCC 13+ (C++23)
- CMake 3.24+
- Linux kernel 5.15+ (for ublk support)
- DPDK-compatible NIC (for Caladan)
The following testbed configuration was used for the experiments in the Sandook paper.
- NVMe SSDs: Sandook requires NVMe SSDs accessible via SPDK
- Tested Models: The paper evaluation used Samsung PM1725a and Western Digital DC SN200 SSDs
- Testbed Configuration: 10 SSDs distributed across multiple machines
- NIC: DPDK-compatible network interface card required for the Caladan runtime
- Tested NIC: Mellanox ConnectX-6 (100 GbE)
- Requirements: The NIC must support DPDK poll-mode drivers; Caladan uses kernel-bypass for low-latency networking
- CPU: Intel Xeon E5-2680 v4 CPU
- DRAM: 64 GB DDR4
- Operating System: Ubuntu 23.04 with Linux kernel v6.5
- Minimum 2 machines: one for the client (controller + block device agent) and one or more for disk servers
- Each disk server machine requires at least one NVMe SSD
- To realize the full benefit of the scheduling policies, provision at least as many SSDs as the replication factor
Configuration files (.config) specify:
- Controller IP/port
- Storage server IP/port
- Scheduler type (control-plane and data-plane)
- Virtual disk type (local/remote)
- Disk server backend (POSIX/Memory/SPDK)
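As a rough illustration only, a .config file would contain entries along the following lines. Every key name and value below is a hypothetical placeholder rather than the artifact's actual syntax; refer to the configuration files shipped with the repository for the real format.

```
# Hypothetical sketch of a .config file -- key names are placeholders, not the real syntax.
controller_addr      192.168.1.10:5000    # controller IP/port
disk_server_addr     192.168.1.21:6000    # storage server IP/port
control_plane_sched  rw_isolation         # control-plane scheduler type
data_plane_sched     weighted             # data-plane scheduler type
virtual_disk         remote               # local or remote
backend              spdk                 # POSIX, memory, or SPDK
```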