Split Computing on IoT with Python Clients
Advanced split computing implementation for TinyML on IoT devices with intelligent offloading, variance detection, and resilient client-server communication.
The SCIoT project provides tools for using Edge Impulse models on ESP32 devices and in Python clients via split computing techniques. This repository includes advanced features for adaptive offloading and system resilience.
- Dynamic layer-by-layer offloading decisions
- Exponential Moving Average (EMA) time smoothing (α=0.2; see the sketch after this list)
- Network-aware split point selection
- Support for 59-layer TFLite models (FOMO 96x96)
- Real-time inference time monitoring
- Coefficient of Variation (CV) analysis with 15% threshold
- Sliding window history (10 measurements per layer)
- Automatic cascade propagation (layer i → layer i+1)
- Triggers re-evaluation when performance changes
- Probabilistic forcing of device-local inference
- Refreshes device measurements periodically
- Configurable probability (0.0-1.0)
- Returns the special value `-1` for all-device execution
- Seamless client-server coordination
- Graceful degradation to local-only mode
- Connection error handling with 5-second timeouts
- Automatic reconnection attempts
- Continues operation when server unavailable
- No crashes on network failures
- 44 automated tests (39 core + 5 MQTT)
- Interactive demonstration scripts
- Unit, integration, and system tests
- Connection resilience tests
- 100% test pass rate
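For illustration, the EMA smoothing above could look like the following minimal sketch (class and method names are hypothetical; only α=0.2 comes from the feature list):

```python
class LayerTimeEMA:
    """Hypothetical sketch: EMA smoothing of per-layer inference times."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha   # smoothing factor (alpha=0.2, as above)
        self.smoothed = {}   # layer index -> smoothed time in seconds

    def update(self, layer: int, measured: float) -> float:
        prev = self.smoothed.get(layer)
        if prev is None:
            # First measurement seeds the average.
            self.smoothed[layer] = measured
        else:
            # EMA: new = alpha * sample + (1 - alpha) * previous.
            self.smoothed[layer] = self.alpha * measured + (1 - self.alpha) * prev
        return self.smoothed[layer]
```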
If you use this work, please consider citing:
- F. Bove, S. Colli and L. Bedogni, "Performance Evaluation of Split Computing with TinyML on IoT Devices," 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 2024, pp. 1-6, DOI Link.
- F. Bove and L. Bedogni, "Smart Split: Leveraging TinyML and Split Computing for Efficient Edge AI," 2024 IEEE/ACM Symposium on Edge Computing (SEC), Rome, Italy, 2024, pp. 456-460, DOI Link.
SCIoT_python_client/
├── src/
│ └── server/
│ ├── edge/ # Edge server initialization
│ ├── communication/ # HTTP, WebSocket, MQTT servers
│ │ ├── http_server.py # FastAPI HTTP server
│ │ ├── request_handler.py # Request processing + variance + local inference
│ │ └── websocket_server.py # WebSocket server
│ ├── models/ # Model management and inference
│ │ └── model_manager.py # Edge inference with variance tracking
│ ├── offloading_algo/ # Offloading decision algorithms
│ ├── device/ # Device simulation
│ ├── statistics/ # Performance statistics
│ ├── variance_detector.py # Variance detection system
│ ├── delay_simulator.py # Network/computation delay simulation
│ └── settings.yaml # Server configuration
│
├── server_client_light/
│ └── client/
│ ├── http_client.py # Python HTTP client (main)
│ ├── websocket_client.py # Python WebSocket client
│ ├── http_config.yaml # HTTP client configuration
│ ├── websocket_config.yaml # WebSocket client configuration
│ └── delay_simulator.py # Client-side delay simulation
│
├── tests/
│ ├── test_variance_and_local_inference.py # Core feature tests (27)
│ ├── test_client_resilience.py # Connection handling (12)
│ ├── test_mqtt_client/ # MQTT tests (5)
│ └── test_offloading_algo/ # Offloading algorithm tests
│
├── test_variance_detection.py # Interactive demo: variance detection
├── test_variance_cascading.py # Interactive demo: cascading
│
└── Documentation/
├── VARIANCE_DETECTION.md
├── VARIANCE_DETECTION_IMPLEMENTATION.md
├── LOCAL_INFERENCE_MODE.md
├── LOCAL_INFERENCE_IMPLEMENTATION.md
├── CLIENT_SERVER_-1_SEMANTICS.md
├── DELAY_SIMULATION.md
└── TEST_SUITE_SUMMARY.md
- Python 3.11+
- TensorFlow 2.15.0
- Docker (for MQTT broker)
Clone the repository:
git clone https://github.com/UBICO/SCIoT.git
cd SCIoT_python_client
Create a virtual environment and install dependencies:
uv sync
Activate the virtual environment:
source .venv/bin/activate # On macOS/Linux
# or
.venv\Scripts\activate  # On Windows
- Save your Keras model as `test_model.h5` in `src/server/models/test/test_model/`
- Save your test image as `test_image.png` in `src/server/models/test/test_model/pred_data/`
- Split the model: `python3 src/server/models/model_split.py`
- Configure paths in `src/server/commons.py`
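Conceptually, splitting a Keras model yields a device-side "head" and an edge-side "tail". A minimal sketch of the idea, assuming a purely sequential architecture (this is not the repository's `model_split.py`):

```python
import tensorflow as tf

def split_model(model: tf.keras.Model, split_layer: int):
    """Split a sequential Keras model after `split_layer` (hypothetical sketch)."""
    # Head runs on the device: model input -> output of `split_layer`.
    head = tf.keras.Model(inputs=model.input,
                          outputs=model.layers[split_layer].output)
    # Tail runs on the edge: replay the remaining layers on an
    # intermediate tensor of the head's output shape.
    tail_in = tf.keras.Input(shape=head.output.shape[1:])
    x = tail_in
    for layer in model.layers[split_layer + 1:]:
        x = layer(x)  # only valid for a linear chain of layers
    tail = tf.keras.Model(inputs=tail_in, outputs=x)
    return head, tail
```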
Server configuration (src/server/settings.yaml):
communication:
  http:
    host: 0.0.0.0
    port: 8000
    endpoints:
      registration: /api/registration
      device_input: /api/device_input
      offloading_layer: /api/offloading_layer
      device_inference_result: /api/device_inference_result
delay_simulation:
  computation:
    enabled: false
    type: gaussian
    mean: 0.001
    std_dev: 0.0002
  network:
    enabled: false
    type: gaussian
    mean: 0.020
    std_dev: 0.005
local_inference_mode:
  enabled: true
  probability: 0.1  # 10% of requests force local inference
Client configuration (server_client_light/client/http_config.yaml):
client:
  device_id: "device_01"
  http:
    server_host: "0.0.0.0"
    server_port: 8000
  model:
    last_offloading_layer: 58
  local_inference_mode:
    enabled: true
    probability: 0.1
Activate the virtual environment:
source .venv/bin/activate
Start the MQTT broker (optional):
docker compose up
Run the edge server:
python src/server/edge/run_edge.py
In a separate terminal:
source .venv/bin/activate
python server_client_light/client/http_client.py
Client Behavior:
- Connects to server and registers device
- Sends image data
- Receives offloading decision (or `-1` for local-only)
- Runs inference (split or local)
- Sends results back to server
- Continues operating if server becomes unavailable (graceful degradation to local-only mode)
View real-time statistics:
streamlit run src/server/web/webpage.py
Run the full test suite:
pytest tests/test_variance_and_local_inference.py tests/test_client_resilience.py tests/test_mqtt_client/ -v
# Core features (variance, local inference, -1 handling)
pytest tests/test_variance_and_local_inference.py -v
# Connection resilience
pytest tests/test_client_resilience.py -v
# MQTT client
pytest tests/test_mqtt_client/ -v
# Variance detection demonstration
python test_variance_detection.py
# Cascade propagation demonstration
python test_variance_cascading.py
The system monitors inference time stability using the Coefficient of Variation (CV):
CV = StdDev / Mean
If CV > 15% → Unstable → Trigger re-test
Cascading: When layer i shows variance, layer i+1 is automatically flagged for re-testing (since layer i's output is layer i+1's input).
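A minimal sketch of this check (names are hypothetical; the 15% threshold, 10-sample window, and i → i+1 cascade are from the description above):

```python
from collections import deque
from statistics import mean, stdev

WINDOW = 10          # sliding window: 10 measurements per layer
CV_THRESHOLD = 0.15  # 15% coefficient of variation

history: dict[int, deque] = {}  # layer index -> recent inference times

def record_and_check(layer: int, inference_time: float) -> set[int]:
    """Record one measurement; return layers flagged for re-testing."""
    window = history.setdefault(layer, deque(maxlen=WINDOW))
    window.append(inference_time)
    if len(window) < 2:
        return set()
    cv = stdev(window) / mean(window)
    if cv > CV_THRESHOLD:
        # Cascade: layer i's output is layer i+1's input, so flag both.
        return {layer, layer + 1}
    return set()
```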
See VARIANCE_DETECTION.md for details.
Probabilistically forces the device to run all layers locally:
- Purpose: Refresh device inference times periodically
- Configuration: `enabled` (true/false) + `probability` (0.0-1.0)
- Mechanism: Server returns `-1` instead of the calculated offloading layer
- Client handling: `-1` → converts to layer 58 (run all 59 layers locally)
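Both sides of this contract could be sketched as follows (function names are hypothetical; the `-1` value, `probability` field, and conversion to layer 58 are from the list above):

```python
import random

LAST_LAYER = 58  # 59 layers, indexed 0-58

def choose_offloading_layer(computed: int, enabled: bool, probability: float) -> int:
    """Server side: occasionally force full on-device inference."""
    if enabled and random.random() < probability:
        return -1          # special value: run everything on the device
    return computed

def resolve_split_point(server_answer: int) -> int:
    """Client side: -1 means run all layers locally."""
    return LAST_LAYER if server_answer == -1 else server_answer
```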
See LOCAL_INFERENCE_MODE.md for details.
Simulate network and computation delays for testing:
delay_simulation:
  computation:
    enabled: true
    type: gaussian    # Options: static, gaussian, uniform, exponential
    mean: 0.001       # 1ms average
    std_dev: 0.0002   # 0.2ms variation
  network:
    enabled: true
    type: gaussian
    mean: 0.020       # 20ms average
    std_dev: 0.005    # 5ms variation
See DELAY_SIMULATION.md for details.
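The four delay types might be sampled roughly like this (a sketch, not the repository's delay_simulator.py; the uniform range is an assumption):

```python
import random
import time

def sample_delay(cfg: dict) -> float:
    """Draw one delay in seconds from the configured distribution."""
    kind = cfg["type"]
    if kind == "static":
        return cfg["mean"]
    if kind == "gaussian":
        # Clamp at zero: a sampled delay cannot be negative.
        return max(0.0, random.gauss(cfg["mean"], cfg["std_dev"]))
    if kind == "uniform":
        return random.uniform(0.0, 2 * cfg["mean"])  # assumed [0, 2*mean] range
    if kind == "exponential":
        return random.expovariate(1.0 / cfg["mean"])
    raise ValueError(f"unknown delay type: {kind}")

def simulate_delay(cfg: dict) -> None:
    """Sleep for one sampled delay if simulation is enabled."""
    if cfg.get("enabled", False):
        time.sleep(sample_delay(cfg))
```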
Run comprehensive multi-scenario simulations with automated analysis:
# Run all 9 predefined scenarios (duration: ~15 minutes)
python simulation_runner.py
# Results saved to: simulated_results/simulation_YYYYMMDD_HHMMSS/
# - baseline_inference_results.csv
# - network_delay_20ms_inference_results.csv
# - computation_delay_5ms_inference_results.csv
# - ... (one per scenario)
See SIMULATION_RUNNER_README.md for scenarios and configuration.
Generate comprehensive graphs and statistics from simulation results:
# Analyze a simulation folder
python analyze_simulation.py simulated_results/simulation_YYYYMMDD_HHMMSS
# Generates in analysis/ subfolder:
# - Device vs Edge time comparison plots
# - Total inference time bar charts
# - Throughput analysis
# - Timing distribution boxplots
# - Layer statistics
# - Comprehensive comparison dashboard
# - Summary statistics CSV
See ANALYSIS_README.md for detailed output descriptions and interpretation.
Clients handle server unavailability gracefully:
- Connection timeout: 5 seconds on all requests
- Fallback behavior: Run all layers locally when server unreachable
- No crashes: All network errors caught and handled
- Auto-retry: Attempts reconnection on each request
- Continues operation: System never stops, even when isolated
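The pattern boils down to wrapping every request in a timeout and falling back to a full local run. A sketch with the requests library (the endpoint path comes from settings.yaml above; the JSON field name is an assumption):

```python
import requests

TIMEOUT = 5      # seconds, applied to every request
LAST_LAYER = 58  # fall back to running all layers locally

def get_offloading_layer(server_url: str, payload: dict) -> int:
    """Ask the server for a split point; degrade gracefully on any failure."""
    try:
        resp = requests.post(f"{server_url}/api/offloading_layer",
                             json=payload, timeout=TIMEOUT)
        resp.raise_for_status()
        return resp.json()["offloading_layer"]  # field name assumed
    except requests.exceptions.RequestException as exc:
        print(f"⚠ Cannot reach server: {exc}")
        print("→ Running all layers locally")
        return LAST_LAYER
```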
Example output when server is down:
⚠ Registration failed (server unreachable): Connection refused
→ Continuing with local-only inference
⚠ Cannot reach server: Connection refused
→ Running all layers locally
✓ Inference complete (layers 0-58)
Comprehensive documentation available:
- VARIANCE_DETECTION.md - Technical documentation of variance detection
- VARIANCE_DETECTION_IMPLEMENTATION.md - Implementation overview
- LOCAL_INFERENCE_MODE.md - Local inference mode reference
- LOCAL_INFERENCE_IMPLEMENTATION.md - Implementation details
- CLIENT_SERVER_-1_SEMANTICS.md - How -1 works end-to-end
- DELAY_SIMULATION.md - Delay simulation guide
- TEST_SUITE_SUMMARY.md - Complete test documentation
┌─────────────┐         ┌──────────────┐         ┌─────────────┐
│   Device    │ ◄─────► │ Edge Server  │ ◄─────► │  Analytics  │
│   Client    │  HTTP   │  (FastAPI)   │         │  Dashboard  │
└─────────────┘         └──────────────┘         └─────────────┘
      │                        │
  Inference               Offloading
   (0 to N)               Algorithm +
      │                   Variance +
      │                   Local Mode
      ▼                        ▼
   Device                    Edge
   Results                  Results
   (times)                (prediction)
Request Flow:
- Client sends image → Server
- Server returns offloading layer (or `-1`)
- Client runs inference up to the assigned layer
- Client sends results + times → Server
- Server tracks variance + updates times
- Server runs remaining layers (if needed)
- Server returns final prediction
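Put together, one round trip might look like this sketch (endpoint paths come from settings.yaml above; payload fields and the base64 encoding are assumptions):

```python
import base64
import requests

BASE = "http://localhost:8000"  # adjust to the server's host/port
TIMEOUT = 5

def round_trip(device_id: str, image_bytes: bytes, device_times: list[float]) -> dict:
    # 1. Register the device.
    requests.post(f"{BASE}/api/registration",
                  json={"device_id": device_id}, timeout=TIMEOUT)
    # 2. Send the input image.
    requests.post(f"{BASE}/api/device_input",
                  json={"device_id": device_id,
                        "image": base64.b64encode(image_bytes).decode()},
                  timeout=TIMEOUT)
    # 3. Ask where to split (-1 = run everything on the device).
    layer = requests.post(f"{BASE}/api/offloading_layer",
                          json={"device_id": device_id},
                          timeout=TIMEOUT).json()["offloading_layer"]
    # 4. Report per-layer times (the device would run layers 0..layer first);
    #    the server finishes the remaining layers and returns the prediction.
    resp = requests.post(f"{BASE}/api/device_inference_result",
                         json={"device_id": device_id, "layer": layer,
                               "times": device_times},
                         timeout=TIMEOUT)
    return resp.json()
```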
- Inference: 59 layers (FOMO 96x96)
- Device time: ~19µs per layer average
- Edge time: ~450-540µs per layer average
- Network: Configurable latency simulation
- Variance threshold: 15% CV
- Refresh rate: Configurable (default 10% via local inference mode)
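Taken at face value, these averages put a full 59-layer on-device pass at about 59 × 19µs ≈ 1.1ms, and a full edge pass at about 59 × 500µs ≈ 29.5ms before network latency.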
- Check port 8000 is not in use
- Verify TensorFlow is installed correctly
- Check model files exist in correct paths
- Verify server is running
- Check `server_host` and `server_port` in the config
- Note: The client will continue in local-only mode if the server is unavailable
- Ensure virtual environment is activated
- Run `uv sync` to update dependencies
- Check that the Python version is 3.11+
- Verify model is split correctly
- Check layer dimensions match
- Review logs in the `logs/` directory
This is a research project. For questions or collaboration:
- Open an issue on GitHub
- Contact the UBICO research group
- See publications for research context
See LICENSE file for details.
Last Updated: December 31, 2025
Status: ✅ All systems operational (44/44 tests passing)
Environment: Python 3.11.11, TensorFlow 2.15.0