This project implements distributed neural network training, splitting the work across multiple peers using FastAPI. The system uses peer-to-peer communication to distribute training tasks and tracks performance metrics such as accuracy and loss.
Docker Compose stack with MinIO, RabbitMQ, and MongoDB.

    docker-compose up -d
    docker-compose down -v

| Service | Port | Username | Password | URL |
|---|---|---|---|---|
| MinIO (S3) | 9000, 9001 | minioadmin | minioadmin | http://localhost:9001 |
| RabbitMQ | 5672, 15672 | admin | admin | http://localhost:15672 |
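
The table's connection details can also be kept in code so that scripts do not hard-code them in several places. The following is a minimal sketch using only the Python standard library; `ServiceEndpoint` and `is_reachable` are hypothetical helpers for illustration, not part of this project, and the values are the local defaults from the table.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEndpoint:
    """Connection details for one service (hypothetical helper, not part of the project)."""
    name: str
    host: str
    port: int
    username: str
    password: str

    def url(self, scheme: str) -> str:
        # Build a credentialed URL such as amqp://admin:admin@localhost:5672
        return f"{scheme}://{self.username}:{self.password}@{self.host}:{self.port}"


# Values copied from the service table. Ports 9000 and 5672 are the service
# ports; 9001 and 15672 serve the web consoles listed in the URL column.
MINIO = ServiceEndpoint("minio", "localhost", 9000, "minioadmin", "minioadmin")
RABBITMQ = ServiceEndpoint("rabbitmq", "localhost", 5672, "admin", "admin")


def is_reachable(endpoint: ServiceEndpoint, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the endpoint's port succeeds."""
    try:
        with socket.create_connection((endpoint.host, endpoint.port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for svc in (MINIO, RABBITMQ):
        print(f"{svc.name}: {'up' if is_reachable(svc) else 'down'}")
```

Running the script after starting the stack gives a quick check that both services accept connections; `RABBITMQ.url("amqp")` yields a URL in the form most AMQP clients accept.
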

    # Start
    docker-compose up -d
    # Stop
    docker-compose down
    # Stop + delete data
    docker-compose down -v
    # Logs
    docker-compose logs -f

Below is the high-level architecture diagram of the distributed neural network training system:
graph TB
subgraph Step1["Step 1: Authentication"]
Client[User/Client<br/>Desktop FastAPI App]
Auth[JWT Auth System<br/>+ MongoDB Users]
end
subgraph Step2["Step 2: Peer Selection"]
Registry[Active Peer Registry<br/>Returns queue ID & device specs]
end
subgraph Step3["Step 3: Dataset Upload + Sharding"]
Upload[Upload dataset<br/>.zip/.csv]
Sharder[Data Sharder<br/>Splits into N shards]
S3[Upload to S3 Blob]
end
subgraph Step4["Step 4: Prepare Training Job"]
Config[config.json<br/>- RabbitMQ queues<br/>- shard file URLs<br/>- training params + config]
end
subgraph Step5["Step 5: Send Tasks to Peers"]
RabbitMQ[RabbitMQ - 1 queue per peer]
Peer1[Peer 1<br/>Local machine]
Peer2[Peer 2<br/>Local machine]
PeerN[Peer N<br/>Local machine]
Core1[Core 1<br/>RabbitMQ<br/>Listener]
Core2[Core 1<br/>RabbitMQ<br/>Listener]
CoreN[Core 1<br/>RabbitMQ<br/>Listener]
Train1[Core 2<br/>Forward +<br/>Backward<br/>Training]
Train2[Core 2<br/>Forward +<br/>Backward<br/>Training]
TrainN[Core 2<br/>Forward +<br/>Backward<br/>Training]
end
subgraph Step6["Step 6: Client Aggregates Results"]
Dispatcher[Core 1<br/>Event/Queue<br/>Dispatcher]
Aggregator[Core 2<br/>Aggregator Thread<br/>Merge Models]
end
subgraph Step7["Step 7: Final Model"]
Final[Client saves model<br/>.pt/.h5]
end
Client --> Auth
Auth --> Registry
Registry --> Upload
Upload --> Sharder
Sharder --> S3
S3 --> Config
Config --> RabbitMQ
RabbitMQ --> Peer1 & Peer2 & PeerN
Peer1 --> Core1 --> Train1
Peer2 --> Core2 --> Train2
PeerN --> CoreN --> TrainN
Train1 & Train2 & TrainN --> Dispatcher
Dispatcher --> Aggregator
Aggregator --> Final
style Client fill:#64748b,stroke:#333,stroke-width:2px,color:#fff
style Auth fill:#0c4b33,stroke:#333,stroke-width:2px,color:#fff
style Registry fill:#2563eb,stroke:#333,stroke-width:2px,color:#fff
style Upload fill:#8b5cf6,stroke:#333,stroke-width:2px,color:#fff
style Sharder fill:#8b5cf6,stroke:#333,stroke-width:2px,color:#fff
style S3 fill:#f59e0b,stroke:#333,stroke-width:2px,color:#fff
style Config fill:#10b981,stroke:#333,stroke-width:2px,color:#fff
style RabbitMQ fill:#ff6600,stroke:#333,stroke-width:2px,color:#fff
style Peer1 fill:#3b82f6,stroke:#333,stroke-width:2px,color:#fff
style Peer2 fill:#3b82f6,stroke:#333,stroke-width:2px,color:#fff
style PeerN fill:#3b82f6,stroke:#333,stroke-width:2px,color:#fff
style Core1 fill:#06b6d4,stroke:#333,stroke-width:2px,color:#fff
style Core2 fill:#06b6d4,stroke:#333,stroke-width:2px,color:#fff
style CoreN fill:#06b6d4,stroke:#333,stroke-width:2px,color:#fff
style Train1 fill:#ec4899,stroke:#333,stroke-width:2px,color:#fff
style Train2 fill:#ec4899,stroke:#333,stroke-width:2px,color:#fff
style TrainN fill:#ec4899,stroke:#333,stroke-width:2px,color:#fff
style Dispatcher fill:#0c4b33,stroke:#333,stroke-width:2px,color:#fff
style Aggregator fill:#dc2626,stroke:#333,stroke-width:2px,color:#fff
style Final fill:#64748b,stroke:#333,stroke-width:2px,color:#fff
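
Steps 3, 4, and 6 of the diagram can be illustrated in a few lines of Python. This is a minimal sketch, not the project's implementation. It assumes round-robin sharding, a plain dictionary standing in for `config.json` with made-up field names, and element-wise averaging of peer weights as the merge strategy. The diagram only says "Merge Models", so the actual method may differ. All function names, queue IDs, and shard URLs below are hypothetical.

```python
import json
from typing import Dict, List, Sequence


def shard_rows(rows: Sequence, n_shards: int) -> List[list]:
    """Split rows into n_shards lists, round-robin, so sizes differ by at most one."""
    if n_shards < 1:
        raise ValueError("n_shards must be >= 1")
    shards: List[list] = [[] for _ in range(n_shards)]
    for i, row in enumerate(rows):
        shards[i % n_shards].append(row)
    return shards


def build_job_config(peer_queues: List[str], shard_urls: List[str], params: Dict) -> Dict:
    """Pair each peer queue with one shard URL (field names are assumptions)."""
    if len(peer_queues) != len(shard_urls):
        raise ValueError("need exactly one shard per peer queue")
    return {
        "tasks": [{"queue": q, "shard_url": u} for q, u in zip(peer_queues, shard_urls)],
        "training": params,
    }


def average_weights(peer_weights: List[List[float]]) -> List[float]:
    """Merge peer models by element-wise mean of their flattened weights.

    Assumes every peer returns a weight list of the same length.
    """
    if not peer_weights:
        raise ValueError("no peer results to merge")
    n = len(peer_weights)
    return [sum(column) / n for column in zip(*peer_weights)]


if __name__ == "__main__":
    shards = shard_rows(list(range(10)), 3)
    queues = [f"peer-{i}" for i in range(3)]  # hypothetical queue IDs
    urls = [f"http://localhost:9000/datasets/shard-{i}.csv" for i in range(3)]  # hypothetical URLs
    config = build_job_config(queues, urls, {"epochs": 1, "lr": 0.01})
    print(json.dumps(config, indent=2))
    print(average_weights([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

Round-robin assignment keeps shard sizes balanced without knowing the dataset length in advance. A real job would likely weight each peer's contribution by its shard size before averaging.
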