# Distributed ML Training Platform

This project is a distributed neural network training platform that splits training across multiple peers using FastAPI. The system uses peer-to-peer communication to distribute training tasks efficiently and tracks performance metrics such as accuracy and loss.
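
As a rough illustration of the peer side, the sketch below shows what a FastAPI service that accepts training tasks and reports metrics could look like. The routes, payload fields, and metric names are hypothetical, not the project's actual API:

```python
# Hypothetical sketch of a peer-side FastAPI service; routes and
# field names are illustrative, not the project's actual API.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class TrainTask(BaseModel):
    shard_url: str       # S3 URL of this peer's data shard
    epochs: int
    learning_rate: float

class Metrics(BaseModel):
    accuracy: float
    loss: float

latest_metrics = Metrics(accuracy=0.0, loss=0.0)

@app.post("/train")
def start_training(task: TrainTask) -> dict:
    # A real peer would download the shard, run forward/backward
    # passes, and publish its results over RabbitMQ.
    return {"status": "accepted", "shard": task.shard_url}

@app.get("/metrics")
def get_metrics() -> Metrics:
    # Report this peer's most recent training metrics.
    return latest_metrics
```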

## Infrastructure Services

A Docker Compose stack with MinIO (S3-compatible object storage for dataset shards), RabbitMQ (per-peer task queues), and MongoDB (user storage for JWT authentication).

### Quick Start

```sh
# Start all services in the background
docker-compose up -d

# Stop everything and delete the data volumes
docker-compose down -v
```

### Services

| Service    | Ports       | Username   | Password   | URL                    |
|------------|-------------|------------|------------|------------------------|
| MinIO (S3) | 9000, 9001  | minioadmin | minioadmin | http://localhost:9001  |
| RabbitMQ   | 5672, 15672 | admin      | admin      | http://localhost:15672 |
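
For reference, a compose file matching the table above might look like the sketch below. The repository ships its own docker-compose.yml, so treat the image tags and wiring here as assumptions based on the official images:

```yaml
# Illustrative sketch only; the project's actual docker-compose.yml
# may differ in image tags, volumes, and networking.
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"   # S3 API
      - "9001:9001"   # web console
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin

  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"     # AMQP
      - "15672:15672"   # management UI
    environment:
      RABBITMQ_DEFAULT_USER: admin
      RABBITMQ_DEFAULT_PASS: admin

  mongodb:
    image: mongo
    ports:
      - "27017:27017"   # default MongoDB port; no credentials listed above
```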

### Commands

```sh
# Start
docker-compose up -d

# Stop
docker-compose down

# Stop + delete data
docker-compose down -v

# Logs
docker-compose logs -f
```

## High-Level Architecture

Below is the high-level architecture diagram of the distributed neural network training system:


```mermaid
graph TB
    subgraph Step1["Step 1: Authentication"]
        Client[User/Client<br/>Desktop FastAPI App]
        Auth[JWT Auth System<br/>+ MongoDB Users]
    end
    
    subgraph Step2["Step 2: Peer Selection"]
        Registry[Active Peer Registry<br/>Returns queue ID & device specs]
    end
    
    subgraph Step3["Step 3: Dataset Upload + Sharding"]
        Upload[Upload dataset<br/>.zip/.csv]
        Sharder[Data Sharder<br/>Splits into N shards]
        S3[Upload to S3 Blob]
    end
    
    subgraph Step4["Step 4: Prepare Training Job"]
        Config[config.json<br/>- RabbitMQ queues<br/>- shard file URLs<br/>- training params + config]
    end
    
    subgraph Step5["Step 5: Send Tasks to Peers"]
        RabbitMQ[RabbitMQ - 1 queue per peer]
        Peer1[Peer 1<br/>Local machine]
        Peer2[Peer 2<br/>Local machine]
        PeerN[Peer N<br/>Local machine]
        
        Listener1[Core 1<br/>RabbitMQ<br/>Listener]
        Listener2[Core 1<br/>RabbitMQ<br/>Listener]
        ListenerN[Core 1<br/>RabbitMQ<br/>Listener]
        
        Train1[Core 2<br/>Forward +<br/>Backward<br/>Training]
        Train2[Core 2<br/>Forward +<br/>Backward<br/>Training]
        TrainN[Core 2<br/>Forward +<br/>Backward<br/>Training]
    end
    
    subgraph Step6["Step 6: Client Aggregates Results"]
        Dispatcher[Core 1<br/>Event/Queue<br/>Dispatcher]
        Aggregator[Core 2<br/>Aggregator Thread<br/>Merge Models]
    end
    
    subgraph Step7["Step 7: Final Model"]
        Final[Client saves model<br/>.pt/.h5]
    end
    
    Client --> Auth
    Auth --> Registry
    Registry --> Upload
    Upload --> Sharder
    Sharder --> S3
    S3 --> Config
    Config --> RabbitMQ
    
    RabbitMQ --> Peer1 & Peer2 & PeerN
    Peer1 --> Listener1 --> Train1
    Peer2 --> Listener2 --> Train2
    PeerN --> ListenerN --> TrainN
    
    Train1 & Train2 & TrainN --> Dispatcher
    Dispatcher --> Aggregator
    Aggregator --> Final
    
    style Client fill:#64748b,stroke:#333,stroke-width:2px,color:#fff
    style Auth fill:#0c4b33,stroke:#333,stroke-width:2px,color:#fff
    style Registry fill:#2563eb,stroke:#333,stroke-width:2px,color:#fff
    style Upload fill:#8b5cf6,stroke:#333,stroke-width:2px,color:#fff
    style Sharder fill:#8b5cf6,stroke:#333,stroke-width:2px,color:#fff
    style S3 fill:#f59e0b,stroke:#333,stroke-width:2px,color:#fff
    style Config fill:#10b981,stroke:#333,stroke-width:2px,color:#fff
    style RabbitMQ fill:#ff6600,stroke:#333,stroke-width:2px,color:#fff
    style Peer1 fill:#3b82f6,stroke:#333,stroke-width:2px,color:#fff
    style Peer2 fill:#3b82f6,stroke:#333,stroke-width:2px,color:#fff
    style PeerN fill:#3b82f6,stroke:#333,stroke-width:2px,color:#fff
    style Listener1 fill:#06b6d4,stroke:#333,stroke-width:2px,color:#fff
    style Listener2 fill:#06b6d4,stroke:#333,stroke-width:2px,color:#fff
    style ListenerN fill:#06b6d4,stroke:#333,stroke-width:2px,color:#fff
    style Train1 fill:#ec4899,stroke:#333,stroke-width:2px,color:#fff
    style Train2 fill:#ec4899,stroke:#333,stroke-width:2px,color:#fff
    style TrainN fill:#ec4899,stroke:#333,stroke-width:2px,color:#fff
    style Dispatcher fill:#0c4b33,stroke:#333,stroke-width:2px,color:#fff
    style Aggregator fill:#dc2626,stroke:#333,stroke-width:2px,color:#fff
    style Final fill:#64748b,stroke:#333,stroke-width:2px,color:#fff
```
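
Step 4 bundles everything a peer needs into config.json. The real schema lives in the repository; purely as an illustration of the contents named above (RabbitMQ queues, shard file URLs, training params), where all keys and values are assumptions, it might look like:

```json
{
  "queues": ["peer_1", "peer_2", "peer_3"],
  "shard_urls": [
    "s3://shards/job-42/shard_0.csv",
    "s3://shards/job-42/shard_1.csv",
    "s3://shards/job-42/shard_2.csv"
  ],
  "training": {
    "epochs": 10,
    "batch_size": 32,
    "learning_rate": 0.001
  }
}
```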
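
Step 5 gives each peer its own RabbitMQ queue and publishes one task per queue. Below is a minimal sketch of that dispatch pattern using pika; the queue names and message fields are assumptions, not the project's actual protocol:

```python
# Sketch: publish one training task to each peer's dedicated queue.
# Queue names and message fields are illustrative assumptions.
import json

import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters(
        host="localhost",
        credentials=pika.PlainCredentials("admin", "admin"),
    )
)
channel = connection.channel()

shard_urls = [
    "s3://shards/job-42/shard_0.csv",
    "s3://shards/job-42/shard_1.csv",
]

for i, shard_url in enumerate(shard_urls, start=1):
    queue = f"peer_{i}"  # one queue per peer
    channel.queue_declare(queue=queue, durable=True)
    channel.basic_publish(
        exchange="",
        routing_key=queue,
        body=json.dumps({"shard_url": shard_url, "epochs": 10}),
        properties=pika.BasicProperties(delivery_mode=2),  # persist the message
    )

connection.close()
```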

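Step 6 merges the peers' trained models back on the client. A common way to merge data-parallel results is to average the parameter tensors (FedAvg-style); the sketch below assumes PyTorch state dicts, consistent with the .pt output in Step 7, but it is an illustration rather than the project's exact aggregation code:

```python
# Sketch: FedAvg-style merge that averages parameters across the
# state dicts returned by the peers. Assumes every peer trained the
# same architecture; illustrative, not the project's exact code.
import torch

def merge_state_dicts(state_dicts: list[dict]) -> dict:
    merged = {}
    for key in state_dicts[0]:
        # Stack each parameter across peers and take the element-wise mean.
        merged[key] = torch.stack(
            [sd[key].float() for sd in state_dicts]
        ).mean(dim=0)
    return merged

# Usage: load each peer's result, merge, and save the final model.
# peer_results = [torch.load(path) for path in ["peer_1.pt", "peer_2.pt"]]
# torch.save(merge_state_dicts(peer_results), "model.pt")
```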