Code implementations for splitting datasets for federated learning

GwenLegate/DistributionsForFL

DistributionsForFL

This repository implements common ways of splitting a dataset to create non-i.i.d. federated learning distributions.

Splits

  • iid: samples are distributed i.i.d. among clients, and every client receives the same number of samples
  • shard: based on the strategy outlined in Communication-Efficient Learning of Deep Networks from Decentralized Data (McMahan et al.), generalized to let the user specify the number of shards received by each client
  • Dirichlet equal: uses a Dirichlet distribution parameterized by alpha in [0.01, infinity) and partitions samples so that all clients have an equal number of samples
  • Dirichlet unequal: uses a Dirichlet distribution parameterized by alpha in [0.01, infinity) and partitions samples so that clients have an unequal number of samples (not implemented yet)
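
The iid and shard strategies listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's code: the function names `iid_split` and `shard_split`, their parameters, and the choice to drop leftover samples are all assumptions made for the example. The shard version follows the general idea from McMahan et al. of sorting samples by label, cutting the sorted order into shards, and handing each client a fixed number of shards.

```python
import numpy as np


def iid_split(num_samples, num_clients, seed=0):
    """Shuffle sample indices and deal them out in equal-sized chunks.

    Hypothetical helper: samples that do not divide evenly are dropped.
    """
    rng = np.random.default_rng(seed)
    per_client = num_samples // num_clients
    perm = rng.permutation(num_samples)[: per_client * num_clients]
    return np.split(perm, num_clients)


def shard_split(labels, num_clients, shards_per_client=2, seed=0):
    """Sort indices by label, cut them into shards, assign shards to clients.

    Hypothetical helper: samples that do not divide evenly are dropped.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_shards = num_clients * shards_per_client
    shard_size = len(labels) // num_shards
    # Indices ordered by class, so each shard covers few classes.
    order = np.argsort(labels, kind="stable")[: shard_size * num_shards]
    shards = order.reshape(num_shards, shard_size)
    # Randomly assign `shards_per_client` distinct shards to every client.
    shard_ids = rng.permutation(num_shards).reshape(num_clients, shards_per_client)
    return [shards[ids].ravel() for ids in shard_ids]


# Toy usage: 1000 labels over 10 classes, split among 10 clients.
toy_labels = np.repeat(np.arange(10), 100)
iid_parts = iid_split(len(toy_labels), num_clients=10)
shard_parts = shard_split(toy_labels, num_clients=10, shards_per_client=2)
```

With fewer shards per client, each client sees fewer classes, which makes the split more strongly non-i.i.d.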

Sample Splits

Sample bar graphs of the CIFAR10 class proportions for 3 randomly selected clients, generated with alpha=0.1. The class labels L1, L2, ..., L10 run along the y-axis, and the x-axis shows the proportion of each class, such that L1 + L2 + ... + L10 = 1.0.

| Client No. | Proportion | Visualization |
| 10 | [0. 0. 0.288 0. 0. 0. 0. 0. 0.712 0.] | [bar chart image] |
| 4 | [0. 0.0028 0.1088 0. 0. 0.5692 0.3172 0. 0.002 0.] | [bar chart image] |
| 5 | [0. 0.3848 0.0084 0.0056 0. 0.1204 0.3592 0.1192 0.0024 0.] | [bar chart image] |
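
Proportion vectors like those in the table can be produced by drawing each client's class mix from a Dirichlet distribution. The sketch below is one plausible way to build an equal-sized Dirichlet split and is not taken from the repository. The names `dirichlet_equal_split` and `class_proportions` are hypothetical, and so is the fallback used when a class runs out of samples (renormalizing over the classes that still have samples left).

```python
import numpy as np


def dirichlet_equal_split(labels, num_clients, alpha, seed=0):
    """Give every client the same number of samples, with class
    proportions drawn from a symmetric Dirichlet(alpha) distribution.

    Hypothetical sketch; the repository's implementation may differ.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # One shuffled pool of still-unassigned indices per class.
    pools = [list(rng.permutation(np.flatnonzero(labels == c))) for c in classes]
    per_client = len(labels) // num_clients
    clients = []
    for _ in range(num_clients):
        target = rng.dirichlet(alpha * np.ones(len(classes)))
        chosen = []
        for _ in range(per_client):
            # Assumption: exhausted classes are masked out and the
            # remaining probabilities are renormalized.
            available = np.array([len(pool) > 0 for pool in pools], dtype=float)
            probs = target * available
            if probs.sum() == 0:
                probs = available  # target classes are all used up
            probs = probs / probs.sum()
            k = rng.choice(len(classes), p=probs)
            chosen.append(pools[k].pop())
        clients.append(np.array(chosen))
    return clients


def class_proportions(labels, client_indices, num_classes):
    """Fraction of a client's samples in each class (sums to 1)."""
    counts = np.bincount(np.asarray(labels)[client_indices], minlength=num_classes)
    return counts / counts.sum()


# Toy usage: 10 classes with 50 samples each, 5 clients, alpha = 0.1.
toy_labels = np.repeat(np.arange(10), 50)
parts = dirichlet_equal_split(toy_labels, num_clients=5, alpha=0.1)
print(class_proportions(toy_labels, parts[0], num_classes=10))
```

Small values of alpha concentrate each client on a handful of classes, as in the table above, while large values push every client toward a uniform class mix.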

Use of Code

Install the requirements, then run python main.py. The type of split, the number of users, and alpha for the Dirichlet distribution are all set in main.py.
