Would a Pangeo application on an InfiniBand-based cluster speed up with an RDMA-optimised communication library? #43

@tinaok

A basic installation of Pangeo on an InfiniBand cluster uses TCP/IP communication, so it does not benefit from the interconnect's real speed and bandwidth. Using RDMA connections between Dask processes running on an InfiniBand-based cluster should speed up their communication.
There are benchmarks on InfiniBand clusters with GPUs using UCX-Py or MPI4Dask (https://blog.dask.org/2019/06/09/ucx-dgx, https://www.hpcadvisorycouncil.com/events/2020/australia-conference/pdf/HighPerfDeepMachineLearnonHPCSyst_010920_DKPanda.pdf, slides 46-47, http://hibd.cse.ohio-state.edu/features/#mpi4dask).
Our Pangeo benchmark is CPU-based, and the results in our repo come from InfiniBand-based HPC clusters. Communication-bound Pangeo benchmarks (such as rechunking) may therefore see a speed-up.
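
To make the question concrete, here is a minimal sketch of how one might time a communication-bound rechunk under different Dask communication protocols. It is not taken from our benchmark repo: the function names `rechunk_seconds` and `median_speedup`, the array size, and the worker count are all hypothetical. The `"ucx"` protocol is assumed to need UCX-Py installed alongside Dask, and a `LocalCluster` only exercises one node, so a real comparison would need a multi-node cluster on the InfiniBand fabric.

```python
import statistics
import time


def rechunk_seconds(protocol="tcp", n=4000, n_workers=4):
    """Time one row-block to column-block rechunk using the given comm protocol."""
    # Imported here so median_speedup() stays usable without Dask installed.
    import dask.array as da
    from dask.distributed import Client, LocalCluster, wait

    with LocalCluster(
        protocol=protocol,        # "tcp" by default; "ucx" assumes UCX-Py is available
        n_workers=n_workers,
        threads_per_worker=1,
        dashboard_address=":0",   # pick a free port for the dashboard
    ) as cluster, Client(cluster):
        # Row blocks, materialised on the workers before timing starts.
        x = da.random.random((n, n), chunks=(n // n_workers, n)).persist()
        wait(x)

        start = time.perf_counter()
        # Column blocks: every output chunk needs data from every input chunk.
        y = x.rechunk((n, n // n_workers)).persist()
        wait(y)
        return time.perf_counter() - start


def median_speedup(baseline_seconds, candidate_seconds):
    """Ratio of median timings; values above 1 mean the candidate is faster."""
    return statistics.median(baseline_seconds) / statistics.median(candidate_seconds)


if __name__ == "__main__":
    tcp = [rechunk_seconds("tcp") for _ in range(3)]
    print(f"tcp median: {statistics.median(tcp):.3f} s")
    # Uncomment on a system where UCX-Py is installed (assumption):
    # ucx = [rechunk_seconds("ucx") for _ in range(3)]
    # print(f"ucx speed-up over tcp: {median_speedup(tcp, ucx):.2f}x")
```

The median is used rather than the mean so that a single slow run (for example, the first one warming up the cluster) does not dominate the comparison.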
