Skip to content

CsrParAssembler::assemble_pattern significantly slower than CsrAssembler::assemble_pattern on a single thread #58

@Andlon

Description

@Andlon

Benchmarks showed ~30-70% overhead for the parallel variant with RAYON_NUM_THREADS=1. The discrepancy seems to be primarily related to rayon, since some preliminary investigation showed that replacing e.g. into_par_iter with into_iter accounts for most of the overhead. Further overhead could be removed by using atomic locks (though this requires more thought for efficiently handling the multi-threaded case).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions