Affinity in DDFacet not compatible with slurm system #36

@kgourdji

Hi,

We touched on this briefly in issue #35, but I wanted to open a new issue for it because it wasn't addressed and has become a showstopper for me: I can't run DDFacet on the cluster I'm using (which operates using slurm). Essentially, DDFacet cannot tell when you are only using a subset of cores. It counts all cores that exist on the node, which leads to the error message attached as a screenshot. This is a problem if you are unable to use all cores on a node, as is the case for me. In the example below, I requested 32 cores (the maximum I'm allowed to request) and the node has 36 cores in total:

error1_with_affinity_on
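
To illustrate the mismatch described above, here is a minimal Python sketch that compares the number of cores that exist on the node with the number the current process is actually allowed to use. This is not DDFacet's code: `usable_cores` is a hypothetical helper name, and the sketch assumes the scheduler confines the job through CPU affinity or cpusets, so that `os.sched_getaffinity` (Linux only) reflects the allocation.

```python
import os


def usable_cores():
    """Return (total, allowed) core counts for the current process.

    total   -- logical cores that exist on the node
    allowed -- cores this process may run on (falls back to total
               on platforms without os.sched_getaffinity)
    """
    total = os.cpu_count() or 1
    if hasattr(os, "sched_getaffinity"):
        # Set of CPU ids the calling process (pid 0 = self) is restricted to.
        allowed = len(os.sched_getaffinity(0))
    else:
        allowed = total
    return total, allowed


if __name__ == "__main__":
    total, allowed = usable_cores()
    print(f"cores on node: {total}, cores available to this job: {allowed}")
```

If the two numbers differ inside a job, any code that sizes its worker pool or pins processes based on the node total will try to use cores the job does not own, which would be consistent with the behaviour I am seeing.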

I tried to circumvent the issue by turning affinity off with the option --Parallel-Affinity=disable. However, this does not work for me either and, judging by the error message (see the attached log file), the code still seems to think affinity is on. Any ideas on how I can get the software to run without hitting these affinity problems? Or, more simply, how can I definitively turn affinity off?

slurm-26674639.txt

Thanks!
Kelly
