@kloeffel

Code used for performance benchmarks published in PSI-K B8.08
https://www.psik2025.net/program/schedule

nplist/nolist are not used anywhere, so remove them
allgather bandwidth added
some more interfaces:
mpi allreduce with min
mpi reduce with mpi inplace
allgatherv inplace routines
for both MPI and MPI+OMP
rsync -a --include="*/" --include "*html" --include="*out" --exclude="*"
Regtests are performed with the debug INTEL-XHOST-IFORT-MPI configuration and the
following environment settings:
  export I_MPI_CBWR=2
  export MKL_CBWR=COMPATIBLE
  export MKL_DYNAMIC=false
  export OMP_DYNAMIC=false
  export KMP_DETERMINISTIC_REDUCTION=true
distribute atoms taking into account the number of betaprojectors for
effective load balancing
remove the problematic distribution code; use dist_entity2 for the old behavior
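
A minimal sketch of load balancing weighted by projector count, assuming a greedy "heaviest atom first" heuristic. This is an illustration only, not the scheme implemented in distribution_utils; the names distribute_atoms and group_loads are hypothetical, and the real routine may keep atoms contiguous within each group.

```python
import heapq


def distribute_atoms(nproj_per_atom, ngroups):
    """Assign each atom to a group so the summed projector counts stay even.

    nproj_per_atom: number of beta projectors per atom (hypothetical input).
    Returns a list mapping atom index -> group index.
    """
    # Min-heap of (current load, group index); the lightest group pops first.
    heap = [(0, g) for g in range(ngroups)]
    heapq.heapify(heap)
    owner = [None] * len(nproj_per_atom)
    # Place the heaviest atoms first (longest-processing-time heuristic).
    order = sorted(range(len(nproj_per_atom)), key=lambda ia: -nproj_per_atom[ia])
    for ia in order:
        load, g = heapq.heappop(heap)
        owner[ia] = g
        heapq.heappush(heap, (load + nproj_per_atom[ia], g))
    return owner


def group_loads(nproj_per_atom, owner, ngroups):
    """Sum the projector counts owned by each group."""
    loads = [0] * ngroups
    for ia, g in enumerate(owner):
        loads[g] += nproj_per_atom[ia]
    return loads
```

Distributing by atom count alone would treat an atom with 18 projectors like one with 4; weighting by projector count balances the work per group instead.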

# Conflicts:
#	src/SOURCES
#	src/distribution_utils.mod.F90
#	src/vdw_utils.mod.F90
symmetrization can be disabled via optional flag symmetrization
communicator can be changed via optional flag gid
summat_parent is made obsolete by the optional flag parent
both spins can be packed into a single MPI call via optional flag lsd
without optional flags, the routine falls back to the original behavior:
com=allgrp, symmetrization = .true.
additional routine to pack and unpack a symmetric matrix
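
A minimal sketch of packing and unpacking a symmetric matrix, assuming row-wise storage of the upper triangle. The function names and the storage order are assumptions for illustration; the routine added in this commit may use a different layout.

```python
def pack_symmetric(a):
    """Return the upper triangle (diagonal included) of a square matrix, row by row."""
    n = len(a)
    return [a[i][j] for i in range(n) for j in range(i, n)]


def unpack_symmetric(packed, n):
    """Rebuild the full n x n symmetric matrix from its packed upper triangle."""
    a = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i, n):
            # Each packed element fills both (i, j) and its mirror (j, i).
            a[i][j] = packed[k]
            a[j][i] = packed[k]
            k += 1
    return a
```

Packing sends n(n+1)/2 elements instead of n*n, which roughly halves the message size of a reduction over a symmetric matrix.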
provide routines to build betaprojector arrays as needed in nlforce/rnlsm1/rnlsm2/spsi
rnlsm1/rnlsm2 helper routines to simplify both routines
additional arrays: fnlgam_packed / fnl_packed / dfnl_packed will be used later in uspp branch
Rewritten rnlsm1/2 routines; communication and computation can be overlapped, using either an autotuning algorithm or user-defined block sizes
dfnla/fnla special pointers for gamma only case: ignore first and last dimension.
Adding reshape_inplace_r6_r4 and r5_r3 for dfnla/fnla.
    provide cp_grp_redist_array
    redistribution of arrays distributed along the second dimension is straightforward using allgatherv instead of allreduce
    cp_grp_redist_array_f
    redistribution of arrays distributed along the first dimension; the current implementation uses a buffer and allgather. The performance of multiple broadcasts with a custom datatype should be tested to avoid the buffer; calling redist_array_r1 multiple times should also work but is probably very inefficient
    cp_grp_get_sizes now also accepts ncpw%nhg; use the part_1d routine to avoid problems with the cp_grp redistribution routines
    cp_grp_split_atoms generates custom na mapping to distribute atoms between cp_grps
    cp_grp_redist_dfnl_fnl redistributes fnl and/or dfnl arrays
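
A minimal sketch of why the second-dimension case maps directly onto allgatherv, assuming column-major (Fortran) storage and a contiguous block partition. The helper names part_1d_sketch and allgatherv_layout are hypothetical and do not reproduce the interface of part_1d.

```python
def part_1d_sketch(n, nparts):
    """Split n items into nparts contiguous blocks; return (first, count) per part."""
    base, extra = divmod(n, nparts)
    blocks = []
    first = 0
    for p in range(nparts):
        # The first `extra` parts take one additional item each.
        count = base + (1 if p < extra else 0)
        blocks.append((first, count))
        first += count
    return blocks


def allgatherv_layout(nrows, ncols, nparts):
    """Counts and displacements (in elements) for gathering column blocks
    of a column-major nrows x ncols array."""
    blocks = part_1d_sketch(ncols, nparts)
    counts = [nrows * c for _, c in blocks]
    displs = [nrows * f for f, _ in blocks]
    return counts, displs
```

In column-major storage a block of columns is one contiguous chunk, so each group's contribution can be placed with a single count and displacement. A block of rows is strided across every column, which is why the first-dimension case needs a buffer or a custom datatype.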
g-mathias and others added 30 commits September 12, 2024 16:03
…into configure.sh; please use the -vdw option
extrapolation is required, since that will be handled by the
force-driver.
…put file" VERBOSE FORCE POSITIONS VELOCITIES" instead of enabling it at compile time
…ays better than with some modified gamma value, hence removing all gamma related code
…last segment, the following segment was lost
port necessary data arrays to device environment
3 participants