Skip to content

Conversation

@adrienbernede
Copy link
Member

@adrienbernede adrienbernede commented Dec 11, 2025

Summary

The goal of this PR is to address discussions from #1726 and in particular, step 2 of spack/spack-packages#2276 (comment).

The changes are divided in three types:

For llvm-amdgpu, we remove the package specific logic to pass
appropriate rpaths to hip link line, and instead make sure that
any llvm-amdgpu external compiler recieves the appropriate rpaths.
This time for an rpath required for cce compiler
- spec: '%cray-mpich@8.1.29.rocm_6_3_1'
when: '%mpi'
cce_20:
- spec: fflags=-ef
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of having these in the toolchain vs the compiler spec?

Copy link
Member Author

@adrienbernede adrienbernede Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationale is not entirely clear to me, but I trust @bmhan12 with it:
#1726 (reply in thread)

Now, personally, I see those configs as layers, from generic to specialized:
The compiler spec should be as generic as possible.
If we need to tune a compiler, the toolchain allows us do keep the compilers generic.
If a flag is only needed by a package, not the full stack, then it should be set in the package.

In this case, if Mfreeform and ef are Axom specific, I would set them in Axom package.
On the other hand, the extra-rpath appeared needed by any package built with those compilers, so we can define them in the compiler without loosing generality.

hip_link_flags = ""

rocm_root = os.path.dirname(spec["llvm-amdgpu"].prefix)
rocm_root = spec["llvm-amdgpu"].prefix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this required a removal of the llvm directory in the prefix of llvm-amdgpu...

Copy link
Member Author

@adrienbernede adrienbernede Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... that's an oversight. It still works because the CI is using a version of Spack where the CachedCMakePackage still defines ROCM_PATH as os.path.dirname(spec["llvm-amdgpu"].prefix).

Let me do a little experiment to demonstrate that:

  1. In the next commit, I’ll update axom reference to spack-packages so that it has the change in CachedCMakePackage (since spack/spack-packages@a229f54)
    That will cause the CI to fail on Tioga and Tuolumne.

  2. Then I’ll update Axom package to not set ROCM_ROOT_DIR, and update the llvm-amdgpu prefix to the correct path to fix the failure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... this won't work because Axom's CI is using the predefined host-configs.

Copy link
Member Author

@adrienbernede adrienbernede Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I’ll just push the fix in the spack configs along with the removal of ROCM_ROOT_DIR.
How should I test those changes? I'm thinking you might want to trigger host-config generation when there are changes in the local spack configs and packages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we generate and test our spack recipes manually then check in the generated host-config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants