From fe534afc0f6901f19c72c89f6c42be6d8838045a Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Wed, 5 Nov 2025 07:53:03 +0100 Subject: [PATCH 01/14] Squashed commit of the following: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 2a89832d1792cd17e93e7d6506e1ae0f7d4c188e Author: Ioannis Magkanaris Date: Mon Oct 27 09:56:16 2025 +0100 Fixed GPU_TX_MARKER test commit c2401285f4cbc4edab9c267fa1482be018ab61fb Merge: 10160bc1c e38d00617 Author: Ioannis Magkanaris Date: Fri Oct 24 18:29:55 2025 +0200 Merge remote-tracking branch 'upstream/main' into nvtx_ranges commit 10160bc1cf08b1ae8db4f8e3b4a232dc9f4fce1c Author: Ioannis Magkanaris Date: Fri Oct 24 18:27:58 2025 +0200 Fix instrumentation for copies commit d14093c7d8ae761f864738583747b3377a79a68d Author: Ioannis Magkanaris Date: Mon Oct 6 16:16:43 2025 +0200 Make pre-commit happy commit 68942a384e3798b9428f3f00dac2d63d9314f822 Merge: a3063e57e b415f6263 Author: Ioannis Magkanaris Date: Tue Sep 30 17:11:35 2025 +0200 Merge remote-tracking branch 'upstream/main' into nvtx_ranges commit a3063e57e0e10d0c73f68e2a6df92103cfa0d62d Author: Ioannis Magkanaris Date: Tue Sep 23 18:03:26 2025 +0200 Working version of nvtx markers with allocations commit 6788b9712f1fa88baf11838f80cb71ec7453d2c3 Author: Ioannis Magkanaris Date: Tue Sep 23 13:42:41 2025 +0200 Updated functions commit 455ad38ad3169f972dc7b8a0ab7871cae83b8720 Author: Ioannis Magkanaris Date: Tue Sep 23 13:20:51 2025 +0200 Added marker on allocations as well commit 80ce99cd8b2be2f9de2e096fec327be2d17dfadc Author: Ioannis Magkanaris Date: Wed Aug 20 19:02:36 2025 +0300 Avoid profiling tasklets commit 0314386fcc1e86c483c414d01c417c150e168703 Author: Ioannis Magkanaris Date: Wed Aug 20 19:02:28 2025 +0300 Fix get_latest_report_path in case there's no report commit aad5e877805afff6f581d57f361b4bd2b0be4ad2 Author: Ioannis Magkanaris Date: Wed Aug 20 10:05:16 2025 +0200 Remove import of deleted file commit a3ff00eeb1b4e9fe65272ba36fe10391b6bd9ec4 Author: Ioannis Magkanaris Date: Tue Aug 19 18:16:11 2025 +0200 Revert "Improved GPU Copy (#1976)" This reverts commit bc83c4750fe4b84beee8bd3bf7f1d42ae9f587cb. 
commit ea5f6ffa705a211833abbcaefb7502993293c4c7 Author: Ioannis Magkanaris Date: Tue Aug 19 18:14:35 2025 +0200 Make format happy commit b1ea9afc24d28e7772c7a0a93f2ed60e85fff538 Merge: bbc1fafa0 aabbe4821 Author: Ioannis Magkanaris Date: Tue Aug 19 19:12:43 2025 +0300 Merge branch 'main' into nvtx_ranges commit bbc1fafa0bbfe667f7dd61cfeeb6bbb356172e96 Author: Ioannis Magkanaris Date: Tue Aug 19 18:07:12 2025 +0200 Format a bit better with dace.instrument commit eea658f3e1e64bd2ee89fa44f704752ea162c73d Author: Ioannis Magkanaris Date: Tue Aug 19 18:04:54 2025 +0200 Fixes in gpu_tx_markers.py commit 2f43f7a90c801af70f8ad3c8aeb0092fe6696c4d Author: Ioannis Magkanaris Date: Tue Aug 19 18:04:43 2025 +0200 Remove instrument_sdfg commit 0fdb4df3a2b2cd7823fcfb27b706783204902d55 Author: Ioannis Magkanaris Date: Tue Aug 19 17:28:57 2025 +0200 Small refactoring of if statements in gpu_tx_markers.py commit 73c52bf5825ddc2b24cd30999a96d0f2d02851bb Author: Ioannis Magkanaris Date: Tue Aug 19 17:26:02 2025 +0200 Added on_sdfg_init/exit_begin/end functions commit ff70f2f029c60092d07bc20ba30c4c9582785686 Author: Ioannis Magkanaris Date: Tue Aug 19 16:32:34 2025 +0200 Replaced is with == commit 3d626e06866919a9d0b00d9f9c834216a104597f Author: Ioannis Magkanaris Date: Tue Aug 19 16:31:04 2025 +0200 Fix local and global streams commit 209860deced08dcc3f84285a16ccd2787be6a755 Author: Ioannis Magkanaris Date: Tue Aug 19 16:27:04 2025 +0200 Improve _is_sdfg_in_device_code commit bc83c4750fe4b84beee8bd3bf7f1d42ae9f587cb Author: Philip Müller <147368808+philip-paul-mueller@users.noreply.github.com> Date: Mon Jun 2 15:58:08 2025 +0200 Improved GPU Copy (#1976) Before some 2D copies (especially if they had FORTRAN order) were turned into Maps, see [issue#1953](https://github.com/spcl/dace/issues/1953). This PR modifies the code generator in such a way that such copies are now handled. There is some legacy stuff that should also be looked at. 
--------- Co-authored-by: Philip Mueller Co-authored-by: Tal Ben-Nun commit df9957125d4bbb97c6e8d453d7fd26db472ff6b4 Author: Ioannis Magkanaris Date: Tue Aug 19 17:34:12 2025 +0300 Apply suggestion from @tbennun Co-authored-by: Tal Ben-Nun commit da00f21b651bead1c061b3b54bff43e9142fd47c Author: Ioannis Magkanaris Date: Mon May 19 17:49:47 2025 +0200 Avoid pushing rocTX markers before initializing HIP since it doesn't work commit a39308bc7b9105ac02b139baf43eb29f78ff50bd Author: Ioannis Magkanaris Date: Fri May 16 15:13:31 2025 +0200 Fix on_copy and on_scope for GPU_TX_MARKERS commit 2d554fa26f2441ccdb9a52c073454018b6bc291a Author: Ioannis Magkanaris Date: Thu May 15 15:20:05 2025 +0200 Removed preprocessor checks by properly placing ranges in NestedSDFGs and small fixes for CPU wrapper includes commit 5937a153d212c3600cf8db38056163c440efeab6 Author: Ioannis Magkanaris Date: Wed May 14 11:33:02 2025 +0200 Refactored a bit GPUTXMarkerProvider commit 9e8ec9ea6b5013e0fad941995bef2923370733ff Author: Ioannis Magkanaris Date: Wed May 14 10:52:26 2025 +0200 Addressed PR comments for checking is the instrumentation is enabled commit c3f1932571f69d32a00d6a1ad7619b4ceb8e9e1a Author: Ioannis Magkanaris Date: Mon May 12 17:29:30 2025 +0200 Small fixes and cleanups commit 366721f28e4b5a01616fa6f251d46d54d61adefc Author: Ioannis Magkanaris Date: Mon May 12 17:23:21 2025 +0200 Fix order of imports in gpu_events.py commit 8ea432725a15704b6588860e39353dcf2f76b228 Author: Ioannis Magkanaris Date: Mon May 12 17:04:56 2025 +0200 Add markers for different SDFGs and states commit 22b372eed787aa621cdcd541eb59206a8952c6be Author: Ioannis Magkanaris Date: Mon May 12 09:45:20 2025 +0200 Revert changes in GPU_Event provider commit e5adaefc9773627ad0c142adbe98d1d64f926aa4 Author: Ioannis Magkanaris Date: Mon May 12 09:34:34 2025 +0200 Allow building with HIP even if rocTX is not found commit b30f4a268a696922889f8025f2d8fd61e9ac624e Author: Ioannis Magkanaris Date: Fri May 9 17:20:34 2025 +0200 Fix formatting commit 747f357a54cff0d5b4b0b0e35c26b7bd1ab04837 Author: Ioannis Magkanaris Date: Fri May 9 17:14:17 2025 +0200 Made test NVTX agnostic and updated documentation commit 646ca9053f5aaa30a97a0cdbb2545f1b47605036 Author: Ioannis Magkanaris Date: Fri May 9 17:05:10 2025 +0200 Use same checks for enabling roctx as CMake commit c28036ba99a5b6092ed126821b2bc39b4e785f0f Author: Ioannis Magkanaris Date: Fri May 9 17:00:19 2025 +0200 Fix compilation for AMD gpu commit 855304d329c6f660fb2bf61b5f9ba5d1b1ee5205 Author: Ioannis Magkanaris Date: Thu May 8 11:58:00 2025 +0200 Fix library names commit 9df4f73bf900c2775cbce381a2abe3606c319baa Author: Ioannis Magkanaris Date: Thu May 8 11:36:29 2025 +0200 Trying to use roctx commit a55aeb7ecdc98e5223bbad903fc557354deb54a8 Author: Ioannis Magkanaris Date: Wed May 7 17:58:37 2025 +0200 Make formatting happy commit a8bcadf175cfde856ec61f8af63947f623002df7 Author: Ioannis Magkanaris Date: Wed May 7 17:50:10 2025 +0200 Renamed NVTX to GPU_TX_MARKERS and added note for AMD GPUs commit 7337233db63b37df28aca68a91f1c8b0b53e7ce1 Author: Ioannis Magkanaris Date: Mon May 5 17:30:35 2025 +0200 Changed nvtxRangePushA to nvtxRangePush commit 74c9117b90500513ded30721ca19cb5b53429c0e Author: Ioannis Magkanaris Date: Mon May 5 17:23:42 2025 +0200 Fix copyright and GPU test commit 989bc3273af779d33d20b28a6b49a30f33b287b1 Author: Ioannis Magkanaris Date: Mon May 5 17:12:59 2025 +0200 Make formatter happy commit 4f572974e4cfc4f63f338c8ccb77dea161a128bb Author: Ioannis Magkanaris Date: Mon May 5 17:09:58 
2025 +0200 Remove NVTX markers from LIKWID since LIKWID has its own markers commit a4d2ff8a47997250faa3790088c1ad8be5cdd311 Author: Ioannis Magkanaris Date: Mon May 5 17:08:08 2025 +0200 Improved NVTX markers in likwid commit 1e71171b4a3bc2a7e091fc7f8d8c01af52e363b7 Author: Ioannis Magkanaris Date: Mon May 5 15:42:13 2025 +0200 Update NVTX Provider imports commit 438090f847d8016cbb64edca30cffb7239997164 Author: Ioannis Magkanaris Date: Mon May 5 15:41:56 2025 +0200 Update documentation commit 89b7864eab3046ff1e7a6e7cdb4816c2731ec8dd Author: Ioannis Magkanaris Date: Mon May 5 15:41:48 2025 +0200 Small fix of whiteline in framecode commit ef5355b4a2627aa4267c8b4bc82007f2e0f1f793 Author: Ioannis Magkanaris Date: Mon May 5 15:38:02 2025 +0200 Refactored NVTX Instrumentation provider constructor and test for expected code commit bbf1d3218a4ca80c10c648f001dd6d7ca88730df Author: Ioannis Magkanaris Date: Mon May 5 15:37:16 2025 +0200 Inherit LIKWID_GPU Instrumentation provider from NVTX as well commit 90b50ac61d7e686e0389ad17587fa274a1730a10 Author: Ioannis Magkanaris Date: Fri May 2 18:29:07 2025 +0200 Make GPUEventProvider inherit from NVTXProvider to enable the NVTX markers by default with it commit c584255a3c24b3f29debcbc122990e173a780ef8 Author: Ioannis Magkanaris Date: Fri May 2 18:01:31 2025 +0200 Updated documentation commit 04836fb005f4368c77731bc01ed3b846d5e97c26 Author: Ioannis Magkanaris Date: Fri May 2 18:01:21 2025 +0200 Moved the printing of NVTX range push and pop inside the NVTXProvider commit f5240b2274f3fbc21ea85e3686aaa5d3e0e5e07f Author: Ioannis Magkanaris Date: Fri May 2 17:25:04 2025 +0200 Added NVTX range in CPU wrapper for GPU kernel --- dace/builtin_hooks.py | 3 +- dace/codegen/CMakeLists.txt | 17 +- dace/codegen/instrumentation/__init__.py | 1 + .../codegen/instrumentation/gpu_tx_markers.py | 255 ++++++++++++++++++ dace/codegen/instrumentation/provider.py | 72 +++++ dace/codegen/targets/cuda.py | 9 + dace/codegen/targets/framecode.py | 58 +++- dace/dtypes.py | 1 + dace/sdfg/sdfg.py | 8 +- doc/optimization/profiling.rst | 3 +- doc/source/dace.codegen.instrumentation.rst | 8 + tests/instrumentation_test.py | 36 ++- 12 files changed, 455 insertions(+), 16 deletions(-) create mode 100644 dace/codegen/instrumentation/gpu_tx_markers.py diff --git a/dace/builtin_hooks.py b/dace/builtin_hooks.py index 2a5b49e983..b691cd0296 100644 --- a/dace/builtin_hooks.py +++ b/dace/builtin_hooks.py @@ -96,7 +96,8 @@ def _make_filter_function(filter: Optional[Union[str, Callable[[Any], bool]]], if isinstance(filter, str): # If a string was given, construct predicate based on wildcard name matching if with_attr: - filter_func = lambda elem: fnmatch.fnmatch(elem.name, filter) + filter_func = lambda elem: fnmatch.fnmatch(elem.name, filter) if hasattr(elem, 'name') else fnmatch.fnmatch( + elem.label, filter) else: filter_func = lambda elem: fnmatch.fnmatch(elem, filter) elif callable(filter): diff --git a/dace/codegen/CMakeLists.txt b/dace/codegen/CMakeLists.txt index e1a5e33947..4d2f2ef506 100644 --- a/dace/codegen/CMakeLists.txt +++ b/dace/codegen/CMakeLists.txt @@ -141,7 +141,7 @@ if(DACE_ENABLE_CUDA) set(CMAKE_CUDA_ARCHITECTURES "${LOCAL_CUDA_ARCHITECTURES}") enable_language(CUDA) - list(APPEND DACE_LIBS CUDA::cudart) + list(APPEND DACE_LIBS CUDA::cudart CUDA::nvtx3) add_definitions(-DWITH_CUDA) if (MSVC_IDE) @@ -167,6 +167,21 @@ if(DACE_ENABLE_HIP) # Add libraries such as rocBLAS link_directories(${HIP_PATH}/../lib) + if(ROCM_PATH) + find_path(ROCTX_INCLUDE_DIR roctx.h HINTS 
${ROCM_PATH}/include/roctracer ${ROCM_PATH}/roctracer/include) + if(NOT ROCTX_INCLUDE_DIR) + message(WARNING "Could not find roctx.h in ${ROCM_PATH}/include/roctracer or ${ROCM_PATH}/roctracer/include") + endif() + endif() + if(ROCM_PATH AND ROCTX_INCLUDE_DIR) + find_path(ROCTX_LIBRARY_DIR "libroctx64.so" HINTS ${ROCM_PATH}/lib) + if(NOT ROCTX_LIBRARY_DIR) + message(WARNING "Could not find libroctx64.so in ${ROCM_PATH}/lib") + else() + list(APPEND DACE_LIBS "-lroctx64 -L${ROCTX_LIBRARY_DIR}") + include_directories(SYSTEM ${ROCTX_INCLUDE_DIR}) + endif() + endif() endif() # Function for performing deferred variable expansion diff --git a/dace/codegen/instrumentation/__init__.py b/dace/codegen/instrumentation/__init__.py index d357e1a5a3..5ebab3f497 100644 --- a/dace/codegen/instrumentation/__init__.py +++ b/dace/codegen/instrumentation/__init__.py @@ -7,5 +7,6 @@ from .timer import TimerProvider from .gpu_events import GPUEventProvider from .fpga import FPGAInstrumentationProvider +from .gpu_tx_markers import GPUTXMarkersProvider from .data.data_dump import SaveProvider, RestoreProvider diff --git a/dace/codegen/instrumentation/gpu_tx_markers.py b/dace/codegen/instrumentation/gpu_tx_markers.py new file mode 100644 index 0000000000..be94e425fe --- /dev/null +++ b/dace/codegen/instrumentation/gpu_tx_markers.py @@ -0,0 +1,255 @@ +# Copyright 2019-2025 ETH Zurich and the DaCe authors. All rights reserved. +import os +from typing import Union + +from dace import dtypes, registry +from dace.codegen import common +from dace.codegen.prettycode import CodeIOStream +from dace.codegen.instrumentation.provider import InstrumentationProvider +from dace.memlet import Memlet +from dace.sdfg import nodes, SDFG +from dace.sdfg.graph import MultiConnectorEdge +from dace.sdfg.nodes import NestedSDFG +from dace.sdfg.scope import is_devicelevel_gpu_kernel +from dace.sdfg.sdfg import SDFG +from dace.sdfg.state import ControlFlowRegion, SDFGState + + +@registry.autoregister_params(type=dtypes.InstrumentationType.GPU_TX_MARKERS) +class GPUTXMarkersProvider(InstrumentationProvider): + """ Timing instrumentation that adds NVTX/rocTX ranges to SDFGs and states. """ + NVTX_HEADER_INCLUDE = '#include ' + ROCTX_HEADER_INCLUDE = '#include ' + + def __init__(self): + self.backend = common.get_gpu_backend() + # Check if ROCm TX libraries and headers are available + rocm_path = os.getenv('ROCM_PATH', '/opt/rocm') + roctx_header_paths = [ + os.path.join(rocm_path, 'roctracer/include/roctx.h'), + os.path.join(rocm_path, 'include/roctracer/roctx.h') + ] + roctx_library_path = os.path.join(rocm_path, 'lib', 'libroctx64.so') + self.enable_rocTX = any(os.path.isfile(path) + for path in roctx_header_paths) and os.path.isfile(roctx_library_path) + self.include_generated = False + super().__init__() + + def _print_include(self, sdfg: SDFG) -> None: + """ Prints the include statement for the NVTX/rocTX library for a given SDFG. """ + if self.include_generated: + return + if self.backend == 'cuda': + sdfg.append_global_code(self.NVTX_HEADER_INCLUDE, 'frame') + elif self.backend == 'hip': + if self.enable_rocTX: + sdfg.append_global_code(self.ROCTX_HEADER_INCLUDE, 'frame') + else: + raise NameError('GPU backend "%s" not recognized' % self.backend) + self.include_generated = True + + def print_include(self, stream: CodeIOStream) -> None: + """ Prints the include statement for the NVTX/rocTX library in stream. 
""" + if stream is None: + return + if self.backend == 'cuda': + stream.write(self.NVTX_HEADER_INCLUDE) + elif self.backend == 'hip': + if self.enable_rocTX: + stream.write(self.ROCTX_HEADER_INCLUDE) + else: + raise NameError('GPU backend "%s" not recognized' % self.backend) + + def print_range_push(self, name: str, sdfg: SDFG, stream: CodeIOStream) -> None: + if stream is None: + return + self._print_include(sdfg) + if name is None: + name = 'None' + if self.backend == 'cuda': + stream.write(f'nvtxRangePush("{name}");') + elif self.backend == 'hip': + if self.enable_rocTX: + stream.write(f'roctxRangePush("{name}");') + else: + raise NameError(f'GPU backend "{self.backend}" not recognized') + + def print_range_pop(self, stream: CodeIOStream) -> None: + if stream is None: + return + if self.backend == 'cuda': + stream.write('nvtxRangePop();') + elif self.backend == 'hip': + if self.enable_rocTX: + stream.write('roctxRangePop();') + else: + raise NameError(f'GPU backend "{self.backend}" not recognized') + + def _is_sdfg_in_device_code(self, sdfg: SDFG) -> bool: + """ Check if the SDFG is in device code and not top level SDFG. """ + sdfg_parent_state = sdfg.parent + while sdfg_parent_state is not None: + sdfg_parent_node = sdfg.parent_nsdfg_node + if is_devicelevel_gpu_kernel(sdfg, sdfg_parent_state, sdfg_parent_node): + return True + sdfg_parent_state = sdfg_parent_state.sdfg.parent + return False + + def on_sdfg_begin(self, sdfg: SDFG, local_stream: CodeIOStream, global_stream: CodeIOStream, codegen) -> None: + if sdfg.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + if self._is_sdfg_in_device_code(sdfg): + # Don't instrument device code + return + self.print_range_push(f'sdfg_{sdfg.name}', sdfg, local_stream) + + def on_sdfg_end(self, sdfg: SDFG, local_stream: CodeIOStream, global_stream: CodeIOStream) -> None: + if sdfg.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + if self._is_sdfg_in_device_code(sdfg): + # Don't instrument device code + return + self.print_range_pop(local_stream) + + def on_state_begin(self, sdfg: SDFG, cfg: ControlFlowRegion, state: SDFGState, local_stream: CodeIOStream, + global_stream: CodeIOStream) -> None: + if state.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + if self._is_sdfg_in_device_code(sdfg): + # Don't instrument device code + return + self.print_range_push(f'state_{state.label}', sdfg, local_stream) + + def on_state_end(self, sdfg: SDFG, cfg: ControlFlowRegion, state: SDFGState, local_stream: CodeIOStream, + global_stream: CodeIOStream) -> None: + if state.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + if self._is_sdfg_in_device_code(sdfg): + # Don't instrument device code + return + self.print_range_pop(local_stream) + + def on_copy_begin(self, sdfg: SDFG, cfg: ControlFlowRegion, state: SDFGState, src_node: nodes.Node, + dst_node: nodes.Node, edge: MultiConnectorEdge[Memlet], local_stream: CodeIOStream, + global_stream: CodeIOStream, copy_shape, src_strides, dst_strides) -> None: + if state.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + if is_devicelevel_gpu_kernel(sdfg, state, src_node) or is_devicelevel_gpu_kernel(sdfg, state, dst_node): + # Don't instrument device code + return + self.print_range_push(f'copy_{src_node.label}_to_{dst_node.label}', sdfg, local_stream) + + def on_copy_end(self, sdfg: SDFG, cfg: ControlFlowRegion, state: SDFGState, src_node: nodes.Node, + dst_node: nodes.Node, edge: MultiConnectorEdge[Memlet], local_stream: CodeIOStream, 
+ global_stream: CodeIOStream) -> None: + if state.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + if is_devicelevel_gpu_kernel(sdfg, state, src_node) or is_devicelevel_gpu_kernel(sdfg, state, dst_node): + # Don't instrument device code + return + self.print_range_pop(local_stream) + + def on_scope_entry(self, sdfg: SDFG, cfg: ControlFlowRegion, state: SDFGState, node: nodes.EntryNode, + outer_stream: CodeIOStream, inner_stream: CodeIOStream, global_stream: CodeIOStream) -> None: + if node.map.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + if is_devicelevel_gpu_kernel(sdfg, state, node): + # Don't instrument device code + return + self.print_range_push(f'scope_{node.label}', sdfg, outer_stream) + + def on_scope_exit(self, sdfg: SDFG, cfg: ControlFlowRegion, state: SDFGState, node: nodes.ExitNode, + outer_stream: CodeIOStream, inner_stream: CodeIOStream, global_stream: CodeIOStream) -> None: + entry_node = state.entry_node(node) + if entry_node.map.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + if is_devicelevel_gpu_kernel(sdfg, state, entry_node): + # Don't instrument device code + return + self.print_range_pop(outer_stream) + + def on_sdfg_init_begin(self, sdfg: SDFG, callsite_stream: CodeIOStream, global_stream: CodeIOStream) -> None: + if sdfg.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + if self._is_sdfg_in_device_code(sdfg): + # Don't instrument device code + return + # cannot push rocTX markers before initializing HIP + if self.enable_rocTX: + return + self.print_range_push(f'init_{sdfg.name}', sdfg, callsite_stream) + + def on_sdfg_init_end(self, sdfg: SDFG, callsite_stream: CodeIOStream, global_stream: CodeIOStream) -> None: + if sdfg.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + if self._is_sdfg_in_device_code(sdfg): + # Don't instrument device code + return + # cannot push rocTX markers before initializing HIP so there's no marker to pop + if self.enable_rocTX: + return + self.print_range_pop(callsite_stream) + + def on_sdfg_exit_begin(self, sdfg: SDFG, callsite_stream: CodeIOStream, global_stream: CodeIOStream) -> None: + if sdfg.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + if self._is_sdfg_in_device_code(sdfg): + # Don't instrument device code + return + self.print_range_push(f'exit_{sdfg.name}', sdfg, callsite_stream) + + def on_sdfg_exit_end(self, sdfg: SDFG, callsite_stream: CodeIOStream, global_stream: CodeIOStream) -> None: + if sdfg.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + if self._is_sdfg_in_device_code(sdfg): + # Don't instrument device code + return + self.print_range_pop(callsite_stream) + + def on_allocation_begin(self, sdfg: SDFG, scope: Union[nodes.EntryNode, SDFGState, SDFG], + stream: CodeIOStream) -> None: + if sdfg.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + # We only want to instrument allocations at the SDFG or state level + if not isinstance(scope, (SDFGState, SDFG)): + return + if self._is_sdfg_in_device_code(sdfg): + # Don't instrument device code + return + self.print_range_push(f'alloc_{sdfg.name}', sdfg, stream) + + def on_allocation_end(self, sdfg: SDFG, scope: Union[nodes.EntryNode, SDFGState, SDFG], + stream: CodeIOStream) -> None: + if sdfg.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + # We only want to instrument allocations at the SDFG or state level + if not isinstance(scope, (SDFGState, SDFG)): + return + if self._is_sdfg_in_device_code(sdfg): + # 
Don't instrument device code + return + self.print_range_pop(stream) + + def on_deallocation_begin(self, sdfg: SDFG, scope: Union[nodes.EntryNode, SDFGState, SDFG], + stream: CodeIOStream) -> None: + if sdfg.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + # We only want to instrument allocations at the SDFG or state level + if not isinstance(scope, (SDFGState, SDFG)): + return + if self._is_sdfg_in_device_code(sdfg): + # Don't instrument device code + return + self.print_range_push(f'dealloc_{sdfg.name}', sdfg, stream) + + def on_deallocation_end(self, sdfg: SDFG, scope: Union[nodes.EntryNode, SDFGState, SDFG], + stream: CodeIOStream) -> None: + if sdfg.instrument != dtypes.InstrumentationType.GPU_TX_MARKERS: + return + # We only want to instrument allocations at the SDFG or state level + if not isinstance(scope, (SDFGState, SDFG)): + return + if self._is_sdfg_in_device_code(sdfg): + # Don't instrument device code + return + self.print_range_pop(stream) diff --git a/dace/codegen/instrumentation/provider.py b/dace/codegen/instrumentation/provider.py index a95c0495ba..dc643df4ca 100644 --- a/dace/codegen/instrumentation/provider.py +++ b/dace/codegen/instrumentation/provider.py @@ -183,3 +183,75 @@ def on_node_end(self, sdfg: SDFG, cfg: ControlFlowRegion, state: SDFGState, node :param global_stream: Code generator for global (external) code. """ pass + + def on_sdfg_init_begin(self, sdfg: SDFG, local_stream: CodeIOStream, global_stream: CodeIOStream) -> None: + """ Event called at the beginning of SDFG initialization code generation. + + :param sdfg: The generated SDFG object. + :param local_stream: Code generator for the in-function code. + :param global_stream: Code generator for global (external) code. + """ + pass + + def on_sdfg_init_end(self, sdfg: SDFG, local_stream: CodeIOStream, global_stream: CodeIOStream) -> None: + """ Event called at the end of SDFG initialization code generation. + + :param sdfg: The generated SDFG object. + :param local_stream: Code generator for the in-function code. + :param global_stream: Code generator for global (external) code. + """ + pass + + def on_sdfg_exit_begin(self, sdfg: SDFG, local_stream: CodeIOStream, global_stream: CodeIOStream) -> None: + """ Event called at the beginning of SDFG exit code generation. + + :param sdfg: The generated SDFG object. + :param local_stream: Code generator for the in-function code. + :param global_stream: Code generator for global (external) code. + """ + pass + + def on_sdfg_exit_end(self, sdfg: SDFG, local_stream: CodeIOStream, global_stream: CodeIOStream) -> None: + """ Event called at the end of SDFG exit code generation. + + :param sdfg: The generated SDFG object. + :param local_stream: Code generator for the in-function code. + :param global_stream: Code generator for global (external) code. + """ + pass + + def on_allocation_begin(self, sdfg: SDFG, scope: Union[nodes.EntryNode, SDFGState, SDFG], + stream: CodeIOStream) -> None: + """ Event called at the beginning of an allocation code generation. + + :param sdfg: The generated SDFG object. + :param stream: Code generator. + """ + pass + + def on_allocation_end(self, sdfg: SDFG, scope: Union[nodes.EntryNode, SDFGState, SDFG], + lstream: CodeIOStream) -> None: + """ Event called at the end of an allocation code generation. + + :param sdfg: The generated SDFG object. + :param local_stream: Code generator. 
+ """ + pass + + def on_deallocation_begin(self, sdfg: SDFG, scope: Union[nodes.EntryNode, SDFGState, SDFG], + stream: CodeIOStream) -> None: + """ Event called at the beginning of a deallocation code generation. + + :param sdfg: The generated SDFG object. + :param local_stream: Code generator. + """ + pass + + def on_deallocation_end(self, sdfg: SDFG, scope: Union[nodes.EntryNode, SDFGState, SDFG], + lstream: CodeIOStream) -> None: + """ Event called at the end of a deallocation code generation. + + :param sdfg: The generated SDFG object. + :param local_stream: Code generator. + """ + pass diff --git a/dace/codegen/targets/cuda.py b/dace/codegen/targets/cuda.py index fe277538d3..127cf3fa4d 100644 --- a/dace/codegen/targets/cuda.py +++ b/dace/codegen/targets/cuda.py @@ -1088,6 +1088,11 @@ def _emit_copy(self, state_id: int, src_node: nodes.Node, src_storage: dtypes.St is_c_order = is_fortran_order dims = 1 + for instr in self._dispatcher.instrumentation.values(): + if instr is not None: + instr.on_copy_begin(sdfg, cfg, state_dfg, src_node, dst_node, edge, callsite_stream, None, + copy_shape, src_strides, dst_strides) + if dims > 2: # Currently we only support ND copies when they can be represented # as a 1D copy or as a 2D strided copy @@ -1243,6 +1248,10 @@ def _emit_copy(self, state_id: int, src_node: nodes.Node, src_storage: dtypes.St self._emit_sync(callsite_stream) + for instr in self._dispatcher.instrumentation.values(): + if instr is not None: + instr.on_copy_end(sdfg, cfg, state_dfg, src_node, dst_node, edge, callsite_stream, None) + # Copy within the GPU elif (src_storage in gpu_storage_types and dst_storage in gpu_storage_types): diff --git a/dace/codegen/targets/framecode.py b/dace/codegen/targets/framecode.py index 449e312efa..aecd78a092 100644 --- a/dace/codegen/targets/framecode.py +++ b/dace/codegen/targets/framecode.py @@ -14,6 +14,7 @@ from dace.codegen import dispatcher as disp from dace.codegen.prettycode import CodeIOStream from dace.codegen.common import codeblock_to_cpp, sym2cpp +from dace.codegen.instrumentation.gpu_tx_markers import GPUTXMarkersProvider from dace.codegen.targets.target import TargetCodeGenerator from dace.codegen.tools.type_inference import infer_expr_type from dace.sdfg import SDFG, SDFGState, nodes @@ -254,6 +255,13 @@ def generate_footer(self, sdfg: SDFG, global_stream: CodeIOStream, callsite_stre # Write closing brace of program callsite_stream.write('}', sdfg) + if sdfg.instrument == dtypes.InstrumentationType.GPU_TX_MARKERS: + # Need to make sure that the necessary includes for GPU_TX_MARKERS are present + # in the generated code. 
+ gpu_tx_markers_provider = self._dispatcher.instrumentation.get(dtypes.InstrumentationType.GPU_TX_MARKERS) + if gpu_tx_markers_provider: + gpu_tx_markers_provider.print_include(callsite_stream) + # Write awkward footer to avoid 'extern "C"' issues params_comma = (', ' + params) if params else '' initparams_comma = (', ' + initparams) if initparams else '' @@ -279,11 +287,17 @@ def generate_footer(self, sdfg: SDFG, global_stream: CodeIOStream, callsite_stre callsite_stream.write( f""" DACE_EXPORTED {mangle_dace_state_struct_name(sdfg)} *__dace_init_{sdfg.name}({initparams}) -{{ - int __result = 0; - {mangle_dace_state_struct_name(sdfg)} *__state = new {mangle_dace_state_struct_name(sdfg)}; +{{""", sdfg) - """, sdfg) + # Invoke all instrumentation providers + for instr in self._dispatcher.instrumentation.values(): + if instr is not None: + instr.on_sdfg_init_begin(sdfg, callsite_stream, global_stream) + + callsite_stream.write( + f""" + int __result = 0; + {mangle_dace_state_struct_name(sdfg)} *__state = new {mangle_dace_state_struct_name(sdfg)};""", sdfg) for target in self._dispatcher.used_targets: if target.has_initializer: @@ -304,17 +318,29 @@ def generate_footer(self, sdfg: SDFG, global_stream: CodeIOStream, callsite_stre callsite_stream.write(self._initcode.getvalue(), sdfg) - callsite_stream.write( - f""" + callsite_stream.write(f""" if (__result) {{ delete __state; return nullptr; }} +""", sdfg) + # Invoke all instrumentation providers + for instr in self._dispatcher.instrumentation.values(): + if instr is not None: + instr.on_sdfg_init_end(sdfg, callsite_stream, global_stream) + callsite_stream.write( + f""" return __state; }} DACE_EXPORTED int __dace_exit_{sdfg.name}({mangle_dace_state_struct_name(sdfg)} *__state) {{ +""", sdfg) + # Invoke all instrumentation providers + for instr in self._dispatcher.instrumentation.values(): + if instr is not None: + instr.on_sdfg_exit_begin(sdfg, callsite_stream, global_stream) + callsite_stream.write(f""" int __err = 0; """, sdfg) @@ -349,6 +375,10 @@ def generate_footer(self, sdfg: SDFG, global_stream: CodeIOStream, callsite_stre callsite_stream.write("}") callsite_stream.write('delete __state;\n', sdfg) + # Invoke all instrumentation providers + for instr in self._dispatcher.instrumentation.values(): + if instr is not None: + instr.on_sdfg_exit_end(sdfg, callsite_stream, global_stream) callsite_stream.write('return __err;\n}\n', sdfg) def generate_external_memory_management(self, sdfg: SDFG, callsite_stream: CodeIOStream): @@ -798,6 +828,11 @@ def determine_allocation_lifetime(self, top_sdfg: SDFG): def allocate_arrays_in_scope(self, sdfg: SDFG, cfg: ControlFlowRegion, scope: Union[nodes.EntryNode, SDFGState, SDFG], function_stream: CodeIOStream, callsite_stream: CodeIOStream) -> None: + if len(self.to_allocate[scope]) == 0: + return + for instr in self._dispatcher.instrumentation.values(): + if instr is not None: + instr.on_allocation_begin(sdfg, scope, callsite_stream) """ Dispatches allocation of all arrays in the given scope. 
""" for tsdfg, state, node, declare, allocate, _ in self.to_allocate[scope]: if state is not None: @@ -809,10 +844,18 @@ def allocate_arrays_in_scope(self, sdfg: SDFG, cfg: ControlFlowRegion, scope: Un self._dispatcher.dispatch_allocate(tsdfg, cfg if state is None else state.parent_graph, state, state_id, node, desc, function_stream, callsite_stream, declare, allocate) + for instr in self._dispatcher.instrumentation.values(): + if instr is not None: + instr.on_allocation_end(sdfg, scope, callsite_stream) def deallocate_arrays_in_scope(self, sdfg: SDFG, cfg: ControlFlowRegion, scope: Union[nodes.EntryNode, SDFGState, SDFG], function_stream: CodeIOStream, callsite_stream: CodeIOStream): + if len(self.to_allocate[scope]) == 0: + return + for instr in self._dispatcher.instrumentation.values(): + if instr is not None: + instr.on_deallocation_begin(sdfg, scope, callsite_stream) """ Dispatches deallocation of all arrays in the given scope. """ for tsdfg, state, node, _, _, deallocate in self.to_allocate[scope]: if not deallocate: @@ -826,6 +869,9 @@ def deallocate_arrays_in_scope(self, sdfg: SDFG, cfg: ControlFlowRegion, scope: self._dispatcher.dispatch_deallocate(tsdfg, state.parent_graph, state, state_id, node, desc, function_stream, callsite_stream) + for instr in self._dispatcher.instrumentation.values(): + if instr is not None: + instr.on_deallocation_end(sdfg, scope, callsite_stream) def generate_code(self, sdfg: SDFG, diff --git a/dace/dtypes.py b/dace/dtypes.py index faadc84a50..28372087e9 100644 --- a/dace/dtypes.py +++ b/dace/dtypes.py @@ -167,6 +167,7 @@ class InstrumentationType(aenum.AutoNumberEnum): LIKWID_GPU = () GPU_Events = () FPGA = () + GPU_TX_MARKERS = () @undefined_safe_enum diff --git a/dace/sdfg/sdfg.py b/dace/sdfg/sdfg.py index 44a085603d..def7d5dce6 100644 --- a/dace/sdfg/sdfg.py +++ b/dace/sdfg/sdfg.py @@ -1030,8 +1030,12 @@ def get_latest_report_path(self) -> Optional[str]: :return: A path to the latest instrumentation report, or None if one does not exist. """ path = os.path.join(self.build_folder, 'perf') - files = [f for f in os.listdir(path) if f.startswith('report-')] - if len(files) == 0: + try: + files = [f for f in os.listdir(path) if f.startswith('report-')] + except FileNotFoundError: + return None + + if not files: return None return os.path.join(path, sorted(files, reverse=True)[0]) diff --git a/doc/optimization/profiling.rst b/doc/optimization/profiling.rst index 87539e87a8..3f53d4e324 100644 --- a/doc/optimization/profiling.rst +++ b/doc/optimization/profiling.rst @@ -121,7 +121,8 @@ Instrumentation can also collect performance counters on CPUs and GPUs using `LI The :class:`~dace.dtypes.InstrumentationType.LIKWID_Counters` instrumentation type can be configured to collect a wide variety of performance counters on CPUs and GPUs. An example use can be found in the `LIKWID instrumentation code sample `_. - +There is also the :class:`~dace.dtypes.InstrumentationType.GPU_TX_MARKERS` instrumentation type which wraps in NVTX or rocTX markers the DaCe program executed on the GPU. Important parts of the execution of the program on the GPU as the different states, SDFGs and initialization and finalization phases are marked with these markers. +These markers can be used to visualize and measure the GPU activity using the NVIDIA Nsight Systems or ROCm Systems profilers. 
Instrumentation file format ~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/source/dace.codegen.instrumentation.rst b/doc/source/dace.codegen.instrumentation.rst index d476090d6a..0fc5941097 100644 --- a/doc/source/dace.codegen.instrumentation.rst +++ b/doc/source/dace.codegen.instrumentation.rst @@ -4,6 +4,14 @@ dace.codegen.instrumentation package Submodules ---------- +dace.codegen.instrumentation.gpu_tx_markers module +----------------------------------------------- + +.. automodule:: dace.codegen.instrumentation.gpu_tx_markers + :members: + :undoc-members: + :show-inheritance: + dace.codegen.instrumentation.fpga module ---------------------------------------- diff --git a/tests/instrumentation_test.py b/tests/instrumentation_test.py index 2aa26edf36..120e025812 100644 --- a/tests/instrumentation_test.py +++ b/tests/instrumentation_test.py @@ -4,6 +4,7 @@ import pytest import numpy as np +import re import sys import dace @@ -39,14 +40,17 @@ def onetest(instrumentation: dace.InstrumentationType, size=128): if isinstance(node, nodes.MapEntry) and node.map.label == 'mult': node.map.instrument = instrumentation state.instrument = instrumentation - # Set Timer instrumentation on the whole SDFG - if instrumentation == dace.InstrumentationType.Timer: - sdfg.instrument = instrumentation - if instrumentation == dace.InstrumentationType.GPU_Events: + if instrumentation in [dace.InstrumentationType.GPU_Events, dace.InstrumentationType.GPU_TX_MARKERS]: sdfg.apply_transformations(GPUTransformSDFG) - sdfg(A=A, B=B, C=C, N=size) + with dace.instrument(instrumentation, + filter='*', + annotate_maps=True, + annotate_tasklets=False, + annotate_states=True, + annotate_sdfgs=True): + sdfg(A=A, B=B, C=C, N=size) # Check for correctness assert np.allclose(C, 20 * A @ B) @@ -57,6 +61,22 @@ def onetest(instrumentation: dace.InstrumentationType, size=128): report = sdfg.get_latest_report() print(report) + # Check that the NVTX/rocTX range wrapper is present in the generated CPU code + if instrumentation == dace.InstrumentationType.GPU_TX_MARKERS: + code = sdfg.generate_code()[0].clean_code + tx_include = re.search(r'#include <(nvtx3/nvToolsExt|roctx).h>', code) + assert tx_include is not None + range_push = re.search(r'(nvtx|roctx)RangePush\("sdfg', code) is not None + range_push &= re.search(r'(nvtx|roctx)RangePush\("copy', code) is not None + range_push &= re.search(r'(nvtx|roctx)RangePush\("state', code) is not None + range_push &= re.search(r'(nvtx|roctx)RangePush\("alloc', code) is not None + range_push &= re.search(r'(nvtx|roctx)RangePush\("dealloc', code) is not None + range_push &= re.search(r'(nvtx|roctx)RangePush\("init', code) is not None + range_push &= re.search(r'(nvtx|roctx)RangePush\("exit', code) is not None + assert range_push + range_pop = re.search(r'(nvtx|roctx)RangePop\b', code) + assert range_pop is not None + def test_timer(): onetest(dace.InstrumentationType.Timer) @@ -73,8 +93,14 @@ def test_gpu_events(): onetest(dace.InstrumentationType.GPU_Events) +@pytest.mark.gpu +def test_gpu_tx_markers(): + onetest(dace.InstrumentationType.GPU_TX_MARKERS) + + if __name__ == '__main__': test_timer() test_papi() if len(sys.argv) > 1 and sys.argv[1] == 'gpu': test_gpu_events() + test_gpu_tx_markers() From 6cc01351a4dc3a7a85c6fac2802ea2c604c93b40 Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Wed, 5 Nov 2025 07:55:31 +0100 Subject: [PATCH 02/14] Squashed commit of the following: commit c069546935a19d7182aa9c848445b438947cc0df Merge: 41902c396 408a4819f Author: Philip Mueller Date: Tue Nov 4 
11:22:14 2025 +0100 Merge remote-tracking branch 'spcl/main' into make_construct_args_public commit 41902c396f6189e47d8a177fa925a261df4d4065 Author: Philip Mueller Date: Fri Oct 31 16:01:26 2025 +0100 Fixed a bug. commit 65725f9f36f7982a430fcb663b2d1289124ae5af Author: Philip Mueller Date: Fri Oct 31 15:15:03 2025 +0100 This should be enough for bug compatibility. commit daf90e910c5ae776fd1082c7c91126c28aaf5928 Author: Philip Mueller Date: Fri Oct 31 12:58:25 2025 +0100 Updated the thing a bit more. commit 2ddabbdd51684ea691a6e59caae183e0df694768 Merge: 4da0c4ebb b44aeb061 Author: Philip Mueller Date: Fri Oct 31 12:54:19 2025 +0100 Merge remote-tracking branch 'spcl/main' into make_construct_args_public commit 4da0c4ebb32b1ff082241b865b6a1bbbedc85fee Author: Philip Mueller Date: Fri Oct 31 12:53:48 2025 +0100 Made some additional check. commit 69960ce1ce7e6839b5a279cdf551c5afc933d453 Author: Philip Mueller Date: Fri Oct 31 12:00:30 2025 +0100 Forgot to do this. commit 6e1a9ff33d3405c7ce099579ac44943ab784788f Merge: c1214fa6b 1bf217328 Author: Philip Mueller Date: Fri Oct 31 11:25:46 2025 +0100 Merge remote-tracking branch 'spcl/main' into make_construct_args_public commit c1214fa6bfa3ba80cfee6a3fde69c40e812deaea Author: Philip Mueller Date: Fri Oct 31 09:50:41 2025 +0100 Updated the tests and made it clear that you can not return a scalar from an SDFG. commit 9397a230a48508212b30ffc22173007f7f7a3111 Author: Philip Mueller Date: Fri Oct 31 09:40:29 2025 +0100 Implemented the proper handling of tuples of size one. commit e8d909e2939f1ed3ef2c9bfdb8b399eb608ae0ab Author: Philip Mueller Date: Fri Oct 31 09:30:48 2025 +0100 Removed that stupid sclar return value feature that CAN NOT WORK. However, I saw that it also, under the hood sometimes tests if the argument is a pyobject. Since that thing is a pointer it is possible and I implemented it for that. But it was again not implemented properly, since for the case when the return value is passed as a regular argument, it was not checking that, only for managed return values. commit ab110d2692d3bae873b347145875f4c01f515b57 Author: Philip Mueller Date: Fri Oct 31 09:24:45 2025 +0100 Updated the description. commit 899b2a09f38a414d7ea3a800d2bb586d630cf37e Author: Philip Mueller Date: Fri Oct 31 09:24:32 2025 +0100 Fixed some old stuff. commit 7f17e135f32ee9a43da0eb7c3872c305ef750170 Author: Philip Mueller Date: Fri Oct 31 09:08:49 2025 +0100 Fixed a bug, but in a way I do not like. commit c2c1116fc81b96a78898d1bc8650d0dc39b5dd29 Author: Philip Mueller Date: Fri Oct 31 08:40:47 2025 +0100 Removed a missleading comment. commit ded5df80e181405b394c0912f5c9bc563b351e41 Author: Philip Mueller Date: Fri Oct 31 08:04:38 2025 +0100 Made some refactoring to remove some strange DaCe behaviour. commit b029828376745e984b83f80862b61210b3547f51 Author: Philip Mueller Date: Fri Oct 31 08:02:28 2025 +0100 Fixed an issue in safe_call commit b09c9fc238b3605d3ab6e85cf0e2f12023e031b3 Author: Philip Mueller Date: Fri Oct 31 07:17:36 2025 +0100 Included the first bunch of Tal's changes. commit e138b0662223a6beebf2b48e8b6f0b0421dce362 Author: Philip Mueller Date: Thu Oct 30 15:12:23 2025 +0100 Made the 'passed as positional and named argument'-error more explicit. commit f901a3d3889cbeec5195eb7b1ecc35c777a726d1 Author: Philip Mueller Date: Thu Oct 30 15:05:00 2025 +0100 Fixed a bug in a unit test. Due to the refactoring the case that a variable is passed once as positional and as named argument is not detected and asserted. 
This test however, passed `a` always as positional argument and if `symbolic` is `True` also as named argument. commit 767260d41c05ef5afb3f7ceb34918cc704d1fd68 Author: Philip Mueller Date: Thu Oct 30 14:19:44 2025 +0100 Clarified a comment. commit 2b8123a7513893d8a536a6570dd79eef85c0dc8e Author: Philip Mueller Date: Thu Oct 30 13:56:20 2025 +0100 Made the construct argumt vector function publich and also refactored some things. --- dace/codegen/compiled_sdfg.py | 334 ++++++++++++++------- tests/codegen/external_memory_test.py | 2 +- tests/python_frontend/return_value_test.py | 41 ++- 3 files changed, 264 insertions(+), 113 deletions(-) diff --git a/dace/codegen/compiled_sdfg.py b/dace/codegen/compiled_sdfg.py index 733f0ba53c..0e66d7d95c 100644 --- a/dace/codegen/compiled_sdfg.py +++ b/dace/codegen/compiled_sdfg.py @@ -5,7 +5,7 @@ import re import shutil import subprocess -from typing import Any, Callable, Dict, List, Tuple, Optional, Type, Union +from typing import Any, Callable, Dict, List, Tuple, Optional, Type, Union, Sequence import warnings import tempfile import pickle @@ -151,8 +151,28 @@ def __exit__(self, *args, **kwargs): class CompiledSDFG(object): """ A compiled SDFG object that can be called through Python. - Todo: - Scalar return values are not handled properly, this is a code gen issue. + Essentially, this class makes an SDFG callable. Normally a user will not create it + directly; instead it is generated by utilities such as `SDFG.compile()`. + + The class performs the following tasks: + - It ensures that the SDFG object is properly initialized, either by a direct + call to `initialize()` or the first time it is called. Furthermore, it will + also take care of finalization when it goes out of scope. + - It transforms Python arguments into C arguments. + + Technically there are two ways to call the SDFG. The first is using + `__call__()`, i.e. as a normal function. However, this always processes + the arguments and performs some error checking and is thus slow. The second way + is the advanced interface, which allows decomposing the call into separate + steps. For more information see `construct_arguments()`, `fast_call()` and + `convert_return_values()`. + + :note: In previous versions the arrays used as return values were sometimes reused. + This was changed; every time `construct_arguments()` is called, + new arrays are allocated. + :note: It is not possible to return scalars; currently, using scalars + as return values is a validation error. The only exception is (probably) + Python objects. """ def __init__(self, sdfg, lib: ReloadableDLL, argnames: List[str] = None): @@ -161,9 +181,14 @@ def __init__(self, sdfg, lib: ReloadableDLL, argnames: List[str] = None): self._lib = lib self._initialized = False self._libhandle = ctypes.c_void_p(0) - self._lastargs = () self.do_not_execute = False + # Contains the pointer arguments that were used the last time the SDFG was called + # through `__call__()`. It is also used by `get_workspace_sizes()`. + # NOTE: Using its content might be dangerous as only the pointers to arrays are + # stored. It is the user's responsibility to ensure that they are valid.
+ self._lastargs = None + lib.load() # Explicitly load the library self._init = lib.get_symbol('__dace_init_{}'.format(sdfg.name)) self._init.restype = ctypes.c_void_p @@ -172,17 +197,27 @@ def __init__(self, sdfg, lib: ReloadableDLL, argnames: List[str] = None): self._cfunc = lib.get_symbol('__program_{}'.format(sdfg.name)) # Cache SDFG return values - self._create_new_arrays: bool = True self._return_syms: Dict[str, Any] = None + # It will contain the shape of the array, or the name if the return array is passed as an argument. self._retarray_shapes: List[Tuple[str, np.dtype, dtypes.StorageType, Tuple[int], Tuple[int], int]] = [] - self._retarray_is_scalar: List[bool] = [] + # Only `True` if the return value is a scalar _and_ a `pyobject`. + self._retarray_is_pyobject: List[bool] = [] self._return_arrays: List[np.ndarray] = [] self._callback_retval_references: List[Any] = [] # Avoids garbage-collecting callback return values + # If there are return values, this is `True` if a single value is returned. Note that + # `False` either means that a tuple is returned or there are no return values. + # NOTE: Needed to handle the case of a tuple with one element. + self._is_single_value_ret: bool = False + if '__return' in self._sdfg.arrays: + assert not any(aname.startswith('__return_') for aname in self._sdfg.arrays.keys()) + self._is_single_value_ret = True + # Cache SDFG argument properties self._typedict = self._sdfg.arglist() self._sig = self._sdfg.signature_arglist(with_types=False, arglist=self._typedict) self._free_symbols = self._sdfg.free_symbols + self._constants = self._sdfg.constants self.argnames = argnames if self.argnames is None and len(sdfg.arg_names) != 0: @@ -269,12 +304,21 @@ def get_workspace_sizes(self) -> Dict[dtypes.StorageType, int]: """ Returns the total external memory size to be allocated for this SDFG. + Note that the function queries the sizes of the last call that was made by + `__call__()` or `initialize()`. Calls made by `fast_call()` or `safe_call()` + will not be considered. + :return: A dictionary mapping storage types to the number of bytes necessary to allocate for the SDFG to work properly. + :note: It is the user's responsibility that all arguments, especially the array + arguments, remain valid between the call to `__call__()` or `initialize()` + and the call to this function. """ if not self._initialized: raise ValueError('Compiled SDFG is uninitialized, please call ``initialize`` prior to ' 'querying external memory size.') + if self._lastargs is None: + raise ValueError('To use `get_workspace_sizes()`, `__call__()` or `initialize()` must be called first.') result: Dict[dtypes.StorageType, int] = {} for storage in self.external_memory_types: @@ -288,15 +332,24 @@ def set_workspace(self, storage: dtypes.StorageType, workspace: Any): """ Sets the workspace for the given storage type to the given buffer. + Note that the function refers to the arguments of the last call that was made by + `__call__()` or `initialize()`. Calls made by `fast_call()` or `safe_call()` + will not be considered. + :param storage: The storage type to fill. :param workspace: An array-convertible object (through ``__[cuda_]array_interface__``, see ``array_interface_ptr``) to use for the workspace. + :note: It is the user's responsibility that all arguments, especially the array + arguments, remain valid between the call to `__call__()` or `initialize()` + and the call to this function. + """ if not self._initialized: raise ValueError('Compiled SDFG is uninitialized, please call ``initialize`` prior to ' 'setting external memory.') if storage not in self.external_memory_types: raise ValueError(f'Compiled SDFG does not specify external memory of {storage}') + if self._lastargs is None: + raise ValueError('To use `set_workspace()`, `__call__()` or `initialize()` must be called first.') func = self._lib.get_symbol(f'__dace_set_external_memory_{storage.name}', None) ptr = dtypes.array_interface_ptr(workspace, storage) @@ -331,12 +384,13 @@ def initialize(self, *args, **kwargs): if self._initialized: return - if len(args) > 0 and self.argnames is not None: - kwargs.update({aname: arg for aname, arg in zip(self.argnames, args)}) - # Construct arguments in the exported C function order - _, initargtuple = self._construct_args(kwargs) + callargtuple, initargtuple = self.construct_arguments(*args, **kwargs) self._initialize(initargtuple) + + # The main reason for setting `_lastargs` here is to allow calls to `get_workspace_sizes()`. + self._lastargs = (callargtuple, initargtuple) + return self._libhandle def finalize(self): @@ -361,38 +415,34 @@ def __call__(self, *args, **kwargs): """ Forwards the Python call to the compiled ``SDFG``. - The order of the positional arguments is expected to be the same as in - the ``argnames`` member. The function will roughly perform the - following tasks: - - Change the order of the Python arguments into the one required by - the binary. - - Performing some basic sanity checks. - - Transforming the Python arguments into their ``C`` equivalents. - - Allocate the memory for the return values. - - Call the ``C` function. + The order of the positional arguments is expected to be the same as in the + ``argnames`` member. The function will perform the following tasks: + - Calling ``construct_arguments()``, creating the argument vector and + allocating the memory for the return values. + - Performing the actual call by means of ``fast_call()``, with error + checks enabled. + - Converting the return value into the expected format by means of + ``convert_return_values()`` and returning it. :note: The memory for the return values is only allocated the first time this function is called. Thus, this function will always return the same objects. To force the allocation of new memory you can call ``clear_return_values()`` in advance. - if self.argnames is None and len(args) != 0: - raise KeyError(f"Passed positional arguments to an SDFG that does not accept them.") - elif len(args) > 0 and self.argnames is not None: - kwargs.update( - # `_construct_args` will handle all of its arguments as kwargs. - { - aname: arg - for aname, arg in zip(self.argnames, args) - }) - argtuple, initargtuple = self._construct_args(kwargs) # Missing arguments will be detected here. - # Return values are cached in `self._lastargs`. - return self.fast_call(argtuple, initargtuple, do_gpu_check=True) + argtuple, initargtuple = self.construct_arguments(*args, **kwargs) # Missing arguments will be detected here. + self._lastargs = (argtuple, initargtuple) + self.fast_call(argtuple, initargtuple, do_gpu_check=True) + return self.convert_return_values() def safe_call(self, *args, **kwargs): """ Forwards the Python call to the compiled ``SDFG`` in a separate process to avoid crashes in the main process. Raises an exception if the SDFG execution fails. + + Note that the current implementation lacks proper handling of return values.
+ Thus output can only be transmitted through inout arguments. """ + if any(aname == '__return' or aname.startswith('__return_') for aname in self.sdfg.arrays.keys()): + raise NotImplementedError('`CompiledSDFG.safe_call()` does not support return values.') # Pickle the SDFG and arguments with tempfile.NamedTemporaryFile(mode='wb', delete=False) as f: @@ -444,24 +494,25 @@ def safe_call(self, *args, **kwargs): def fast_call( self, - callargs: Tuple[Any, ...], - initargs: Tuple[Any, ...], + callargs: Sequence[Any], + initargs: Sequence[Any], do_gpu_check: bool = False, - ) -> Union[Tuple[Any, ...], Any]: + ) -> None: """ - Calls the underlying binary functions directly and bypassing - argument sanitation. + Calls the underlying binary functions directly, bypassing argument sanitation. - This is a faster, but less user friendly version of ``__call__()``. - While ``__call__()`` will transforms its Python arguments such that - they can be forwarded, this function assumes that this processing - was already done by the user. + This is a faster, but less user-friendly version of ``__call__()``. While + ``__call__()`` transforms its Python arguments such that they can be + forwarded and allocates memory for the return values, this function assumes + that this processing was already done by the user. + To build the argument vectors you should use `self.construct_arguments()`. :param callargs: Arguments passed to the actual computation. :param initargs: Arguments passed to the initialization function. :param do_gpu_check: Check if errors happened on the GPU. - :note: You may use `_construct_args()` to generate the processed arguments. + :note: This is an advanced interface. + :note: In previous versions this function also called `convert_return_values()`. """ try: # Call initializer function if necessary, then SDFG @@ -485,8 +536,7 @@ def fast_call( if lasterror is not None: raise RuntimeError( f'An error was detected when calling "{self._sdfg.name}": {self._get_error_text(lasterror)}') - - return self._convert_return_values() + return except (RuntimeError, TypeError, UnboundLocalError, KeyError, cgx.DuplicateDLLError, ReferenceError): self._lib.unload() raise @@ -498,18 +548,40 @@ def __del__(self): self._libhandle = ctypes.c_void_p(0) self._lib.unload() - def _construct_args(self, kwargs) -> Tuple[Tuple[Any], Tuple[Any]]: - """ - Main function that controls argument construction for calling - the C prototype of the SDFG. + def construct_arguments(self, *args: Any, **kwargs: Any) -> Tuple[Tuple[Any], Tuple[Any]]: + """Construct the argument vectors for `fast_call()` from the given arguments. - Organizes arguments first by ``sdfg.arglist``, then data descriptors - by alphabetical order, then symbols by alphabetical order. + The function returns a pair of tuples that are suitable for `fast_call()`. + The first element is `callargs`, i.e. the full arguments, while the + second element is `initargs`, which is only needed the first time + an SDFG is called. - :note: If not initialized this function will initialize the memory for - the return values, however, it might also reallocate said memory. - :note: This function will also update the internal argument cache. + It is important to note that this function also allocates new return values. + The array objects are managed by `self` and remain valid until this + function is called again. However, they are also returned by `self.__call__()`. + + It is also possible to pass the array that should be used to return a value + directly as an argument. In that case the allocation for that return value will + be skipped. + + :note: In case of arrays, the returned argument vectors only contain the + pointers to the underlying memory. Thus it is the user's responsibility + to ensure that the memory remains allocated until the argument vector + is used. + :note: This is an advanced interface. """ + if self.argnames is None and len(args) != 0: + raise KeyError(f"Passed positional arguments to an SDFG that does not accept them.") + elif len(args) > 0 and self.argnames is not None: + positional_arguments = {aname: avalue for aname, avalue in zip(self.argnames, args)} + if not positional_arguments.keys().isdisjoint(kwargs.keys()): + raise ValueError( + f'The following arguments were passed both as positional and named arguments: {set(positional_arguments.keys()).intersection(kwargs.keys())}' + ) + kwargs.update(positional_arguments) + + # NOTE: This might invalidate the elements associated with the return values of + # all argument vectors that were created before. self._initialize_return_values(kwargs) # Add the return values to the arguments, since they are part of the C signature. @@ -539,31 +611,51 @@ def _construct_args(self, kwargs) -> Tuple[Tuple[Any], Tuple[Any]]: argnames = [] sig = [] - # Type checking - cargs = [] no_view_arguments = not Config.get_bool('compiler', 'allow_view_arguments') - for i, (a, arg, atype) in enumerate(zip(argnames, arglist, argtypes)): - carg = dt.make_ctypes_argument(arg, - atype, - a, - allow_views=not no_view_arguments, - symbols=kwargs, - callback_retval_references=self._callback_retval_references) - cargs.append(carg) - - constants = self.sdfg.constants + cargs = tuple( + dt.make_ctypes_argument(aval, + atype, + aname, + allow_views=not no_view_arguments, + symbols=kwargs, + callback_retval_references=self._callback_retval_references) + for aval, atype, aname in zip(arglist, argtypes, argnames)) + symbols = self._free_symbols callparams = tuple((carg, aname) for arg, carg, aname in zip(arglist, cargs, argnames) - if not (symbolic.issymbolic(arg) and (hasattr(arg, 'name') and arg.name in constants))) - - newargs = tuple(carg for carg, aname in callparams) + if not ((hasattr(arg, 'name') and arg.name in self._constants) and symbolic.issymbolic(arg))) + newargs = tuple(carg for carg, _aname in callparams) initargs = tuple(carg for carg, aname in callparams if aname in symbols) - self._lastargs = newargs, initargs - return self._lastargs + return (newargs, initargs) + + def convert_return_values(self) -> Union[Any, Tuple[Any, ...]]: + """Convert and return the SDFG's return values. + + This function should only be called after `fast_call()` has been run. + Keep in mind that it is not possible to return scalars (with the exception of + `pyobject`s); they will always be returned as an array with shape `(1,)`. + + :note: This is an advanced interface. + :note: After `fast_call()` returns it is only allowed to call this function once. + """ + # TODO: Make sure that the function is called only once by checking it.
+        # NOTE: Currently it is not possible to return a scalar value, see `tests/sdfg/scalar_return.py`
+        if not self._return_arrays:
+            return None
+        elif self._is_single_value_ret:
+            assert len(self._return_arrays) == 1
+            return self._return_arrays[0].item() if self._retarray_is_pyobject[0] else self._return_arrays[0]
+        else:
+            return tuple(r.item() if is_pyobj else r
+                         for r, is_pyobj in zip(self._return_arrays, self._retarray_is_pyobject))

     def clear_return_values(self):
-        self._create_new_arrays = True
+        warnings.warn(
+            'The "CompiledSDFG.clear_return_values" API is deprecated, as this behaviour has'
+            ' become the new default; the call is now a no-op.', DeprecationWarning)
+        pass

     def _create_array(self, _: str, dtype: np.dtype, storage: dtypes.StorageType, shape: Tuple[int],
                       strides: Tuple[int], total_size: int):
@@ -599,52 +691,76 @@ def _initialize_return_values(self, kwargs):
         # Clear references from last call (allow garbage collection)
         self._callback_retval_references.clear()

-        if self._initialized:
-            if self._return_syms == syms:
-                if not self._create_new_arrays:
-                    return
-                else:
-                    self._create_new_arrays = False
-                    # Use stored sizes to recreate arrays (fast path)
-                    self._return_arrays = tuple(kwargs[desc[0]] if desc[0] in kwargs else self._create_array(*desc)
-                                                for desc in self._retarray_shapes)
-                    return
+        if self._initialized and self._return_syms == syms:
+            # Use stored sizes to recreate arrays (fast path)
+            self._return_arrays = tuple(kwargs[desc[0]] if desc[0] in kwargs else self._create_array(*desc)
+                                        for desc in self._retarray_shapes)
+            return

         self._return_syms = syms
-        self._create_new_arrays = False
-
-        # Initialize return values with numpy arrays
-        self._retarray_shapes = []
         self._return_arrays = []
+        self._retarray_shapes = []
+        self._retarray_is_pyobject = []

         for arrname, arr in sorted(self.sdfg.arrays.items()):
-            if arrname.startswith('__return') and not arr.transient:
-                if arrname in kwargs:
+            if arrname.startswith('__return'):
+                if arr.transient:
+                    raise ValueError(f'Used the special array name "{arrname}" as a transient.')
+
+                elif arrname in kwargs:
+                    # The return value is passed as an argument; in that case only store its name in `self._retarray_shapes`.
+                    warnings.warn(f'Return value "{arrname}" is passed as a regular argument.', stacklevel=2)
                     self._return_arrays.append(kwargs[arrname])
-                self._retarray_is_scalar.append(isinstance(arr, dt.Scalar))
                     self._retarray_shapes.append((arrname, ))
-                continue
-            if isinstance(arr, dt.Stream):
+                elif isinstance(arr, dt.Stream):
                     raise NotImplementedError('Return streams are unsupported')
-            shape = tuple(symbolic.evaluate(s, syms) for s in arr.shape)
-            dtype = arr.dtype.as_numpy_dtype()
-            total_size = int(symbolic.evaluate(arr.total_size, syms))
-            strides = tuple(symbolic.evaluate(s, syms) * arr.dtype.bytes for s in arr.strides)
-            shape_desc = (arrname, dtype, arr.storage, shape, strides, total_size)
-            self._retarray_is_scalar.append(isinstance(arr, dt.Scalar) or isinstance(arr.dtype, dtypes.pyobject))
-            self._retarray_shapes.append(shape_desc)
-
-            # Create an array with the properties of the SDFG array
-            arr = self._create_array(*shape_desc)
-            self._return_arrays.append(arr)
+                else:
+                    shape = tuple(symbolic.evaluate(s, syms) for s in arr.shape)
+                    dtype = arr.dtype.as_numpy_dtype()
+                    total_size = int(symbolic.evaluate(arr.total_size, syms))
+                    strides = tuple(symbolic.evaluate(s, syms) * arr.dtype.bytes for s in arr.strides)
+                    shape_desc = (arrname, dtype, arr.storage, shape, strides, total_size)
+                    self._retarray_shapes.append(shape_desc)
+
+                    # Create an array with the properties of the SDFG array
+                    return_array = self._create_array(*shape_desc)
+                    self._return_arrays.append(return_array)
+
+                # BUG COMPATIBILITY(PR#2206):
+                #  In the original version `_retarray_is_pyobject` was named `_retarray_is_scalar`; however,
+                #  since scalars could not be returned on an [implementation level](https://github.com/spcl/dace/pull/1609)
+                #  it was essentially useless. It was, however, used for `pyobject` in _some_ cases. And indeed,
+                #  since `pyobject`s are essentially `void` pointers it was, in principle, possible to return/pass
+                #  them as "scalars", read "not inside an array".
+                #  However, if the return value was passed as an argument, i.e. the first `elif`, then it
+                #  was ignored if `arr` was a `pyobject`. Only if the return value was managed by `self`,
+                #  i.e. the `else` case, was it considered, at least in a way. The problem was that it was
+                #  done using the following check:
+                #      `isinstance(arr, dt.Scalar) or isinstance(arr.dtype, dtypes.pyobject)`
+                #  Because of the `or`, _everything_ whose `dtype` is `pyobject` was classified as a scalar
+                #  `pyobject`, i.e. one element, even if it was in fact an array of millions of `pyobject`s.
+                #  The correct behaviour would be to change the `or` to an `and`, but then several unit
+                #  tests (`test_pyobject_return`, `test_pyobject_return_tuple` and `test_nested_autoparse[False]`
+                #  in `tests/python_frontend/callee_autodetect_test.py`) would fail.
+                #  The following code is bug compatible and also allows passing a `pyobject` directly, i.e.
+                #  through `kwargs`.
+                if isinstance(arr.dtype, dtypes.pyobject):
+                    if isinstance(arr, dt.Scalar):
+                        # Proper scalar.
+                        self._retarray_is_pyobject.append(True)
+                    elif isinstance(arr, dt.Array):
+                        # An array, let's check if it is just a wrapper for a single value.
+                        if not (len(arr.shape) == 1 and arr.shape[0] == 1):
+                            warnings.warn(f'Decaying an array of `pyobject`s with shape {arr.shape} to a single one.',
+                                          stacklevel=2)
+                        self._retarray_is_pyobject.append(True)
+                    else:
+                        raise ValueError(
+                            f'Cannot handle "{arrname}", which is a {type(arr).__name__} of `pyobject`s.')
+                else:
+                    self._retarray_is_pyobject.append(False)

-    def _convert_return_values(self):
-        # Return the values as they would be from a Python function
-        # NOTE: Currently it is not possible to return a scalar value, see `tests/sdfg/scalar_return.py`
-        if not self._return_arrays:
-            return None
-        elif len(self._return_arrays) == 1:
-            return self._return_arrays[0].item() if self._retarray_is_scalar[0] else self._return_arrays[0]
-        else:
-            return tuple(r.item() if scalar else r for r, scalar in zip(self._return_arrays, self._retarray_is_scalar))
+        assert (not self._is_single_value_ret) or (len(self._return_arrays) == 1)
+        assert len(self._return_arrays) == len(self._retarray_shapes) == len(self._retarray_is_pyobject)
+        self._return_arrays = tuple(self._return_arrays)
diff --git a/tests/codegen/external_memory_test.py b/tests/codegen/external_memory_test.py
index 169e050914..47eac55ff3 100644
--- a/tests/codegen/external_memory_test.py
+++ b/tests/codegen/external_memory_test.py
@@ -30,7 +30,7 @@ def tester(a: dace.float64[N]):
     a = np.random.rand(20)

     if symbolic:
-        extra_args = dict(a=a, N=20)
+        extra_args = dict(N=20)
     else:
         extra_args = {}
diff --git a/tests/python_frontend/return_value_test.py b/tests/python_frontend/return_value_test.py
index 4a845bea0b..4e704287bc 100644
--- a/tests/python_frontend/return_value_test.py
+++ b/tests/python_frontend/return_value_test.py
@@ -9,7 +9,15 @@ def test_return_scalar():
     def return_scalar():
         return 5

-    assert return_scalar() == 5
+    res = return_scalar()
+    assert res == 5
+
+    # Don't be fooled by the assertion above: the return value is an array. If you
+    # added the return value annotation `-> dace.int32` to the program, you would
+    # get a validation error.
+    assert isinstance(res, np.ndarray)
+    assert res.shape == (1, )
+    assert res.dtype == np.int64


 def test_return_scalar_in_nested_function():
@@ -22,7 +30,15 @@ def nested_function() -> dace.int32:
     def return_scalar():
         return nested_function()

-    assert return_scalar() == 5
+    res = return_scalar()
+    assert res == 5
+
+    # Don't be fooled by the assertion above: the return value is an array. If you
+    # added the return value annotation `-> dace.int32` to the program, you would
+    # get a validation error.
+ assert isinstance(res, np.ndarray) + assert res.shape == (1, ) + assert res.dtype == np.int32 def test_return_array(): @@ -42,6 +58,8 @@ def return_tuple(): return 5, 6 res = return_tuple() + assert isinstance(res, tuple) + assert len(res) == 2 assert res == (5, 6) @@ -52,6 +70,8 @@ def return_array_tuple(): return 5 * np.ones(5), 6 * np.ones(6) res = return_array_tuple() + assert isinstance(res, tuple) + assert len(res) == 2 assert np.allclose(res[0], 5 * np.ones(5)) assert np.allclose(res[1], 6 * np.ones(6)) @@ -66,10 +86,25 @@ def return_void(a: dace.float64[20]): a = np.random.rand(20) ref = a + 1 - return_void(a) + res = return_void(a) + assert res is None assert np.allclose(a, ref) +def test_return_tuple_1_element(): + + @dace.program + def return_one_element_tuple(a: dace.float64[20]): + return (a + 3.5, ) + + a = np.random.rand(20) + ref = a + 3.5 + res = return_one_element_tuple(a) + assert isinstance(res, tuple) + assert len(res) == 1 + assert np.allclose(res[0], ref) + + def test_return_void_in_if(): @dace.program From 80d19317f9d243b38fa8f7c52f8ca71f024d4905 Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Wed, 5 Nov 2025 07:56:02 +0100 Subject: [PATCH 03/14] Squashed commit of the following: commit a3e5b390233b4a42556c0c90d33125836fd1a79f Author: Philip Mueller Date: Wed Nov 5 07:16:37 2025 +0100 Made `ReloadableDLL` uncopyable. commit e3d429bc51c3a7af5f577ce380db87e3a436098f Author: Philip Mueller Date: Wed Nov 5 07:16:27 2025 +0100 Ensured that no library is overwriten. --- dace/codegen/compiled_sdfg.py | 37 ++++++++++++++++++++++++++++++----- 1 file changed, 32 insertions(+), 5 deletions(-) diff --git a/dace/codegen/compiled_sdfg.py b/dace/codegen/compiled_sdfg.py index 0e66d7d95c..4ba47ee326 100644 --- a/dace/codegen/compiled_sdfg.py +++ b/dace/codegen/compiled_sdfg.py @@ -9,6 +9,7 @@ import warnings import tempfile import pickle +import pathlib import sys import numpy as np @@ -77,7 +78,8 @@ def is_loaded(self) -> bool: lib_cfilename = ctypes.c_wchar_p(self._library_filename) else: # As UTF-8 - lib_cfilename = ctypes.c_char_p(self._library_filename.encode('utf-8')) + tt = self._library_filename.encode('utf-8') + lib_cfilename = ctypes.c_char_p(tt) return self._stub.is_library_loaded(lib_cfilename) == 1 @@ -96,21 +98,39 @@ def load(self): # Check if library is already loaded is_loaded = True lib_cfilename = None + lib_filename = self._library_filename + counter = 0 while is_loaded: # Convert library filename to string according to OS if os.name == 'nt': # As UTF-16 - lib_cfilename = ctypes.c_wchar_p(self._library_filename) + lib_cfilename = ctypes.c_wchar_p(lib_filename) else: # As UTF-8 - lib_cfilename = ctypes.c_char_p(self._library_filename.encode('utf-8')) + lib_cfilename = ctypes.c_char_p(lib_filename.encode('utf-8')) + # Test if the library is loaded. is_loaded = self._stub.is_library_loaded(lib_cfilename) + if is_loaded == 1: warnings.warn(f'Library {self._library_filename} already loaded, renaming file') + + # The library is loaded, copy the _original_ library file to a new file + # and then try to load that. We only do the copy if the new new name is + # free. It seems that at least on LINUX there is some issue if we + # overwrite a file that already exists. + lib_filename = self._library_filename + f'_{counter}' + counter += 1 + if pathlib.Path(lib_filename).exists(): + assert pathlib.Path(lib_filename).is_file() + continue + + # The file name is not taken, so make a copy. There might be a race condition + # here in the presence of multiple processes. 
+ # TODO: Investigate if we should switch to hardlinks if they are supported. try: - shutil.copyfile(self._library_filename, self._library_filename + '_') - self._library_filename += '_' + assert self._library_filename != lib_filename + shutil.copyfile(self._library_filename, lib_filename) except shutil.Error: raise cgx.DuplicateDLLError(f'Library {os.path.basename(self._library_filename)}' 'is already loaded somewhere else and cannot be unloaded. ' @@ -118,6 +138,7 @@ def load(self): # Actually load the library self._lib = ctypes.c_void_p(self._stub.load_library(lib_cfilename)) + self._library_filename = lib_filename if self._lib.value is None: # Try to understand why the library is not loading, if dynamic @@ -147,6 +168,12 @@ def __enter__(self, *args, **kwargs): def __exit__(self, *args, **kwargs): self.unload() + def __copy__(self): + raise RuntimeError(f'Can not copy ReloadableDLL({self._library_filename})') + + def __deepcopy__(self, memodict={}): + raise RuntimeError(f'Can not copy ReloadableDLL({self._library_filename})') + class CompiledSDFG(object): """ A compiled SDFG object that can be called through Python. From 7a0d751271cda13a57349d1e90169f8ded2c3865 Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Wed, 5 Nov 2025 07:56:26 +0100 Subject: [PATCH 04/14] Updated the version. --- dace/version.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dace/version.py b/dace/version.py index 1f356cc57b..a71a5118e7 100644 --- a/dace/version.py +++ b/dace/version.py @@ -1 +1 @@ -__version__ = '1.0.0' +__version__ = '2025.11.05' From e305fe389fe72aed3553073400f3e565e0aff56a Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Fri, 21 Nov 2025 14:08:47 +0100 Subject: [PATCH 05/14] Added the updater. --- .github/workflows/dace-updater.yml | 42 ++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) create mode 100644 .github/workflows/dace-updater.yml diff --git a/.github/workflows/dace-updater.yml b/.github/workflows/dace-updater.yml new file mode 100644 index 0000000000..e79d789773 --- /dev/null +++ b/.github/workflows/dace-updater.yml @@ -0,0 +1,42 @@ +name: Inforrm the Python package index about a new DaCe release +# Must be installed into the DaCe fork. + +on: + push: + # Only run once a new tag has been created. + # TODO: Make sure that the tag is passed to the index update workflow. + tags: + - __gt4py-next-integration_* + workflow_dispatch: + + # We need this until this file is not in `main`, without it the web interface will not pick it up. + # See https://stackoverflow.com/a/71057825 + pull_request: + +jobs: + update-dace: + runs-on: ubuntu-latest + steps: + - name: Print all variables + shell: bash + run: | + INDEX_ORGANIZATION="gridtools" + INDEX_REPO="python-pkg-index" + + # Only needed for installation + exit 0 + + # We use `github.ref_name` here because we only run for a tag and they should be unique. + # If we would use `github.ref` then we would have `refs/tags/`. An alternative + # would also be `github.sha`. 
+ DEPENDENCY_REF="${{ github.ref_name }}" + SOURCE_REPO="dace" + SOURCE_OWNER="gridtools" + + curl -L -v \ + -X POST \ + -H "Accept: application/vnd.github+json" \ + -H "Authorization: Bearer ${{ secrets.PKG_UPDATE_TOKEN }}" \ + -H "X-GitHub-Api-Version: 2022-11-28" \ + "https://api.github.com/repos/${INDEX_ORGANIZATION}/${INDEX_REPO}/dispatches" \ + -d '{"event_type":"update_package_index","client_payload":{"source_repo":"'"${SOURCE_REPO}"'","source_org":"'"${SOURCE_ORG}"'","dependency_ref":"'"${DEPENDENCY_REF}"'"}}' From f1418509a1c74854239f3f0c8da78ea243ec2a44 Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Fri, 21 Nov 2025 14:09:58 +0100 Subject: [PATCH 06/14] Let's disable a bit more. --- .github/workflows/dace-updater.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/dace-updater.yml b/.github/workflows/dace-updater.yml index e79d789773..266e99fe73 100644 --- a/.github/workflows/dace-updater.yml +++ b/.github/workflows/dace-updater.yml @@ -2,11 +2,11 @@ name: Inforrm the Python package index about a new DaCe release # Must be installed into the DaCe fork. on: - push: + #push: # Only run once a new tag has been created. # TODO: Make sure that the tag is passed to the index update workflow. - tags: - - __gt4py-next-integration_* + #tags: + #- __gt4py-next-integration_* workflow_dispatch: # We need this until this file is not in `main`, without it the web interface will not pick it up. From ab0375bf0a00251f18db991a059cfdae1a44cacb Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Fri, 21 Nov 2025 14:13:24 +0100 Subject: [PATCH 07/14] Removed the others for the experiments. --- .github/workflows/copilot-setup-steps.yml | 40 ------- .github/workflows/fpga-ci.yml | 75 ------------ .github/workflows/general-ci.yml | 112 ------------------ .github/workflows/gpu-ci.yml | 76 ------------ .github/workflows/hardware_test.yml | 25 ---- .github/workflows/heterogeneous-ci.yml | 96 --------------- .github/workflows/linting.yml | 36 ------ .github/workflows/pyFV3-ci.yml | 105 ---------------- .github/workflows/release.sh | 18 --- .github/workflows/scripts/show-git-diff.sh | 21 ---- .github/workflows/verilator_compatibility.yml | 37 ------ 11 files changed, 641 deletions(-) delete mode 100644 .github/workflows/copilot-setup-steps.yml delete mode 100644 .github/workflows/fpga-ci.yml delete mode 100644 .github/workflows/general-ci.yml delete mode 100644 .github/workflows/gpu-ci.yml delete mode 100644 .github/workflows/hardware_test.yml delete mode 100644 .github/workflows/heterogeneous-ci.yml delete mode 100644 .github/workflows/linting.yml delete mode 100644 .github/workflows/pyFV3-ci.yml delete mode 100755 .github/workflows/release.sh delete mode 100755 .github/workflows/scripts/show-git-diff.sh delete mode 100644 .github/workflows/verilator_compatibility.yml diff --git a/.github/workflows/copilot-setup-steps.yml b/.github/workflows/copilot-setup-steps.yml deleted file mode 100644 index fe60e4867c..0000000000 --- a/.github/workflows/copilot-setup-steps.yml +++ /dev/null @@ -1,40 +0,0 @@ -name: "Copilot Setup Steps" -on: workflow_dispatch - -jobs: - copilot-setup-steps: - runs-on: ubuntu-latest - permissions: - contents: read - steps: - - name: Checkout code - uses: actions/checkout@v4 - with: - submodules: recursive - - - name: Set up C++ compiler - run: | - sudo apt-get update - sudo apt-get install -y g++ build-essential - - - name: Set up CMake - run: | - sudo apt-get install -y cmake - - - name: Set up Python - uses: actions/setup-python@v4 - with: 
- python-version: '3.11' - - - name: Install Python dependencies - run: | - python -m pip install --upgrade pip - # Install additional testing dependencies - python -m pip install pytest pytest-cov - python -m pip install flake8 pytest-xdist coverage - - - name: Install DaCe in development mode - run: | - python -m pip install --editable ".[testing,linting]" - pre-commit install - pre-commit run diff --git a/.github/workflows/fpga-ci.yml b/.github/workflows/fpga-ci.yml deleted file mode 100644 index 926d4c69e9..0000000000 --- a/.github/workflows/fpga-ci.yml +++ /dev/null @@ -1,75 +0,0 @@ -name: FPGA Tests - -on: - push: - branches: [ main, ci-fix ] - pull_request: - branches: [ main, ci-fix ] - merge_group: - branches: [ main, ci-fix ] - -env: - CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }} - -concurrency: - group: ${{github.workflow}}-${{github.ref}} - cancel-in-progress: true - -jobs: - test-fpga: - if: ${{ !contains(github.event.pull_request.labels.*.name, 'no-ci') }} - runs-on: [self-hosted, linux, intel-fpga, xilinx-fpga] - steps: - - uses: actions/checkout@v4 - with: - submodules: 'recursive' - - name: Install dependencies - run: | - rm -f ~/.dace.conf - rm -rf .dacecache tests/.dacecache - python -m venv ~/.venv # create venv so we can use pip - source ~/.venv/bin/activate # activate venv - python -m pip install --upgrade pip - pip install pytest-xdist flake8 coverage click - pip uninstall -y dace - pip install -e ".[testing]" - curl -Os https://uploader.codecov.io/latest/linux/codecov - chmod +x codecov - - - name: Run FPGA Tests - run: | - source ~/.venv/bin/activate # activate venv - export COVERAGE_RCFILE=`pwd`/.coveragerc - - # Xilinx setup - export PATH=/opt/Xilinx/Vitis/2022.1/bin:/opt/Xilinx/Vitis_HLS/2022.1/bin:/opt/Xilinx/Vivado/2022.1/bin:$PATH - export XILINX_XRT=/opt/xilinx/xrt - export LD_LIBRARY_PATH=$XILINX_XRT/lib:$LD_LIBRARY_PATH - export XILINX_VITIS=/opt/Xilinx/Vitis/2022.1 - export DACE_compiler_xilinx_platform=xilinx_u250_gen3x16_xdma_4_1_202210_1 - - # Intel FPGA setup - export INTELFPGAOCLSDKROOT=/opt/intelFPGA_pro/19.1/hld - export ALTERAOCLSDKROOT=$INTELFPGAOCLSDKROOT - export AOCL_BOARD_PACKAGE_ROOT=/opt/intelFPGA_pro/19.1/hld/board/a10_ref - export PATH=$INTELFPGAOCLSDKROOT/bin:$PATH - export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$AOCL_BOARD_PACKAGE_ROOT/linux64/lib - export QUARTUS_ROOTDIR_OVERRIDE=/opt/intelFPGA_pro/19.1/quartus - export LD_PRELOAD=/lib/x86_64-linux-gnu/libstdc++.so.6 # Work around dependency issues - - # Due to an internal bug in the Xilinx tools, where the current datetime is passed as an integer - # and overflowed in the year 2022, run the FPGA tests pretending like it's January 1st 2021. - # faketime -f "@2021-01-01 00:00:00" pytest -n auto --cov-report=xml --cov=dace --tb=short -m "fpga" - # Try running without faketime - pytest -n auto --cov-report=xml --cov=dace --tb=short --timeout_method thread --timeout=300 -m "fpga" - - coverage report - coverage xml - reachable=0 - ping -W 2 -c 1 codecov.io || reachable=$? 
- if [ $reachable -eq 0 ]; then - ./codecov - else - echo "Codecov.io is unreachable" - fi - killall -9 xsim xsimk || true diff --git a/.github/workflows/general-ci.yml b/.github/workflows/general-ci.yml deleted file mode 100644 index 1d9dc3fa79..0000000000 --- a/.github/workflows/general-ci.yml +++ /dev/null @@ -1,112 +0,0 @@ -name: General Tests - -on: - push: - branches: [ main, ci-fix ] - pull_request: - branches: [ main, ci-fix ] - merge_group: - branches: [ main, ci-fix ] - -concurrency: - group: ${{github.workflow}}-${{github.ref}} - cancel-in-progress: true - -jobs: - test: - if: "!contains(github.event.pull_request.labels.*.name, 'no-ci')" - runs-on: ubuntu-latest - strategy: - matrix: - python-version: ['3.9','3.13'] - simplify: [0,1,autoopt] - - steps: - - uses: actions/checkout@v4 - with: - submodules: 'recursive' - - name: Set up Python ${{ matrix.python-version }} - uses: actions/setup-python@v5 - with: - python-version: ${{ matrix.python-version }} - - name: Install dependencies - run: | - # Make dependency setup faster - echo 'set man-db/auto-update false' | sudo debconf-communicate >/dev/null - sudo dpkg-reconfigure man-db - # Install dependencies - sudo apt-get update - sudo apt-get install -y libyaml-dev cmake libblas-dev libopenblas-dev liblapacke-dev libpapi-dev papi-tools - pip install flake8 pytest-xdist coverage - pip install -e ".[testing]" - curl -Os https://uploader.codecov.io/latest/linux/codecov - chmod +x codecov - - - name: Test dependencies - run: | - papi_avail - - - name: Test with pytest - run: | - export NOSTATUSBAR=1 - export DACE_testing_serialization=1 - export DACE_testing_deserialize_exception=1 - export DACE_cache=unique - if [ "${{ matrix.simplify }}" = "autoopt" ]; then - export DACE_optimizer_automatic_simplification=1 - export DACE_optimizer_autooptimize=1 - echo "Auto-optimization heuristics" - else - export DACE_optimizer_automatic_simplification=${{ matrix.simplify }} - fi - pytest -n auto --cov-report=xml --cov=dace --tb=short --timeout_method thread --timeout=300 -m "not gpu and not verilator and not tensorflow and not mkl and not sve and not papi and not mlir and not lapack and not fpga and not mpi and not rtl_hardware and not scalapack and not datainstrument and not long and not sequential" - ./codecov - - - name: Test OpenBLAS LAPACK - run: | - export NOSTATUSBAR=1 - export DACE_testing_serialization=1 - export DACE_testing_deserialize_exception=1 - export DACE_cache=unique - if [ "${{ matrix.simplify }}" = "autoopt" ]; then - export DACE_optimizer_automatic_simplification=1 - export DACE_optimizer_autooptimize=1 - echo "Auto-optimization heuristics" - else - export DACE_optimizer_automatic_simplification=${{ matrix.simplify }} - fi - pytest -n 1 --cov-report=xml --cov=dace --tb=short --timeout_method thread --timeout=300 -m "lapack" - ./codecov - - - name: Run sequential tests - run: | - export NOSTATUSBAR=1 - export DACE_testing_serialization=1 - export DACE_testing_deserialize_exception=1 - export DACE_cache=unique - if [ "${{ matrix.simplify }}" = "autoopt" ]; then - export DACE_optimizer_automatic_simplification=1 - export DACE_optimizer_autooptimize=1 - echo "Auto-optimization heuristics" - else - export DACE_optimizer_automatic_simplification=${{ matrix.simplify }} - fi - pytest -n 1 --cov-report=xml --cov=dace --tb=short --timeout_method thread --timeout=300 -m "sequential" - ./codecov - - - name: Run other tests - run: | - export NOSTATUSBAR=1 - export DACE_testing_serialization=0 - export DACE_testing_deserialize_exception=1 - 
export DACE_cache=single - export DACE_optimizer_automatic_simplification=${{ matrix.simplify }} - export PYTHON_BINARY="coverage run --source=dace --parallel-mode" - ./tests/polybench_test.sh - ./tests/xform_test.sh - coverage combine .; coverage report; coverage xml - - - uses: codecov/codecov-action@v4 - with: - token: ${{ secrets.CODECOV_TOKEN }} - verbose: true diff --git a/.github/workflows/gpu-ci.yml b/.github/workflows/gpu-ci.yml deleted file mode 100644 index d6064cb542..0000000000 --- a/.github/workflows/gpu-ci.yml +++ /dev/null @@ -1,76 +0,0 @@ -name: GPU Tests - -on: - push: - branches: [ main, ci-fix ] - pull_request: - branches: [ main, ci-fix ] - merge_group: - branches: [ main, ci-fix ] - -env: - CUDACXX: /usr/local/cuda/bin/nvcc - MKLROOT: /opt/intel/oneapi/mkl/latest/ - CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }} - -concurrency: - group: ${{github.workflow}}-${{github.ref}} - cancel-in-progress: true - -jobs: - test-gpu: - if: "!contains(github.event.pull_request.labels.*.name, 'no-ci')" - runs-on: [self-hosted, gpu] - steps: - - uses: actions/checkout@v4 - with: - submodules: 'recursive' - - name: Install dependencies - run: | - rm -f ~/.dace.conf - rm -rf .dacecache tests/.dacecache - python -m venv ~/.venv # create venv so we can use pip - source ~/.venv/bin/activate # activate venv - python -m pip install --upgrade pip - pip install flake8 pytest-xdist coverage - pip install mpi4py - pip install cupy - pip uninstall -y dace - pip install -e ".[testing]" - curl -Os https://uploader.codecov.io/latest/linux/codecov - chmod +x codecov - - - name: Test dependencies - run: | - source ~/.venv/bin/activate # activate venv - nvidia-smi - - - name: Run pytest GPU - run: | - source ~/.venv/bin/activate # activate venv - export DACE_cache=single - export PATH=$PATH:/usr/local/cuda/bin # some test is calling cuobjdump, so it needs to be in path - echo "CUDACXX: $CUDACXX" - pytest --cov-report=xml --cov=dace --tb=short --timeout_method thread --timeout=300 -m "gpu" - - - name: Run extra GPU tests - run: | - source ~/.venv/bin/activate # activate venv - export NOSTATUSBAR=1 - export DACE_cache=single - export COVERAGE_RCFILE=`pwd`/.coveragerc - export PYTHON_BINARY="coverage run --source=dace --parallel-mode" - ./tests/cuda_test.sh - - - name: Report overall coverage - run: | - source ~/.venv/bin/activate # activate venv - export COVERAGE_RCFILE=`pwd`/.coveragerc - coverage combine . */; coverage report; coverage xml - reachable=0 - ping -W 2 -c 1 codecov.io || reachable=$? - if [ $reachable -eq 0 ]; then - ./codecov - else - echo "Codecov.io is unreachable" - fi diff --git a/.github/workflows/hardware_test.yml b/.github/workflows/hardware_test.yml deleted file mode 100644 index 59dc201e4b..0000000000 --- a/.github/workflows/hardware_test.yml +++ /dev/null @@ -1,25 +0,0 @@ -name: DaCe RTL hardware emulation -on: workflow_dispatch -jobs: - test-rtl: - runs-on: [self-hosted, linux, xilinx-fpga] - steps: - - uses: actions/checkout@v4 - with: - submodules: 'recursive' - - name: Install dependencies - run: | - rm -f ~/.dace.conf - rm -rf .dacecache tests/.dacecache - . /opt/setupenv - python -m pip install --upgrade pip - pip install pytest-xdist flake8 - pip uninstall -y dace - pip install -e ".[testing]" - - - name: Run FPGA Tests - run: | - # Due to an internal bug in the Xilinx tools, where the current datetime is passed as an integer - # and overflowed in the year 2022, run the RTL FPGA tests pretending like it's January 1st 2021. 
- faketime -f "@2021-01-01 00:00:00" pytest -n auto --tb=short -m "rtl_hardware" - killall -9 xsim xsimk || true diff --git a/.github/workflows/heterogeneous-ci.yml b/.github/workflows/heterogeneous-ci.yml deleted file mode 100644 index 53a8788dce..0000000000 --- a/.github/workflows/heterogeneous-ci.yml +++ /dev/null @@ -1,96 +0,0 @@ -name: Heterogeneous Tests - -on: - push: - branches: [ main, ci-fix ] - pull_request: - branches: [ main, ci-fix ] - merge_group: - branches: [ main, ci-fix ] - -env: - CUDA_HOME: /usr/local/cuda - CUDACXX: nvcc - MKLROOT: /opt/intel/oneapi/mkl/latest/ - CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }} - -concurrency: - group: ${{github.workflow}}-${{github.ref}} - cancel-in-progress: true - -jobs: - test-heterogeneous: - if: "!contains(github.event.pull_request.labels.*.name, 'no-ci')" - runs-on: [self-hosted, linux] - steps: - - uses: actions/checkout@v4 - with: - submodules: 'recursive' - - name: Install dependencies - run: | - rm -f ~/.dace.conf - rm -rf .dacecache tests/.dacecache - python -m venv ~/.venv # create venv so we can use pip - source ~/.venv/bin/activate # activate venv - python -m pip install --upgrade pip - pip install flake8 pytest-xdist coverage - pip install mpi4py pytest-mpi - pip uninstall -y dace - pip install -e ".[testing]" - curl -Os https://uploader.codecov.io/latest/linux/codecov - chmod +x codecov - - - name: Test dependencies - run: | - papi_avail - - - name: Run parallel pytest - run: | - source ~/.venv/bin/activate # activate venv - export DACE_cache=unique - pytest --cov-report=xml --cov=dace --tb=short --timeout_method thread --timeout=300 -m "verilator or mkl or papi or datainstrument" - - - name: Run MPI tests - run: | - export NOSTATUSBAR=1 - export DACE_cache=single - export COVERAGE_RCFILE=`pwd`/.coveragerc - export PYTHON_BINARY="coverage run --source=dace --parallel-mode" - source ~/.venv/bin/activate # activate venv - ./tests/mpi_test.sh - - - - name: Test MPI with pytest - run: | - export NOSTATUSBAR=1 - export DACE_testing_serialization=1 - export DACE_testing_deserialize_exception=1 - export DACE_cache=unique - source ~/.venv/bin/activate # activate venv - mpirun -n 2 coverage run --source=dace --parallel-mode -m pytest -x --with-mpi --tb=short --timeout_method thread --timeout=300 -m "mpi" - - - name: Test ScaLAPACK PBLAS with pytest - run: | - export NOSTATUSBAR=1 - export DACE_testing_serialization=1 - export DACE_testing_deserialize_exception=1 - export DACE_cache=unique - export DACE_library_pblas_default_implementation=ReferenceOpenMPI - source ~/.venv/bin/activate # activate venv - for i in {1..4} - do - mpirun -n "$i" --oversubscribe coverage run --source=dace --parallel-mode -m pytest -x --with-mpi --tb=short --timeout_method thread --timeout=300 -m "scalapack" - done - - - name: Report overall coverage - run: | - export COVERAGE_RCFILE=`pwd`/.coveragerc - source ~/.venv/bin/activate # activate venv - coverage combine . */; coverage report; coverage xml - reachable=0 - ping -W 2 -c 1 codecov.io || reachable=$? 
- if [ $reachable -eq 0 ]; then - ./codecov - else - echo "Codecov.io is unreachable" - fi diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml deleted file mode 100644 index 899c0f2681..0000000000 --- a/.github/workflows/linting.yml +++ /dev/null @@ -1,36 +0,0 @@ -name: Code Quality - -on: - push: - branches: [ main, ci-fix ] - pull_request: - branches: [ main, ci-fix ] - -jobs: - linting: - if: "!contains(github.event.pull_request.labels.*.name, 'no-ci')" - name: pre-commit - runs-on: ubuntu-latest - - steps: - - name: Check repository - uses: actions/checkout@v4 - - - name: Setup Python 3.9 - uses: actions/setup-python@v5 - with: - python-version: '3.9' - cache: 'pip' - - - name: Install linting tools - run: pip install .[linting] - - - name: Run linting tools - id: lint - continue-on-error: true - run: pre-commit run --all-files - - - name: Show git diff - if: steps.lint.outcome == 'failure' - run: | - ./.github/workflows/scripts/show-git-diff.sh diff --git a/.github/workflows/pyFV3-ci.yml b/.github/workflows/pyFV3-ci.yml deleted file mode 100644 index 423a6c862f..0000000000 --- a/.github/workflows/pyFV3-ci.yml +++ /dev/null @@ -1,105 +0,0 @@ -name: NASA/NOAA pyFV3 repository build test - -# Temporarily disabled for main, and instead applied to a specific DaCe v1 maintenance branch (v1/maintenance). Once -# the FV3 bridge has been adapted to DaCe v1, this will need to be reverted back to apply to main. -on: - push: - #branches: [ main, ci-fix ] - branches: [ v1/maintenance, ci-fix ] - pull_request: - #branches: [ main, ci-fix ] - branches: [ v1/maintenance, ci-fix ] - merge_group: - #branches: [ main, ci-fix ] - branches: [ v1/maintenance, ci-fix ] - -defaults: - run: - shell: bash - -concurrency: - group: ${{github.workflow}}-${{github.ref}} - cancel-in-progress: true - -jobs: - build_and_validate_pyFV3: - if: "!contains(github.event.pull_request.labels.*.name, 'no-ci')" - runs-on: ubuntu-latest - strategy: - matrix: - python-version: [3.11.7] - - steps: - - uses: actions/checkout@v4 - with: - repository: 'NOAA-GFDL/PyFV3' - ref: 'ci/DaCe' - submodules: 'recursive' - path: 'pyFV3' - - uses: actions/checkout@v4 - with: - path: 'dace' - submodules: 'recursive' - - name: Set up Python ${{ matrix.python-version }} - uses: actions/setup-python@v5 - with: - python-version: ${{ matrix.python-version }} - - name: Install library dependencies - run: | - sudo apt-get update - sudo apt-get install -y libopenmpi-dev libboost-all-dev - gcc --version - # Because Github doesn't allow us to do a git checkout in code - # we use a trick to checkout DaCe first (not using the external submodule) - # install the full suite via requirements_dev, then re-install the correct DaCe - - name: Install Python packages - run: | - python -m pip install --upgrade pip wheel setuptools - pip install -e ./pyFV3[develop] - pip install -e ./dace - - name: Download data - run: | - cd pyFV3 - mkdir -p test_data - cd test_data - wget --retry-connrefused https://portal.nccs.nasa.gov/datashare/astg/smt/pace-regression-data/8.1.3_c12_6ranks_standard.D_SW.tar.gz - tar -xzvf 8.1.3_c12_6ranks_standard.D_SW.tar.gz - wget --retry-connrefused https://portal.nccs.nasa.gov/datashare/astg/smt/pace-regression-data/8.1.3_c12_6ranks_standard.RiemSolver3.tar.gz - tar -xzvf 8.1.3_c12_6ranks_standard.RiemSolver3.tar.gz - wget --retry-connrefused https://portal.nccs.nasa.gov/datashare/astg/smt/pace-regression-data/8.1.3_c12_6ranks_standard.Remapping.tar.gz - tar -xzvf 8.1.3_c12_6ranks_standard.Remapping.tar.gz - cd ../.. 
- # Clean up caches between run for stale un-expanded SDFG to trip the build system (NDSL side issue) - - name: "Regression test: Riemman Solver on D-grid (RiemSolver3)" - env: - FV3_DACEMODE: BuildAndRun - PACE_CONSTANTS: GFS - PACE_LOGLEVEL: Debug - run: | - pytest -v -s --data_path=./pyFV3/test_data/8.1.3/c12_6ranks_standard/dycore \ - --backend=dace:cpu --which_modules=Riem_Solver3 \ - --threshold_overrides_file=./pyFV3/tests/savepoint/translate/overrides/standard.yaml \ - ./pyFV3/tests/savepoint - rm -r ./.gt_cache_FV3_A - - name: "Regression test: Shallow water lagrangian dynamics on D-grid (D_SW) (on rank 0 only)" - env: - FV3_DACEMODE: BuildAndRun - PACE_CONSTANTS: GFS - PACE_LOGLEVEL: Debug - run: | - pytest -v -s --data_path=./pyFV3/test_data/8.1.3/c12_6ranks_standard/dycore \ - --backend=dace:cpu --which_modules=D_SW --which_rank=0 \ - --threshold_overrides_file=./pyFV3/tests/savepoint/translate/overrides/standard.yaml \ - ./pyFV3/tests/savepoint - rm -r ./.gt_cache_FV3_A - - name: "Regression test: Remapping (on rank 0 only)" - env: - FV3_DACEMODE: BuildAndRun - PACE_CONSTANTS: GFS - PACE_LOGLEVEL: Debug - run: | - pytest -v -s --data_path=./pyFV3/test_data/8.1.3/c12_6ranks_standard/dycore \ - --backend=dace:cpu --which_modules=Remapping --which_rank=0 \ - --threshold_overrides_file=./pyFV3/tests/savepoint/translate/overrides/standard.yaml \ - ./pyFV3/tests/savepoint - rm -r ./.gt_cache_FV3_A diff --git a/.github/workflows/release.sh b/.github/workflows/release.sh deleted file mode 100755 index 7b8ff5f5e4..0000000000 --- a/.github/workflows/release.sh +++ /dev/null @@ -1,18 +0,0 @@ -#!/bin/sh - -set -e - -# Install dependencies -pip install --upgrade twine - -# Synchronize submodules -git submodule update --init --recursive - -# Erase old distribution, if exists -rm -rf dist dace.egg-info - -# Make tarball -python -m build --sdist - -# Upload to PyPI -twine upload dist/* diff --git a/.github/workflows/scripts/show-git-diff.sh b/.github/workflows/scripts/show-git-diff.sh deleted file mode 100755 index a811c01672..0000000000 --- a/.github/workflows/scripts/show-git-diff.sh +++ /dev/null @@ -1,21 +0,0 @@ -#!/bin/bash - -# Check for uncommitted changes in the working tree -if [ -n "$(git status --porcelain)" ]; then - echo "Linting tools found the following changes are needed to comply" - echo "with our automatic styling." - echo "" - echo "Please run \"pre-commit run --all-files\" locally to fix these." 
- echo "See also https://github.com/spcl/dace/blob/main/CONTRIBUTING.md" - echo "" - echo "git status" - echo "----------" - git status - echo "" - echo "git diff" - echo "--------" - git --no-pager diff - echo "" - - exit 1 -fi diff --git a/.github/workflows/verilator_compatibility.yml b/.github/workflows/verilator_compatibility.yml deleted file mode 100644 index dce0c9b1fb..0000000000 --- a/.github/workflows/verilator_compatibility.yml +++ /dev/null @@ -1,37 +0,0 @@ -name: DaCe Verilator Compatibility Check -on: - workflow_dispatch: - inputs: - reason: - description: 'Reason for the trigger' - required: true - default: 'Check compatibility' - schedule: - - cron: '0 0 1 * *' # monthly -jobs: - build: - strategy: - matrix: - verilator_version: ['4.028', '4.034', '4.036', '4.100', 'master'] - runs-on: ubuntu-20.04 - steps: - - name: trigger reason - run: echo "Trigger Reason:" ${{ github.event.inputs.reason }} - - uses: actions/checkout@v4 - - name: checkout submodules - run: git submodule update --init --recursive - - name: install apt packages - run: sudo apt-get update && sudo apt-get -y install git make autoconf g++ flex bison libfl2 libfl-dev - - name: compile verilator - run: git clone https://github.com/verilator/verilator.git && cd verilator && git fetch origin && if [ ! "${{ matrix.verilator_version }}" == "master" ]; then git checkout v${{ matrix.verilator_version }}; fi && autoconf && ./configure && make -j2 && sudo make install - - uses: actions/setup-python@v5 - with: - python-version: '3.8' - architecture: 'x64' - - uses: BSFishy/pip-action@v1 - with: - packages: pytest - requirements: requirements.txt - - name: install dace - run: python3 -m pip install . - - run: pytest -m "verilator" From 5608c111f1083523b2d16a6efcda65b66aab8c2e Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Fri, 21 Nov 2025 14:17:03 +0100 Subject: [PATCH 08/14] Now let's hammer my fork. --- .github/workflows/dace-updater.yml | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/.github/workflows/dace-updater.yml b/.github/workflows/dace-updater.yml index 266e99fe73..61c0a08963 100644 --- a/.github/workflows/dace-updater.yml +++ b/.github/workflows/dace-updater.yml @@ -1,4 +1,4 @@ -name: Inforrm the Python package index about a new DaCe release +name: Inform the Python package index about a new DaCe release # Must be installed into the DaCe fork. on: @@ -11,20 +11,19 @@ on: # We need this until this file is not in `main`, without it the web interface will not pick it up. # See https://stackoverflow.com/a/71057825 - pull_request: + #pull_request: jobs: update-dace: runs-on: ubuntu-latest steps: - - name: Print all variables + - name: Inform Index shell: bash run: | - INDEX_ORGANIZATION="gridtools" - INDEX_REPO="python-pkg-index" - - # Only needed for installation - exit 0 + #INDEX_ORGANIZATION="gridtools" + #INDEX_REPO="python-pkg-index" + INDEX_ORGANIZATION="philip-paul-mueller" + INDEX_REPO="test_package_index" # We use `github.ref_name` here because we only run for a tag and they should be unique. # If we would use `github.ref` then we would have `refs/tags/`. An alternative From 9a28927cc7c3c53aba976c2cab5c0abeb4de0c53 Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Fri, 21 Nov 2025 14:18:34 +0100 Subject: [PATCH 09/14] Is it really this thing. 
--- .github/workflows/dace-updater.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/dace-updater.yml b/.github/workflows/dace-updater.yml index 61c0a08963..a32502f3c2 100644 --- a/.github/workflows/dace-updater.yml +++ b/.github/workflows/dace-updater.yml @@ -11,7 +11,7 @@ on: # We need this until this file is not in `main`, without it the web interface will not pick it up. # See https://stackoverflow.com/a/71057825 - #pull_request: + pull_request: jobs: update-dace: From 0188320a64b95bb27d28b41f42686ec24c0ca6ba Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Fri, 21 Nov 2025 14:21:04 +0100 Subject: [PATCH 10/14] Why is it not possible to start that thing manually. --- .github/workflows/dace-updater.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/dace-updater.yml b/.github/workflows/dace-updater.yml index a32502f3c2..4b11e6ad0b 100644 --- a/.github/workflows/dace-updater.yml +++ b/.github/workflows/dace-updater.yml @@ -2,12 +2,12 @@ name: Inform the Python package index about a new DaCe release # Must be installed into the DaCe fork. on: + workflow_dispatch: #push: # Only run once a new tag has been created. # TODO: Make sure that the tag is passed to the index update workflow. #tags: #- __gt4py-next-integration_* - workflow_dispatch: # We need this until this file is not in `main`, without it the web interface will not pick it up. # See https://stackoverflow.com/a/71057825 @@ -38,4 +38,4 @@ jobs: -H "Authorization: Bearer ${{ secrets.PKG_UPDATE_TOKEN }}" \ -H "X-GitHub-Api-Version: 2022-11-28" \ "https://api.github.com/repos/${INDEX_ORGANIZATION}/${INDEX_REPO}/dispatches" \ - -d '{"event_type":"update_package_index","client_payload":{"source_repo":"'"${SOURCE_REPO}"'","source_org":"'"${SOURCE_ORG}"'","dependency_ref":"'"${DEPENDENCY_REF}"'"}}' + -d '{"event_type":"update_package_index","client_payload":{"source_repo":"'"${SOURCE_REPO}"'","source_org":"'"${SOURCE_OWNER}"'","dependency_ref":"'"${DEPENDENCY_REF}"'"}}' From 634cee8f9a3a323a7b5673f064d4834b550c5ecc Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Fri, 21 Nov 2025 14:36:36 +0100 Subject: [PATCH 11/14] Switched to sha --- .github/workflows/dace-updater.yml | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/.github/workflows/dace-updater.yml b/.github/workflows/dace-updater.yml index 4b11e6ad0b..6f937ed9ef 100644 --- a/.github/workflows/dace-updater.yml +++ b/.github/workflows/dace-updater.yml @@ -2,7 +2,6 @@ name: Inform the Python package index about a new DaCe release # Must be installed into the DaCe fork. on: - workflow_dispatch: #push: # Only run once a new tag has been created. # TODO: Make sure that the tag is passed to the index update workflow. @@ -13,6 +12,9 @@ on: # See https://stackoverflow.com/a/71057825 pull_request: + # For some reasons this does not work, so it can not be triggered manually. + workflow_dispatch: + jobs: update-dace: runs-on: ubuntu-latest @@ -25,10 +27,10 @@ jobs: INDEX_ORGANIZATION="philip-paul-mueller" INDEX_REPO="test_package_index" - # We use `github.ref_name` here because we only run for a tag and they should be unique. - # If we would use `github.ref` then we would have `refs/tags/`. An alternative - # would also be `github.sha`. - DEPENDENCY_REF="${{ github.ref_name }}" + # We are using `github.sha` here to be sure that we transmit an identifier to the index + # that can be checked out. Before we used `github.ref_name` but got strange results + # with it. 
+ DEPENDENCY_REF="${{ github.sha }}" SOURCE_REPO="dace" SOURCE_OWNER="gridtools" From 4cd598781e45cb870be5b388c854405307951c49 Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Mon, 1 Dec 2025 07:43:28 +0100 Subject: [PATCH 12/14] Empty commit to see if it triggers. From c8b2da2b520219478812cb059a1b10ddcee4c8e0 Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Mon, 1 Dec 2025 07:45:45 +0100 Subject: [PATCH 13/14] Disable. --- .github/workflows/dace-updater.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/dace-updater.yml b/.github/workflows/dace-updater.yml index 6f937ed9ef..ce3644725a 100644 --- a/.github/workflows/dace-updater.yml +++ b/.github/workflows/dace-updater.yml @@ -10,10 +10,10 @@ on: # We need this until this file is not in `main`, without it the web interface will not pick it up. # See https://stackoverflow.com/a/71057825 - pull_request: + #pull_request: # For some reasons this does not work, so it can not be triggered manually. - workflow_dispatch: + #workflow_dispatch: jobs: update-dace: From cb80bad2aa8c63163fc942ecc82a2d2d0228270d Mon Sep 17 00:00:00 2001 From: Philip Mueller Date: Mon, 1 Dec 2025 07:45:54 +0100 Subject: [PATCH 14/14] Deleted the workflow file. --- .github/workflows/dace-updater.yml | 43 ------------------------------ 1 file changed, 43 deletions(-) delete mode 100644 .github/workflows/dace-updater.yml diff --git a/.github/workflows/dace-updater.yml b/.github/workflows/dace-updater.yml deleted file mode 100644 index ce3644725a..0000000000 --- a/.github/workflows/dace-updater.yml +++ /dev/null @@ -1,43 +0,0 @@ -name: Inform the Python package index about a new DaCe release -# Must be installed into the DaCe fork. - -on: - #push: - # Only run once a new tag has been created. - # TODO: Make sure that the tag is passed to the index update workflow. - #tags: - #- __gt4py-next-integration_* - - # We need this until this file is not in `main`, without it the web interface will not pick it up. - # See https://stackoverflow.com/a/71057825 - #pull_request: - - # For some reasons this does not work, so it can not be triggered manually. - #workflow_dispatch: - -jobs: - update-dace: - runs-on: ubuntu-latest - steps: - - name: Inform Index - shell: bash - run: | - #INDEX_ORGANIZATION="gridtools" - #INDEX_REPO="python-pkg-index" - INDEX_ORGANIZATION="philip-paul-mueller" - INDEX_REPO="test_package_index" - - # We are using `github.sha` here to be sure that we transmit an identifier to the index - # that can be checked out. Before we used `github.ref_name` but got strange results - # with it. - DEPENDENCY_REF="${{ github.sha }}" - SOURCE_REPO="dace" - SOURCE_OWNER="gridtools" - - curl -L -v \ - -X POST \ - -H "Accept: application/vnd.github+json" \ - -H "Authorization: Bearer ${{ secrets.PKG_UPDATE_TOKEN }}" \ - -H "X-GitHub-Api-Version: 2022-11-28" \ - "https://api.github.com/repos/${INDEX_ORGANIZATION}/${INDEX_REPO}/dispatches" \ - -d '{"event_type":"update_package_index","client_payload":{"source_repo":"'"${SOURCE_REPO}"'","source_org":"'"${SOURCE_OWNER}"'","dependency_ref":"'"${DEPENDENCY_REF}"'"}}'
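
For reference, below is a minimal usage sketch of the advanced `CompiledSDFG` calling interface introduced in the `compiled_sdfg.py` changes above (`construct_arguments()`, `fast_call()`, `convert_return_values()`). It is an illustration only and not part of any patch in this series: the example program and the `.compile()` entry point are assumptions; only the three methods and their semantics are taken from the docstrings above.

    # Hypothetical sketch -- not part of the patch series.
    import dace
    import numpy as np

    N = dace.symbol('N')

    @dace.program
    def scale(a: dace.float64[N]):
        return a * 2.0

    csdfg = scale.compile()  # assumed to yield a CompiledSDFG
    a = np.random.rand(20)

    # Build the argument vectors; this also allocates the managed return arrays.
    callargs, initargs = csdfg.construct_arguments(a=a, N=20)

    # Call the compiled binary directly; fast_call() no longer returns values.
    csdfg.fast_call(callargs, initargs)

    # Fetch the results explicitly; scalars come back as arrays of shape (1,).
    res = csdfg.convert_return_values()
    assert np.allclose(res, 2.0 * a)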