From f4c864b3b6b3903a5542a838fee003309963d951 Mon Sep 17 00:00:00 2001
From: chaosisnotopen
Date: Sat, 13 Dec 2025 21:56:42 +0800
Subject: [PATCH 1/3] Ascend NPU support DeepXTrace

1.Ascend NPU support for DeepXTrace with MOE dispatch/combine metrics probing.
2.Link to the related Ascend MOE operations pull request and an external case study article.
---
 README.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 5ef2a8d..c554383 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,10 @@
 # DeepXTrace
 
-DeepXTrace is a lightweight system tool designed to efficiently and precisely locate slow ranks in DeepEP-based environments by enhancing the [DeepEP](https://github.com/deepseek-ai/DeepEP) communication library. It is composed of two core components: *DeepEP Metrics Probe* and *DeepXTrace Metrics Analysis*.
+DeepXTrace is a lightweight diagnostic tool designed to efficiently and precisely locate slow ranks in MoE-based distributed environments through instrumentation of communication libraries(e.g., [DeepEP for GPU](https://github.com/deepseek-ai/DeepEP),[MC2 for NPU](https://gitcode.com/cann/ops-transformer)). It is composed of two core components: *MoE COMM Metrics Probe* and *DeepXTrace Metrics Analysis*.
 
 DeepXTrace supports diagnosis of various slowdown scenarios, including:
 
-* *Comp-Slow*: Slowdown caused by the destination rank(e.g., GPU/CPU compute latency).
+* *Comp-Slow*: Slowdown caused by the destination rank(e.g., xPU compute latency).
 * *Mixed-Slow*: Slowdown caused by the source rank(e.g., uneven expert distribution or hotspot congestion).
 * *Comm-Slow*: Slowdown caused by the communication path between specific source and destination ranks(e.g., communication link issues).
 
@@ -21,13 +21,15 @@ The following figure shows the latency matrix for the Combine operator's token r
 
 ![combine](figures/combine.png)
 
-## DeepEP-Metrics-Probe
+## MoE-COMM-Metrics-Probe
 
-A low-overhead module for measuring critical diagnostic indicators during DeepEP communication. See also: [DeepEP Diagnose PR](https://github.com/deepseek-ai/DeepEP/pull/311).
+A low-overhead module for measuring critical diagnostic indicators during MoE communication. Supported Implementations:
+ - **DeepEP (GPU)**: Integrated metrics probe via [DeepEP Diagnose PR #311](https://github.com/deepseek-ai/DeepEP/pull/311)
+ - **MC2 (NPU)**: Native instrumentation through [MC2 Diagnose PR #288](https://gitcode.com/cann/ops-transformer/pull/288). See also [Ascend and DeepXTrace Blog](https://mp.weixin.qq.com/s/AaZ3pgM-brWw8-DMxS54Wg)
 
 ## DeepXTrace-Metrics-Analysis
 
-An analysis module that locates the slow rank issues by processing the collected metrics.
+A cross-platform analysis module that identifies slow-rank bottlenecks across GPU/NPU clusters through metric processing.
 
 ### Build
 ```shell
From 8b8ad5eb2b740bf38b15ea7673db5d924f3f3b2d Mon Sep 17 00:00:00 2001
From: sky
Date: Sat, 13 Dec 2025 22:15:45 +0800
Subject: [PATCH 2/3] Update README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c554383..af3a5b7 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # DeepXTrace
 
-DeepXTrace is a lightweight diagnostic tool designed to efficiently and precisely locate slow ranks in MoE-based distributed environments through instrumentation of communication libraries(e.g., [DeepEP for GPU](https://github.com/deepseek-ai/DeepEP),[MC2 for NPU](https://gitcode.com/cann/ops-transformer)). It is composed of two core components: *MoE COMM Metrics Probe* and *DeepXTrace Metrics Analysis*.
+DeepXTrace is a lightweight diagnostic tool designed to efficiently and precisely locate slow ranks in MoE-based distributed environments through instrumentation of communication libraries (e.g., [DeepEP for GPU](https://github.com/deepseek-ai/DeepEP), [MC2 for NPU](https://gitcode.com/cann/ops-transformer)). It is composed of two core components: *MoE COMM Metrics Probe* and *DeepXTrace Metrics Analysis*.
 
 DeepXTrace supports diagnosis of various slowdown scenarios, including:
 
From 861065bfe86a81ad997e4b797d87d7b81d75f863 Mon Sep 17 00:00:00 2001
From: sky
Date: Sat, 13 Dec 2025 22:16:13 +0800
Subject: [PATCH 3/3] Update README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index af3a5b7..8d9789c 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ DeepXTrace is a lightweight diagnostic tool designed to efficiently and precisel
 
 DeepXTrace supports diagnosis of various slowdown scenarios, including:
 
-* *Comp-Slow*: Slowdown caused by the destination rank(e.g., xPU compute latency).
+* *Comp-Slow*: Slowdown caused by the destination rank (e.g., xPU compute latency).
 * *Mixed-Slow*: Slowdown caused by the source rank(e.g., uneven expert distribution or hotspot congestion).
 * *Comm-Slow*: Slowdown caused by the communication path between specific source and destination ranks(e.g., communication link issues).