[Claude] Add /torch_bisect skill for bisecting PyTorch regressions #2438
fegin wants to merge 4 commits into gh/fegin/83/base
.claude/skills/torch_bisect/SKILL.md
Outdated
| directory that reproduces the issue. Tell the user: | ||
| > "This command must **exit 0 on a good PyTorch commit** and **exit non-zero | ||
| > on a bad commit**. I will use only the exit code to judge good vs bad — I | ||
| > will NOT analyze log output to determine the result." |
wondering if you're being too rigid here. i would probably trust claude to analyze log output for me. that could help if the issue is a loss degradation or something that requires a bit of judgement. but this is also OK as a starting point.
The answer is yes, I'm being rigid. The reason is that I once saw Claude try to fix a PyTorch installation issue when I asked it to run some tests. Claude "fixed" PyTorch and the tests ran, but it had used an incorrect fix, so the test signal I got was completely wrong.
For bisect, we have a clear signal: either pass or fail. I bet Claude can help parse the log, but I'm also afraid that Claude may choose the wrong direction and accidentally mark a commit as good or bad incorrectly.
My opinion is that the test should have a good signal, either pass or fail. Even for loss degradation, we have a command that we can ask Claude to use.
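As an illustration of turning a judgement call such as loss degradation into a pass/fail signal, here is a minimal sketch. It is not the command referred to above: the `loss: <number>` log format, the threshold, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical log format: lines that contain "loss: <number>".
# This is an assumption for the sketch, not TorchTitan's actual output format.
LOSS_PATTERN = re.compile(r"loss:\s*(nan|inf|[0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?)")


def final_loss(log_text):
    """Return the last loss value found in the log, or None if there is none."""
    matches = LOSS_PATTERN.findall(log_text)
    return float(matches[-1]) if matches else None


def exit_code_for(log_text, max_loss):
    """Map a training log to an exit code: 0 = good, non-zero = bad."""
    loss = final_loss(log_text)
    if loss is None:
        return 2  # no loss reported at all: treat as a failure
    # A NaN loss compares False against any threshold, so it is reported as bad.
    return 0 if loss <= max_loss else 1


def main(argv):
    """Hypothetical entry point: main(["train.log", "3.0"]) returns the exit code."""
    with open(argv[0]) as log_file:
        return exit_code_for(log_file.read(), float(argv[1]))
```

Wrapped in a script that calls `sys.exit(main(sys.argv[1:]))`, a check like this lets the bisect keep using only the exit code while the threshold is chosen once, up front, by the user.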
.claude/skills/torch_bisect/SKILL.md
Outdated
| First, record the TorchTitan directory by running `pwd`. Store this absolute | ||
| path — you will need it throughout the bisect to switch back. |
This is surprising, did you have cases where claude got lost? The latest claude code should be smart enough to spin up subtasks (separated context) to do the build process, and that should keep the main claude's context very available
No, I didn't ask Claude to do this. Claude added the absolute path requirement, but I think it is okay, so I didn't delete it. We can remove it, though. I don't think it will go wrong.
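For reference, the "record the directory and switch back" idea can be sketched as a small context manager. This is only an illustration of the pattern; the skill itself is a set of instructions to Claude, and the helper name `working_directory` is hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change into `path`, then always return to the starting directory."""
    origin = os.getcwd()  # the absolute path to switch back to
    os.chdir(path)
    try:
        yield origin
    finally:
        os.chdir(origin)  # restored even if the build or test step raises


# Example: step into a scratch directory (standing in for a PyTorch checkout) and come back.
start = os.getcwd()
with tempfile.TemporaryDirectory() as scratch:
    with working_directory(scratch):
        pass  # a build command would run here
assert os.getcwd() == start
```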
looks useful! but also tbh, looks more detailed than i would have thought it needs to be. I think if i just tell claude very vaguely- bisect pytorch for me using this torchtitan test command and this pytorch build command, it will probably do OK. does it need so much exact specification? (just curious, really)
.claude/skills/torch_bisect/SKILL.md
Outdated
| **If the command times out**: follow the user's chosen timeout policy from | ||
| Phase 1 — either automatically mark as bad, or ask the user what to do. | ||
|
|
||
| Do NOT analyze the test output to determine pass/fail. Use ONLY the exit code. |
i'm not sure this is desired, there might be transient failures causing the bisect to abort early. we could ask the subagent to match the error exactly
See my answer to @wconstab above. I have seen Claude incorrectly fix a PyTorch installation issue. My opinion is that we should have a clear test signal if we want to bisect. Otherwise, Claude may steer in an incorrect direction. I trust it to handle most cases correctly, but not always.
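The exit-code-only rule, together with the timeout policy quoted in the diff above, could look roughly like the following sketch. The function name `judge` and the policy values `"bad"` and `"ask"` are assumptions for illustration, not names taken from the skill file.

```python
import subprocess
import sys


def judge(command, timeout_s, timeout_policy="bad"):
    """Classify one bisect step as 'good', 'bad', or 'ask' from the exit code alone."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,  # output is deliberately ignored
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Follow the policy chosen up front: mark as bad, or hand the decision to the user.
        return "bad" if timeout_policy == "bad" else "ask"
    return "good" if result.returncode == 0 else "bad"


# A command that exits with status 1 is judged bad, whatever it printed.
print(judge([sys.executable, "-c", "raise SystemExit(1)"], timeout_s=30))  # prints: bad
```

Under this rule a transient failure is still recorded as bad, which is the trade-off raised in the review comment: the verdict is predictable, but the test command itself has to be reliable.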
| /test_*.py | ||
| /debug_*.py |
I also feel they are somewhat random and seemingly tied to this skill.
No, I copied from PyTorch, lol.
I can explain some of the files: when you ask Claude to try some features, it creates some test_*.py files to understand the behavior. I think the rules in PyTorch just ensure that these files are not accidentally committed. I do see the 4.6 model always deleting the files, but I sometimes ask Claude to preserve them so that I can read them as well.
I don't mind removing these.
I asked Claude to make the writing more concise, which reduces the file to fewer than 100 LoC. I'll update it later. But I think you are asking whether we need to give so many instructions. The short answer is that we don't: less detailed instructions will work most of the time too (I didn't test it, but I believe it would work). The reason I want to be rigid and give Claude more instructions is that I treat this skill as a tool rather than a skill that I'll use for development. So I want a deterministic result and a lower chance of false positives. I don't need it to be creative. We have clear test signals of failure or success, at least for now. For example, I added the output format because I found I might need 1-2 more conversation turns, as it sometimes shows me only the PR number. The "do not analyze the log" instructions are also by design. I currently don't see any benefit in letting Claude analyze the log, at least from my previous experience bisecting with Claude. This may change in the future. This is just my own opinion. We can change the instructions to fit most people's working styles.
Try it with /torch_bisect for bisecting a problem when a PyTorch daily build breaks TorchTitan.

**Summary**

Add a new Claude Code skill (**/torch_bisect**) that automates git bisect against the PyTorch repo to identify commits that introduce regressions breaking TorchTitan.

The skill guides the user through four phases: gathering inputs (PyTorch path, build/test commands, known-good commit), setting up the bisect session with connectivity checks, running the build-test-record loop, and producing a summary report with the offending commit and associated PR details.

Also updates .gitignore to selectively ignore Claude Code local files. The .gitignore follows `torch/.gitignore`.

ghstack-source-id: 142f773
Pull-Request: #2438
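The build-test-record loop is, in essence, a binary search over the commit range. The sketch below shows that idea on a plain list of commit identifiers. It does not call git, and `first_bad_commit` and `is_good` are hypothetical names; in the skill, the good/bad verdict comes from building PyTorch and running the user's test command.

```python
def first_bad_commit(commits, is_good):
    """Binary-search for the first bad commit.

    `commits` is ordered oldest to newest, with commits[0] known good and
    commits[-1] known bad. `is_good(commit)` stands in for build + test.
    Returns the first bad commit and a record of every commit tested.
    """
    good_idx, bad_idx = 0, len(commits) - 1
    record = []
    while bad_idx - good_idx > 1:
        mid = (good_idx + bad_idx) // 2
        verdict = is_good(commits[mid])
        record.append((commits[mid], "good" if verdict else "bad"))
        if verdict:
            good_idx = mid  # the regression was introduced after mid
        else:
            bad_idx = mid  # the regression is at mid or earlier
    return commits[bad_idx], record


# Eight commits, with a regression introduced at c5.
history = [f"c{i}" for i in range(8)]
culprit, record = first_bad_commit(history, lambda c: int(c[1:]) < 5)
print(culprit)  # prints: c5
```

The record of tested commits corresponds to what the summary report would list alongside the offending commit.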
Stack from ghstack (oldest at bottom):