This repo documents key Slurm concepts and how they map to their dstack equivalents. Once the mapping is finalized, this documentation will be integrated into the dstack docs.
Migration Guide: Slurm to dstack - A comprehensive migration guide covering architectural similarities, key differences, and practical examples demonstrating feature implementations in both systems.
For in-depth coverage of specific topics, see these additional detailed guides:
- 1 Control plane, state, and scaling
- 2 Queueing, prioritization, and scheduling mechanics
- 3 Resource model, generic resources (TRES/GRES), and enforcement
- 4 Job submission, allocation, and execution
- 5 Accounts, QOS, and accounting pipeline
- 6 Job arrays & dependencies
- 7 Cluster node management, health, and lifecycle
- 8 Filesystems and data access
- 9 Fault tolerance, checkpointing, and job recovery
- 10 GPU health monitoring and device constraints
- 11 Partition design, node grouping, and queue layout
- 12 Authentication and security
- 13 Reservations
- 14 Monitoring and observability
- 15 Kubernetes integration
We welcome feedback on the Slurm documentation and mappings. You can contribute in the following ways:
- Create an issue: Open a GitHub issue to report errors, suggest improvements, or ask questions about the documentation
- Submit a PR: Submit a pull request with corrections, additions, or improvements to the documentation
- Contact directly: Reach out to @peterschmidt85 on GitHub for direct feedback or questions
- guide.md - Main migration guide with practical examples
- concepts/ - Detailed documentation mapping Slurm concepts to their dstack equivalents