diff --git a/docs/docs/guides/troubleshooting.md b/docs/docs/guides/troubleshooting.md
index 9ece2b4ffb..5d17b894d0 100644
--- a/docs/docs/guides/troubleshooting.md
+++ b/docs/docs/guides/troubleshooting.md
@@ -28,6 +28,20 @@ and [this](https://github.com/dstackai/dstack/issues/1551).
 
 ## Typical issues
 
+### No fleets { #no-fleets }
+[//]: # (NOTE: This section is referenced in the CLI. Do not change its URL.)
+
+If you run `dstack apply` and see `No fleets` status it can mean two things:
+
+=== "The project has no fleets"
+    In this case, ensure you've created one before submitting runs. This can be either a [backend fleet](../concepts/fleets.md#backend-fleets) (if you are using cloud or Kubernetes) or an [SSH fleet](../concepts/fleets.md#ssh-fleets) (if you're using on-prem clusters).
+
+    !!! info "Backend fleets"
+        Note that creating [backend fleet](../concepts/fleets.md#backend-fleets) doesn't necessarily require provisioning instances upfront. If you set `nodes` to a range, `dstack` will be able to provision instances as required. See [backend fleet](../concepts/fleets.md#backend-fleets) for examples.
+
+=== "No matching fleet found"
+    This means fleets exist but run requirements do not match the configuration of the fleet. Review your fleets, and ensure that both run and fleet configuration are correct.
+
 ### No offers { #no-offers }
 [//]: # (NOTE: This section is referenced in the CLI. Do not change its URL.)
 
@@ -37,11 +51,7 @@ Below are some of the reasons why this might happen.
 
 > Feel free to use `dstack offer` to view available offers.
 
-#### Cause 1: No fleets
-
-Make sure you've created a [fleet](../concepts/fleets.md) before submitting any runs.
-
-#### Cause 2: No backends
+#### Cause 1: No backends
 
 If you are not using [SSH fleets](../concepts/fleets.md#ssh-fleets), make sure you have configured at least one [backends](../concepts/backends.md).
 
@@ -49,7 +59,7 @@ If you have configured a backend but still cannot use it, check the output of `d
 
 > You can find a list of successfully configured backends on the [project settings page](../concepts/projects.md#backends) in the UI.
 
-#### Cause 3: Requirements mismatch
+#### Cause 2: Requirements mismatch
 
 When you apply a configuration, `dstack` tries to find instances that match the
 [`resources`](../reference/dstack.yml/task.md#resources),
@@ -66,7 +76,7 @@ Make sure your configuration doesn't set any conflicting requirements, such as
 `regions` that don't exist in the specified `backends`, or `instance_types` that
 don't match the specified `resources`.
 
-#### Cause 4: Too specific resources
+#### Cause 3: Too specific resources
 
 If you set a resource requirement to an exact value, `dstack` will only select instances that
 have exactly that amount of resources. For example, `cpu: 5` and `memory: 10GB` will only
@@ -76,14 +86,14 @@ Typically, you will want to set resource ranges to match more instances.
 For example, `cpu: 4..8` and `memory: 10GB..` will match instances with 4 to 8 CPUs
 and at least 10GB of memory.
 
-#### Cause 5: Default resources
+#### Cause 4: Default resources
 
 By default, `dstack` uses these resource requirements:
 `cpu: 2..`, `memory: 8GB..`, `disk: 100GB..`.
 If you want to use smaller instances, override the `cpu`, `memory`, or `disk`
 properties in your configuration.
 
-#### Cause 6: GPU requirements
+#### Cause 5: GPU requirements
 
 By default, `dstack` only selects instances with no GPUs or a single NVIDIA GPU.
 If you want to use non-NVIDIA GPUs or multi-GPU instances, set the `gpu` property
@@ -94,13 +104,13 @@ Examples: `gpu: amd` (one AMD GPU), `gpu: A10:4..8` (4 to 8 A10 GPUs),
 
 > If you don't specify the number of GPUs, `dstack` will only select single-GPU instances.
 
-#### Cause 7: Network volumes
+#### Cause 6: Network volumes
 
 If your run configuration uses [network volumes](../concepts/volumes.md#network-volumes),
 `dstack` will only select instances from the same backend and region as the volumes.
 For AWS, the availability zone of the volume and the instance should also match.
 
-#### Cause 8: Feature support
+#### Cause 7: Feature support
 
 Some `dstack` features are not supported by all backends. If your configuration uses
 one of these features, `dstack` will only select offers from the backends that support it.
@@ -116,7 +126,7 @@ one of these features, `dstack` will only select offers from the backends that s
 - [Reservations](../reference/dstack.yml/fleet.md#reservation)
   are only supported by the `aws` and `gcp` backends.
 
-#### Cause 9: dstack Sky balance
+#### Cause 8: dstack Sky balance
 
 If you are using
 [dstack Sky](https://sky.dstack.ai),
diff --git a/src/dstack/_internal/cli/utils/common.py b/src/dstack/_internal/cli/utils/common.py
index 1716cabd2b..c5b185a4b1 100644
--- a/src/dstack/_internal/cli/utils/common.py
+++ b/src/dstack/_internal/cli/utils/common.py
@@ -33,13 +33,13 @@
 NO_OFFERS_WARNING = (
     "[warning]"
     "No matching instance offers available. Possible reasons:"
-    " https://dstack.ai/docs/guides/troubleshooting/#no-offers"
+    " [link]https://dstack.ai/docs/guides/troubleshooting/#no-offers[/link]"
     "[/]\n"
 )
 NO_FLEETS_WARNING = (
-    "[warning]"
-    "The project has no fleets. Create one before submitting a run:"
-    " https://dstack.ai/docs/concepts/fleets"
+    "[error]"
+    "The project has no fleets. Create one before submitting a run.\n"
+    "See [link]https://dstack.ai/docs/guides/troubleshooting/#no-fleets[/link]"
     "[/]\n"
 )
 
diff --git a/src/dstack/_internal/server/background/tasks/process_submitted_jobs.py b/src/dstack/_internal/server/background/tasks/process_submitted_jobs.py
index 4ddd6a13d7..d1d86c41aa 100644
--- a/src/dstack/_internal/server/background/tasks/process_submitted_jobs.py
+++ b/src/dstack/_internal/server/background/tasks/process_submitted_jobs.py
@@ -351,8 +351,8 @@ async def _process_submitted_job(
                 )
             # Note: `_get_job_status_message` relies on the "No fleet found" substring to return "no fleets"
             job_model.termination_reason_message = (
-                "No fleet found. Create it before submitting a run: "
-                "https://dstack.ai/docs/concepts/fleets"
+                "No matching fleet found. Possible reasons: "
+                "https://dstack.ai/docs/guides/troubleshooting/#no-fleets"
             )
             switch_job_status(session, job_model, JobStatus.TERMINATING)
             job_model.last_processed_at = common_utils.get_current_datetime()
diff --git a/src/dstack/_internal/server/services/jobs/__init__.py b/src/dstack/_internal/server/services/jobs/__init__.py
index 68fea166c1..b86bd09643 100644
--- a/src/dstack/_internal/server/services/jobs/__init__.py
+++ b/src/dstack/_internal/server/services/jobs/__init__.py
@@ -806,7 +806,7 @@ def _get_job_status_message(job_model: JobModel) -> str:
     ):
         if (
             job_model.termination_reason_message
-            and "No fleet found" in job_model.termination_reason_message
+            and "No matching fleet found" in job_model.termination_reason_message
         ):
             return "no fleets"
         return "no offers"
diff --git a/src/tests/_internal/cli/utils/test_run.py b/src/tests/_internal/cli/utils/test_run.py
index 20f37a820b..3ed665d932 100644
--- a/src/tests/_internal/cli/utils/test_run.py
+++ b/src/tests/_internal/cli/utils/test_run.py
@@ -261,7 +261,7 @@ async def test_simple_run(self, session: AsyncSession):
                 JobStatus.FAILED,
                 JobTerminationReason.FAILED_TO_START_DUE_TO_NO_CAPACITY,
                 None,
-                "No fleet found. Create it before submitting a run: https://dstack.ai/docs/concepts/fleets",
+                "No matching fleet found. Possible reasons: https://dstack.ai/docs/guides/troubleshooting/#no-fleets",
                 "no fleets",
                 "indian_red1",
             ),
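
The substring check changed in `_get_job_status_message` ties the `no fleets` status label to the exact wording of `termination_reason_message`. The Python sketch below isolates that mapping so it can be exercised on its own. `JobStub`, `NO_MATCHING_FLEET_MARKER`, and `status_message_for_no_capacity` are hypothetical names introduced for illustration and are not part of `dstack`; the real function also checks the termination reason, which is omitted here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constant: the diff inlines this substring in two separate files.
NO_MATCHING_FLEET_MARKER = "No matching fleet found"


@dataclass
class JobStub:
    """Hypothetical stand-in for JobModel, holding only the field used here."""

    termination_reason_message: Optional[str] = None


def status_message_for_no_capacity(job: JobStub) -> str:
    """Map a no-capacity job to a status label based on its stored message."""
    message = job.termination_reason_message
    if message and NO_MATCHING_FLEET_MARKER in message:
        return "no fleets"
    return "no offers"


new_style = JobStub(
    "No matching fleet found. Possible reasons: "
    "https://dstack.ai/docs/guides/troubleshooting/#no-fleets"
)
old_style = JobStub(
    "No fleet found. Create it before submitting a run: "
    "https://dstack.ai/docs/concepts/fleets"
)
print(status_message_for_no_capacity(new_style))
print(status_message_for_no_capacity(old_style))
```

In this sketch, a job that still carries the previous wording (`No fleet found. ...`) does not contain the new marker and falls through to `no offers`, as does a job with no message at all.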