Merged
65 changes: 0 additions & 65 deletions docs/blog/archive/ambassador-program.md

This file was deleted.

173 changes: 0 additions & 173 deletions docs/blog/archive/efa.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/blog/posts/changelog-07-25.md
@@ -144,7 +144,7 @@ resources:

#### AWS EFA

-EFA is a network interface for EC2 that enables low-latency, high-bandwidth communication between nodes—crucial for scaling distributed deep learning. With `dstack`, EFA is automatically enabled when using supported instance types in fleets. Check out our [example](../../examples/clusters/efa/index.md)
+EFA is a network interface for EC2 that enables low-latency, high-bandwidth communication between nodes—crucial for scaling distributed deep learning. With `dstack`, EFA is automatically enabled when using supported instance types in fleets. Check out our [example](../../examples/clusters/aws/index.md)

#### Default Docker images

2 changes: 1 addition & 1 deletion docs/docs/concepts/fleets.md
@@ -107,7 +107,7 @@ This ensures all instances are provisioned with optimal inter-node connectivity.
Note, EFA requires the `public_ips` to be set to `false` in the `aws` backend configuration.
Otherwise, instances are only connected by the default VPC subnet.

-Refer to the [EFA](../../examples/clusters/efa/index.md) example for more details.
+Refer to the [AWS](../../examples/clusters/aws/index.md) example for more details.

??? info "GCP"
When you create a fleet with GCP, `dstack` automatically configures [GPUDirect-TCPXO and GPUDirect-TCPX](https://cloud.google.com/kubernetes-engine/docs/how-to/gpu-bandwidth-gpudirect-tcpx-autopilot) networking for the A3 Mega and A3 High instance types, as well as RoCE networking for the A4 instance type.
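
For readers of this hunk, the `public_ips` requirement mentioned in the context lines can be illustrated with a minimal sketch of an `aws` backend entry in the `dstack` server configuration. The file location, the project name `main`, and the `creds` block are assumptions for illustration and are not part of this change; only the `aws` backend type and `public_ips: false` come from the text above.

```yaml
# Hypothetical server config (commonly ~/.dstack/server/config.yml); adjust to your setup.
projects:
  - name: main            # assumed project name
    backends:
      - type: aws
        creds:
          type: default   # assumed: use the default AWS credential chain
        # Per the note above, EFA requires public IPs to be disabled.
        public_ips: false
```

With public IPs disabled, instances are reachable only over private networking, so the `dstack` server presumably needs private connectivity to the VPC.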
2 changes: 1 addition & 1 deletion docs/docs/guides/clusters.md
@@ -22,7 +22,7 @@ For cloud fleets, fast interconnect is currently supported only on the `aws`, `g

!!! info "Backend configuration"
Note, EFA requires the `public_ips` to be set to `false` in the `aws` backend configuration.
-Refer to the [EFA](../../examples/clusters/efa/index.md) example for more details.
+Refer to the [AWS](../../examples/clusters/aws/index.md) example for more details.

=== "GCP"
When you create a cloud fleet with GCP, `dstack` automatically configures [GPUDirect-TCPXO and GPUDirect-TCPX](https://cloud.google.com/kubernetes-engine/docs/how-to/gpu-bandwidth-gpudirect-tcpx-autopilot) networking for the A3 Mega and A3 High instance types, as well as RoCE networking for the A4 instance type.
2 changes: 1 addition & 1 deletion docs/examples.md
@@ -113,7 +113,7 @@ hide:
<a href="/examples/clusters/efa"
class="feature-cell sky">
<h3>
-AWS EFA
+AWS
</h3>

<p>
@@ -1,4 +1,4 @@
-# AWS EFA
+# AWS

In this guide, we’ll walk through how to run high-performance distributed training on AWS using [Amazon Elastic Fabric Adapter (EFA)](https://aws.amazon.com/hpc/efa/) with `dstack`.

@@ -37,11 +37,11 @@ projects:

Once your backend is ready, define a fleet configuration.

-<div editor-title="examples/clusters/efa/fleet.dstack.yml">
+<div editor-title="examples/clusters/aws/efa-fleet.dstack.yml">

```yaml
type: fleet
-name: my-efa-fleet
+name: efa-fleet

nodes: 2
placement: cluster
@@ -57,14 +57,14 @@ Provision the fleet with `dstack apply`:
<div class="termy">

```shell
-$ dstack apply -f examples/clusters/efa/fleet.dstack.yml
+$ dstack apply -f examples/clusters/aws/efa-fleet.dstack.yml

Provisioning...
---> 100%

-FLEET INSTANCE BACKEND INSTANCE TYPE GPU PRICE STATUS CREATED
-my-efa-fleet 0 aws (us-west-2) p4d.24xlarge H100:8:80GB $98.32 idle 3 mins ago
-1 aws (us-west-2) p4d.24xlarge H100:8:80GB $98.32 idle 3 mins ago
+FLEET INSTANCE BACKEND INSTANCE TYPE GPU PRICE STATUS CREATED
+efa-fleet 0 aws (us-west-2) p4d.24xlarge H100:8:80GB $98.32 idle 3 mins ago
+1 aws (us-west-2) p4d.24xlarge H100:8:80GB $98.32 idle 3 mins ago
```

</div>
@@ -76,7 +76,7 @@ Provisioning...

```yaml
type: fleet
-name: my-efa-fleet
+name: efa-fleet

nodes: 2
placement: cluster
8 changes: 4 additions & 4 deletions mkdocs.yml
@@ -145,11 +145,10 @@ plugins:
'docs/examples/deployment/tgi/index.md': 'examples/inference/tgi/index.md'
'providers.md': 'partners.md'
'backends.md': 'partners.md'
-'blog/ambassador-program.md': 'blog/archive/ambassador-program.md'
'blog/monitoring-gpu-usage.md': 'blog/posts/dstack-metrics.md'
'blog/inactive-dev-environments-auto-shutdown.md': 'blog/posts/inactivity-duration.md'
'blog/data-centers-and-private-clouds.md': 'blog/posts/gpu-blocks-and-proxy-jump.md'
-'blog/distributed-training-with-aws-efa.md': 'examples/clusters/efa/index.md'
+'blog/distributed-training-with-aws-efa.md': 'examples/clusters/aws/index.md'
'blog/dstack-stats.md': 'blog/posts/dstack-metrics.md'
'docs/concepts/metrics.md': 'docs/guides/metrics.md'
'docs/guides/monitoring.md': 'docs/guides/metrics.md'
@@ -166,11 +165,12 @@ plugins:
'examples/deployment/trtllm/index.md': 'examples/inference/trtllm/index.md'
'examples/fine-tuning/trl/index.md': 'examples/single-node-training/trl/index.md'
'examples/fine-tuning/axolotl/index.md': 'examples/single-node-training/axolotl/index.md'
-'blog/efa.md': 'examples/clusters/efa/index.md'
+'blog/efa.md': 'examples/clusters/aws/index.md'
'docs/concepts/repos.md': 'docs/concepts/dev-environments.md#repos'
'examples/clusters/a3high/index.md': 'examples/clusters/gcp/index.md'
'examples/clusters/a3mega/index.md': 'examples/clusters/gcp/index.md'
'examples/clusters/a4/index.md': 'examples/clusters/gcp/index.md'
+'examples/clusters/efa/index.md': 'examples/clusters/aws/index.md'
- typeset
- gen-files:
scripts: # always relative to mkdocs.yml
@@ -326,7 +326,7 @@ nav:
- NCCL tests: examples/clusters/nccl-tests/index.md
- RCCL tests: examples/clusters/rccl-tests/index.md
- GCP: examples/clusters/gcp/index.md
-- AWS EFA: examples/clusters/efa/index.md
+- AWS: examples/clusters/aws/index.md
- Crusoe: examples/clusters/crusoe/index.md
- Inference:
- SGLang: examples/inference/sglang/index.md
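
The `mkdocs.yml` hunks above retarget every redirect that pointed at the old EFA page and add one from the old page path to the new one, so existing links such as `/examples/clusters/efa` should keep resolving. A minimal sketch of how such entries are typically nested, assuming they belong to the `redirect_maps` option of the `mkdocs-redirects` plugin (the plugin name and nesting are not visible in this diff and are an assumption):

```yaml
plugins:
  - redirects:          # assumed: mkdocs-redirects plugin
      redirect_maps:
        # old path (relative to docs_dir): new path
        'blog/efa.md': 'examples/clusters/aws/index.md'
        'examples/clusters/efa/index.md': 'examples/clusters/aws/index.md'
```

Pointing both old paths directly at the new page, rather than chaining `blog/efa.md` through the old EFA path, avoids a double redirect.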