2 changes: 1 addition & 1 deletion docs/en/advanced/pd-disaggregation.md
@@ -1,5 +1,5 @@
# PD Disaggregation

-Miles supports Prefill and Decode disaggregation (PD Disaggregation).
+miles supports Prefill and Decode disaggregation (PD Disaggregation).

You can set the number of servers used for Prefill by setting the `--prefill-num-servers` argument.
2 changes: 1 addition & 1 deletion docs/en/examples/glm4-9B.md
@@ -8,7 +8,7 @@ After pulling the `radixark/miles:latest` image, initialize the image environmen
cd /root/
git clone https://github.com/radixark/miles.git
cd miles/
-pip install -e .
+pip install -e . --no-deps
```

Download the model and data:
2 changes: 1 addition & 1 deletion docs/en/examples/qwen3-30B-A3B.md
@@ -9,7 +9,7 @@ To convert huggingface checkpoint to torch_dist, please try:

```bash
cd miles/
-pip install -e .
+pip install -e . --no-deps
source scripts/models/qwen3-30B-A3B.sh
PYTHONPATH=/root/Megatron-LM/ torchrun --nproc-per-node 8 \
tools/convert_hf_to_torch_dist.py \
2 changes: 1 addition & 1 deletion docs/en/examples/qwen3-4B.md
@@ -8,7 +8,7 @@ After pulling the `radixark/miles:latest` image, initialize the image environmen
cd /root/
git clone https://github.com/radixark/miles.git
cd miles/
-pip install -e .
+pip install -e . --no-deps
```

Download the model and data:
2 changes: 1 addition & 1 deletion docs/en/get_started/quick_start.md
@@ -45,7 +45,7 @@ miles is already installed in the docker image. To update to the latest verison,
# Path can be adjusted according to actual situation
cd /root/miles
git pull
-pip install -e .
+pip install -e . --no-deps
```

## Model and Dataset Download
4 changes: 2 additions & 2 deletions docs/en/get_started/usage.md
@@ -187,7 +187,7 @@ Additionally, we provide a `metadata_key`, which defaults to `"metadata"`. When
- `reinforce_plus_plus` and `reinforce_plus_plus_baseline` ([https://arxiv.org/abs/2501.03262](https://arxiv.org/abs/2501.03262))
- `ppo` ([https://arxiv.org/abs/1707.06347](https://arxiv.org/abs/1707.06347))
- `on_policy_distillation`
-- `--calculate-per-token-loss`: By default, Miles calculates loss on a per-sample basis, i.e., `mean(sum(sample_i) / len(sample_i))`. Enable this flag to calculate loss on a per-token basis, i.e., `sum(sum(sample_i)) / sum(len(sample_i))`.
+- `--calculate-per-token-loss`: By default, miles calculates loss on a per-sample basis, i.e., `mean(sum(sample_i) / len(sample_i))`. Enable this flag to calculate loss on a per-token basis, i.e., `sum(sum(sample_i)) / sum(len(sample_i))`.
- `--use-tis`: Enable this setting to use TIS (Truncated Importance Sampling) (https://fengyao.notion.site/off-policy-rl).
- `--true-on-policy-mode`: Enable True On-Policy mode, which strictly ensures that data is generated by the current policy during training.
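
To make the difference between the two reductions in `--calculate-per-token-loss` concrete, here is a minimal sketch of both formulas over a toy batch; this is illustrative code, not miles internals, and the tensor names are assumptions:

```python
import torch

# Per-token losses for two variable-length samples (illustrative values).
sample_losses = [
    torch.tensor([0.5, 0.7, 0.9]),        # sample 0: 3 tokens
    torch.tensor([0.2, 0.4, 0.6, 0.8]),   # sample 1: 4 tokens
]

# Default (per-sample): mean(sum(sample_i) / len(sample_i))
per_sample = torch.stack([s.sum() / s.numel() for s in sample_losses]).mean()

# With --calculate-per-token-loss: sum(sum(sample_i)) / sum(len(sample_i))
per_token = sum(s.sum() for s in sample_losses) / sum(s.numel() for s in sample_losses)

print(per_sample.item(), per_token.item())  # 0.6 vs. ~0.586: longer samples weigh more per-token
```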

@@ -374,7 +374,7 @@ hf download --repo-type dataset zhuzilin/aime-2024 \
# Clone code and install dependencies
git clone https://github.com/radixark/miles.git
cd miles
-pip install -e .
+pip install -e . --no-deps


# FSDP does not require weight conversion, natively supports huggingface format
4 changes: 2 additions & 2 deletions docs/en/platform_support/amd_tutorial.md
@@ -54,7 +54,7 @@ Then, download and install miles:
```bash
git clone https://github.com/radixark/miles.git
cd miles
-pip install -e .
+pip install -e . --no-deps
```

Download the model and data:
@@ -93,7 +93,7 @@ PYTHONPATH=${MEGATRON_LM_PATH} python tools/convert_hf_to_torch_dist.py \

Note: We implemented a dedicated AMD conversion script that forces a CPU-only conversion workflow using the Gloo backend to bypass hardware-specific issues. A GPU-based script for ROCm is currently in development.
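
The essence of that CPU-only workflow is initializing `torch.distributed` with the Gloo backend and keeping all checkpoint tensors on CPU. A minimal sketch of the idea (not the actual miles script; the setup values here are placeholders):

```python
import os
import torch
import torch.distributed as dist

def init_cpu_only_gloo() -> None:
    """Start a single-process Gloo group so no ROCm/CUDA kernels are involved."""
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # Gloo runs its collectives on CPU, sidestepping GPU-specific conversion issues.
    dist.init_process_group(backend="gloo", rank=0, world_size=1)

if __name__ == "__main__":
    init_cpu_only_gloo()
    # Checkpoint tensors would be loaded with map_location="cpu" and re-saved here.
    x = torch.ones(4)
    dist.all_reduce(x)  # works on CPU tensors under the Gloo backend
    print(x)
    dist.destroy_process_group()
```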

-⚠️ If you encounter an issue where miles cannot be found, please run `pip install -e .` in the miles directory.
+⚠️ If you encounter an issue where miles cannot be found, please run `pip install -e . --no-deps` in the miles directory.


### Example: Qwen3-4B
2 changes: 1 addition & 1 deletion examples/README.md
@@ -1,6 +1,6 @@
# Examples

-These examples provide concrete examples to leverage Miles in your own RL workflow. Some examples are just demonstrative, but most of them are verifiable with a concrete performance score.
+These examples provide concrete examples to leverage miles in your own RL workflow. Some examples are just demonstrative, but most of them are verifiable with a concrete performance score.

## Directory Structure

24 changes: 12 additions & 12 deletions examples/eval/README.md
@@ -4,10 +4,10 @@ This directory contains configuration and utilities for offloading complex evalu

## Overview

-The setup allows Miles to delegate evaluation tasks to a dedicated "Skills" server. This creates a clear separation of concerns:
+The setup allows miles to delegate evaluation tasks to a dedicated "Skills" server. This creates a clear separation of concerns:

-1. **Miles Container**: Runs the main training loop and hosts the model using SGLang.
-2. **Skills Container**: Hosts the `nemo_skills` environment, runs the evaluation logic, and queries the model running in the Miles container.
+1. **miles Container**: Runs the main training loop and hosts the model using SGLang.
+2. **Skills Container**: Hosts the `nemo_skills` environment, runs the evaluation logic, and queries the model running in the miles container.

## Prerequisites

@@ -18,15 +18,15 @@ The setup allows Miles to delegate evaluation tasks to a dedicated "Skills" serv

### Prepare Host Network

-Create a Docker network to allow communication between the Miles and Skills containers.
+Create a Docker network to allow communication between the miles and Skills containers.

```bash
docker network create skills-net
```

-### Launch the Miles Container
+### Launch the miles Container

-Start the main container where Miles and the model will run. Replace `<miles container name>` with your desired name (e.g., `miles_main`).
+Start the main container where miles and the model will run. Replace `<miles container name>` with your desired name (e.g., `miles_main`).

```bash
docker run \
@@ -76,7 +76,7 @@ git clone -b miles https://github.com/guapisolo/Skills.git /opt/Skills

# Install Skills package
cd /opt/Skills
-pip install -e .
+pip install -e . --no-deps
```

**b) Prepare Datasets**
@@ -92,7 +92,7 @@ python3 arena-hard/prepare.py

**c) Start the Evaluation Server**

-Start the server that listens for evaluation requests from Miles.
+Start the server that listens for evaluation requests from miles.

```bash
cd /opt/miles
@@ -105,20 +105,20 @@ python examples/eval/nemo_skills/skills_server.py \
--max-concurrent-requests 512 \
--openai-model-name miles-openai-model
```
-*Note: You can now connect to the server at `skills_server:9050` from within the `skills-net` Docker network. The server always proxies evaluation traffic to an OpenAI-compatible sglang router (Miles starts and manage the router), so adjust `--openai-model-name` and `--max-concurrent-requests` as needed for your deployment.
+*Note: You can now connect to the server at `skills_server:9050` from within the `skills-net` Docker network. The server always proxies evaluation traffic to an OpenAI-compatible sglang router (miles starts and manage the router), so adjust `--openai-model-name` and `--max-concurrent-requests` as needed for your deployment.

## Running Evaluation

The example scripts are located in `examples/eval/scripts`. Here is an example workflow for training Qwen3-4B with delegated evaluation.

-### Prepare Miles Container
+### Prepare miles Container

-Enter the **Miles container** and install the package.
+Enter the **miles container** and install the package.

```bash
cd /root/miles
git pull
-pip install -e .
+pip install -e . --no-deps
```

### Download Model and Data
2 changes: 1 addition & 1 deletion examples/low_precision/README.md
@@ -89,7 +89,7 @@ This guide provides examples for INT4 STE (Straight-Through Estimator) training
First, download the PTQ (Post-Training Quantization) calibration dataset from HuggingFace:
[https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-2-raw-v1](https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-2-raw-v1)

-Next, use the `tools/convert_hf_to_hf_int4.py` script to convert BF16 weights to INT4 format. Ensure that the `--hf-checkpoint` parameter points to a directory where `config.json` contains the correct `quantization_config`. Miles will automatically utilize INT4 quantization during weight updates.
+Next, use the `tools/convert_hf_to_hf_int4.py` script to convert BF16 weights to INT4 format. Ensure that the `--hf-checkpoint` parameter points to a directory where `config.json` contains the correct `quantization_config`. miles will automatically utilize INT4 quantization during weight updates.

```bash
python tools/convert_hf_to_hf_int4.py \
4 changes: 2 additions & 2 deletions examples/on_policy_distillation/README.md
@@ -1,6 +1,6 @@
# On-Policy Distillation Example

-This example shows how to run **on-policy distillation** using Miles. A small student (Qwen3-8B) is aligned to imitate a larger teacher (Qwen3-32B) by training only on the student's own rollouts and matching the teacher's token-level log-probabilities.
+This example shows how to run **on-policy distillation** using miles. A small student (Qwen3-8B) is aligned to imitate a larger teacher (Qwen3-32B) by training only on the student's own rollouts and matching the teacher's token-level log-probabilities.

In this example, the teacher model acts as a reward model (RM) by providing teacher log probabilities as the supervision signal.

@@ -50,7 +50,7 @@ Using Qwen3-8B-Base model sfted on part of the [OpenThoughts3-1.2M](https://hugg

# FAQ
1. **Why are teacher logits computed via a sglang server instead of inside the training backend?**
-The teacher runs on an independent SGLang server that Miles treats as a reward model. Hosting it inside Megatron/FSDP would require maintaining a second, fully configured training stack for the teacher.
+The teacher runs on an independent SGLang server that miles treats as a reward model. Hosting it inside Megatron/FSDP would require maintaining a second, fully configured training stack for the teacher.


# References
2 changes: 1 addition & 1 deletion examples/retool/README.md
@@ -21,7 +21,7 @@ The retool example provides:
1. Setup and download datasets:
```bash
cd miles
-pip install -e .
+pip install -e . --no-deps
# For SFT part, you can use later model to RL directly and skip SFT.
hf download --repo-type dataset JoeYing/ReTool-SFT --local-dir /root/JoeYing/ReTool-SFT
hf download Qwen/Qwen3-4B-Instruct-2507 --local-dir /root/Qwen/Qwen3-4B-Instruct-2507
2 changes: 1 addition & 1 deletion examples/search-r1/README.md
@@ -9,7 +9,7 @@ Use the `radixark/miles:latest` image and initialize the environment required fo
```bash
cd /root/
git clone https://github.com/radixark/miles.git
-pip install -e .
+pip install -e . --no-deps
# for Search R1
pip install chardet
```
4 changes: 2 additions & 2 deletions examples/strands_sglang/README.md
@@ -1,4 +1,4 @@
-# Miles x Strands-SGLang
+# miles x Strands-SGLang

This example connects `miles` with [`strands-sglang`](https://github.com/horizon-rl/strands-sglang) (SGLang extension for the agentic scaffolding [`strands`](https://github.com/strands-agents/sdk-python)) for agentic RL training.

@@ -20,7 +20,7 @@ This example connects `miles` with [`strands-sglang`](https://github.com/horizon

1. Pull the `radixark/miles:latest` image and enter it
2. Go to miles folder: `cd /root/miles`
-3. Install Miles: `pip install -e .`
+3. Install miles: `pip install -e . --no-deps`
4. Go to the example folder: `cd /root/miles/examples/strands_sglang`
5. Install other dependencies: `pip install -r requirements.txt`

4 changes: 2 additions & 2 deletions examples/tau-bench/README.md
@@ -9,13 +9,13 @@ Use the `zhuzilin/miles:latest` image and initialize the environment required fo
cd /root/
git clone https://github.com/radixark/miles.git
cd miles
-pip install -e .
+pip install -e . --no-deps
# for tau bench
cd /root/
git clone https://github.com/JD-ETH/tau-bench.git
cd tau-bench
git checkout feature/litellm-retry
-pip install -e .
+pip install -e . --no-deps
```

Use the following script to generate mock data for miles training.