9 changes: 9 additions & 0 deletions leap/leap-bundle/changelog.md
@@ -4,6 +4,15 @@ sidebar_position: 4

# Changelog

+ ## `v0.9.0` - unreleased
+
+ **New features**
+
+ - GGUF is now the default inference engine for model bundling, generating `.gguf` files for llama.cpp inference.
+ - Add `--executorch` flag to use ExecuTorch bundling instead of GGUF. ExecuTorch inference is deprecated and may be removed in a future version.
+ - Add `--mmproj-quantization` option for GGUF bundling of vision-language and audio models.
+ - Support downloading multiple `.gguf` files for GGUF bundle requests.
+
## `v0.8.0` - 2025-12-16

**Improvements**
40 changes: 28 additions & 12 deletions leap/leap-bundle/cli-spec.mdx
@@ -8,7 +8,9 @@ sidebar_position: 2

The Model Bundling Service provides a command-line interface (CLI) with two main features:

- 1. **LEAP Bundle Requests**: Upload model directories, create bundle requests, monitor processing status, and download completed bundles for the LEAP (Liquid Edge AI Platform)
+ 1. **LEAP Bundle Requests**: Upload model directories, create bundle requests, monitor processing status, and download completed bundles for the LEAP (Liquid Edge AI Platform). Supports two inference engines:
+    - **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
+    - **ExecuTorch** (deprecated): Generates `.bundle` files for ExecuTorch inference. This option may be removed in a future version.
2. **Manifest Downloads**: Download pre-packaged GGUF models from JSON manifest URLs without authentication

## Requirements
@@ -209,7 +211,11 @@ leap-bundle create <input-path>
- `--sequential`: Upload files sequentially. This is the fallback option if parallel upload fails.
- If neither `--parallel` nor `--sequential` is specified, the CLI will attempt parallel upload first, and fall back to sequential if it fails.
- If both `--parallel` and `--sequential` are specified, `--parallel` takes precedence.
- - `--quantization <type>`: Specify the quantization type for the model bundle. Valid options: `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
+ - `--executorch` (deprecated): Use ExecuTorch bundling instead of GGUF. By default, the CLI uses GGUF bundling. This option may be removed in a future version.
+ - `--quantization <type>`: Specify the quantization type for the model bundle.
+   - For GGUF (default): `Q4_K_M` (default), `Q8_0`, `F16`, and [other llama.cpp quantization types](https://github.com/ggml-org/llama.cpp/blob/0a0bba05e8390ab7e4a54bb8c0ed0a25da64cf62/tools/quantize/quantize.cpp#L22-L58).
+   - For ExecuTorch (deprecated): `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
+ - `--mmproj-quantization <type>`: (GGUF only) Specify the mmproj quantization type for vision-language or audio models. Valid options: `q4`, `q8` (default), `f16`.
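
For scripted use, the options above can be assembled programmatically before shelling out to the CLI. A minimal sketch — the `build_create_argv` helper is hypothetical, not part of `leap-bundle`; it only emits flags documented above:

```python
def build_create_argv(input_path, executorch=False, quantization=None, mmproj_quantization=None):
    """Assemble the argument vector for a `leap-bundle create` invocation."""
    argv = ["leap-bundle", "create", input_path]
    if executorch:
        argv.append("--executorch")  # deprecated ExecuTorch engine
    if quantization is not None:
        argv += ["--quantization", quantization]
    if mmproj_quantization is not None:
        if executorch:
            raise ValueError("--mmproj-quantization is GGUF-only")
        argv += ["--mmproj-quantization", mmproj_quantization]
    return argv
```

For example, `build_create_argv("./my-model-directory", quantization="Q8_0")` yields an argv list that can be passed directly to `subprocess.run`.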

**Behavior**

@@ -245,8 +251,17 @@ leap-bundle create ./my-model-directory --json
# Example JSON output when request already exists
{"error": "A bundle request with the same input hash already exists: req_xyz789abc123", "status": "exists"}

- # Create bundle with specific quantization
- leap-bundle create ./my-model-directory --quantization 8da8w_output_8da8w
+ # Create GGUF bundle with specific quantization
+ leap-bundle create ./my-model-directory --quantization Q8_0
+
+ # Create ExecuTorch bundle
+ leap-bundle create ./my-model-directory --executorch
+
+ # Create ExecuTorch bundle with specific quantization
+ leap-bundle create ./my-model-directory --executorch --quantization 8da8w_output_8da8w
+
+ # Create GGUF bundle for VL model with mmproj quantization
+ leap-bundle create ./my-vl-model-directory --mmproj-quantization f16
```
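
When scripting `create` with `--json`, the duplicate-request error shown in the examples can be handled by recovering the prior request ID from the error message. A sketch, assuming the message format matches the example output above (the helper name is illustrative):

```python
import json
import re

def existing_request_id(json_line):
    """Return the prior request ID if the CLI reports a duplicate, else None."""
    payload = json.loads(json_line)
    if payload.get("status") != "exists":
        return None
    # The example error text embeds an ID of the form "req_<suffix>"
    match = re.search(r"(req_\w+)", payload.get("error", ""))
    return match.group(1) if match else None
```

Feeding it the example JSON line above returns `req_xyz789abc123`, which can then be passed to `leap-bundle download` once processing completes.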

**Validation**
@@ -588,7 +603,7 @@ This command supports two modes of operation:

#### Mode 1: Bundle Request Download

- Download the bundle file for a completed request.
+ Download the model files for a completed request.

```sh
leap-bundle download <request-id> [--output-path <path>]
@@ -600,26 +615,27 @@ leap-bundle download <request-id> [--output-path <path>]

**Options**

- - `--output-path <path>`: Directory to save the downloaded file (default: current directory)
+ - `--output-path <path>`: Directory to save the downloaded files (default: current directory)

**Behavior**

- - Requests a signed download URL from the LEAP platform
- - Downloads the bundle file using the signed URL
- - Saves the file with a default name or to the specified output path
+ - Requests signed download URLs from the LEAP platform
+ - Downloads the model files using the signed URLs
+ - Saves files with default names or to the specified output path
+ - GGUF requests may produce multiple `.gguf` files; ExecuTorch requests produce a single `.bundle` file
- **Requires authentication** via `leap-bundle login`
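
Because a GGUF request may yield several files while an ExecuTorch request yields one, downstream scripts should glob the output directory rather than assume a single path. A sketch, assuming the file extensions described above (the helper name is illustrative):

```python
from pathlib import Path

def downloaded_model_files(output_dir):
    """List downloaded model files: .gguf for GGUF requests, .bundle for ExecuTorch."""
    out = Path(output_dir)
    gguf = sorted(str(p) for p in out.glob("*.gguf"))
    # ExecuTorch (deprecated) requests produce a single .bundle file instead
    return gguf if gguf else sorted(str(p) for p in out.glob("*.bundle"))
```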

**Examples**

```sh
- # Download bundle request to current directory
+ # Download GGUF bundle request to current directory
leap-bundle download 18734

# Example output
ℹ Requesting download for bundle request 18734...
✓ Download URL obtained for request 18734
Downloading bundle output... ✓
- ✓ Download completed successfully! File saved to: input-8da4w_output_8da8w-seq_8196.bundle
+ ✓ Download completed successfully! File saved to: model-Q4_K_M.gguf

# Download to specific directory
leap-bundle download 18734 --output-path ./downloads/
@@ -628,7 +644,7 @@ leap-bundle download 18734 --output-path ./downloads/
ℹ Requesting download for bundle request 18734...
✓ Download URL obtained for request 18734
Downloading bundle output... ✓
- ✓ Download completed successfully! File saved to: downloads/input-8da4w_output_8da8w-seq_8196.bundle
+ ✓ Download completed successfully! File saved to: downloads/model-Q4_K_M.gguf
```

**Error Cases**
28 changes: 12 additions & 16 deletions leap/leap-bundle/quick-start.mdx
@@ -6,11 +6,10 @@ sidebar_position: 1

The Bundling Service helps users create and manage model bundles for the Liquid Edge AI Platform (LEAP). Currently, users interact with it through `leap-bundle`, a command-line interface (CLI).

- Here is a typical user workflow:
+ The CLI supports two inference engines for model bundling:

- - Download an open source base model.
- - Customize the base model with your own dataset e.g. by finetuning.
- - Create a model bundle using the `leap-bundle` CLI for LEAP SDK.
+ - **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
+ - **ExecuTorch** (deprecated): Generates `.bundle` files for ExecuTorch inference (use `--executorch` flag). This option may be removed in a future version.

The CLI also supports downloading GGUF models directly from JSON manifest files.

@@ -52,13 +51,7 @@ Manifest downloads don't require authentication with `leap-bundle login`. They w
the model architecture comes from a base model that is part of the LEAP model library.
:::

- If you have a custom-trained or fine-tuned model, you can create a model bundle for use with LEAP SDK.
-
- Here is a typical user workflow:
-
- - Download an open source base model.
- - Customize the base model with your own dataset e.g. by finetuning.
- - Create a model bundle using the `leap-bundle` CLI for LEAP SDK.
+ If you have a custom-trained or fine-tuned model, you can create a model bundle for use with LEAP SDK. By default, the CLI generates GGUF files for llama.cpp inference. Use the `--executorch` flag to generate ExecuTorch bundles instead.

### Authenticate

@@ -151,10 +144,10 @@ Example output:
ℹ Requesting download for bundle request 1...
✓ Download URL obtained for request 1
Downloading bundle output... ✓
- ✓ Download completed successfully! File saved to: input-8da4w_output_8da8w-seq_8196.bundle
+ ✓ Download completed successfully! File saved to: model-Q4_K_M.gguf
```

- The model bundle file will be saved in the current directory with a `.bundle` extension.
+ The model files will be saved in the current directory. GGUF bundling produces `.gguf` files, while ExecuTorch bundling produces `.bundle` files.

### Complete Example

@@ -166,17 +159,20 @@ pip install leap-bundle
leap-bundle login <api-key>
leap-bundle whoami

- # 2. Create a bundle request
+ # 2. Create a bundle request (GGUF by default)
leap-bundle create <model-directory>

+ # Or create an ExecuTorch bundle
+ leap-bundle create <model-directory> --executorch

# 3. Monitor the request (repeat until completed)
leap-bundle list

# 4. Download when ready
leap-bundle download <request-id>

- # 5. Your bundle file is now ready to use!
- ls -la <downloaded-bundle-file>
+ # 5. Your model files are now ready to use!
+ ls -la <downloaded-model-files>
```
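
Step 3 above is a manual poll; in automation it can be wrapped in a loop. A sketch with an injectable status check — how you obtain the status string (e.g. by parsing `leap-bundle list` output) is left as an assumption:

```python
import time

def wait_until_completed(get_status, request_id, poll_seconds=30, timeout_seconds=3600):
    """Poll get_status(request_id) until it reports 'completed' or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if get_status(request_id) == "completed":
            return True
        time.sleep(poll_seconds)  # avoid hammering the service between checks
    return False
```

Once this returns `True`, run `leap-bundle download <request-id>` as in step 4.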

### Managing Requests