9 changes: 9 additions & 0 deletions leap/leap-bundle/changelog.md
@@ -4,6 +4,15 @@ sidebar_position: 4

# Changelog

+ ## `v0.9.0` - unreleased
+
+ **New features**
+
+ - GGUF is now the default inference engine for model bundling, generating `.gguf` files for llama.cpp inference.
+ - Add `--executorch` flag to use ExecuTorch bundling instead of GGUF. ExecuTorch inference is deprecated and may be removed in a future version.
+ - Add `--mmproj-quantization` option for GGUF bundling of vision-language and audio models.
+ - Support downloading multiple `.gguf` files for GGUF bundle requests.
+
## `v0.8.0` - 2025-12-16

**Improvements**
40 changes: 28 additions & 12 deletions leap/leap-bundle/cli-spec.mdx
@@ -8,7 +8,9 @@ sidebar_position: 2

The Model Bundling Service provides a command-line interface (CLI) with two main features:

- 1. **LEAP Bundle Requests**: Upload model directories, create bundle requests, monitor processing status, and download completed bundles for the LEAP (Liquid Edge AI Platform)
+ 1. **LEAP Bundle Requests**: Upload model directories, create bundle requests, monitor processing status, and download completed bundles for the LEAP (Liquid Edge AI Platform). Supports two inference engines:
+    - **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
+    - **ExecuTorch** (deprecated): Generates `.bundle` files for ExecuTorch inference. This option may be removed in a future version.
2. **Manifest Downloads**: Download pre-packaged GGUF models from JSON manifest URLs without authentication

## Requirements
@@ -209,7 +211,11 @@ leap-bundle create <input-path>
- `--sequential`: Upload files sequentially. This is the fallback option if parallel upload fails.
- If neither `--parallel` nor `--sequential` is specified, the CLI will attempt parallel upload first, and fall back to sequential if it fails.
- If both `--parallel` and `--sequential` are specified, `--parallel` takes precedence.
- - `--quantization <type>`: Specify the quantization type for the model bundle. Valid options: `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
+ - `--executorch` (deprecated): Use ExecuTorch bundling instead of GGUF. By default, the CLI uses GGUF bundling. This option may be removed in a future version.
+ - `--quantization <type>`: Specify the quantization type for the model bundle.
+   - For GGUF (default): `Q4_K_M` (default), `Q8_0`, `F16`, and [other llama.cpp quantization types](https://github.com/ggml-org/llama.cpp/blob/0a0bba05e8390ab7e4a54bb8c0ed0a25da64cf62/tools/quantize/quantize.cpp#L22-L58).
+   - For ExecuTorch (deprecated): `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
+ - `--mmproj-quantization <type>`: (GGUF only) Specify the mmproj quantization type for vision-language or audio models. Valid options: `q4`, `q8` (default), `f16`.
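
For scripted use, the options above can be assembled programmatically before shelling out to the CLI. A minimal sketch — the `build_create_argv` helper is hypothetical, not part of `leap-bundle`; it only emits flags documented above:

```python
def build_create_argv(input_path, executorch=False, quantization=None, mmproj_quantization=None):
    """Assemble the argument vector for a `leap-bundle create` invocation."""
    argv = ["leap-bundle", "create", input_path]
    if executorch:
        argv.append("--executorch")  # deprecated ExecuTorch engine
    if quantization is not None:
        argv += ["--quantization", quantization]
    if mmproj_quantization is not None:
        if executorch:
            raise ValueError("--mmproj-quantization is GGUF-only")
        argv += ["--mmproj-quantization", mmproj_quantization]
    return argv
```

For example, `build_create_argv("./my-model-directory", quantization="Q8_0")` yields an argv list that can be passed directly to `subprocess.run`.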

**Behavior**

@@ -245,8 +251,17 @@ leap-bundle create ./my-model-directory --json
# Example JSON output when request already exists
{"error": "A bundle request with the same input hash already exists: req_xyz789abc123", "status": "exists"}

- # Create bundle with specific quantization
- leap-bundle create ./my-model-directory --quantization 8da8w_output_8da8w
+ # Create GGUF bundle with specific quantization
+ leap-bundle create ./my-model-directory --quantization Q8_0
+
+ # Create ExecuTorch bundle
+ leap-bundle create ./my-model-directory --executorch
+
+ # Create ExecuTorch bundle with specific quantization
+ leap-bundle create ./my-model-directory --executorch --quantization 8da8w_output_8da8w
+
+ # Create GGUF bundle for VL model with mmproj quantization
+ leap-bundle create ./my-vl-model-directory --mmproj-quantization f16
```
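
When scripting `create` with `--json`, the duplicate-request error shown in the examples can be handled by recovering the prior request ID from the error message. A sketch, assuming the message format matches the example output above (the helper name is illustrative):

```python
import json
import re

def existing_request_id(json_line):
    """Return the prior request ID if the CLI reports a duplicate, else None."""
    payload = json.loads(json_line)
    if payload.get("status") != "exists":
        return None
    # The example error text embeds an ID of the form "req_<suffix>"
    match = re.search(r"(req_\w+)", payload.get("error", ""))
    return match.group(1) if match else None
```

Feeding it the example JSON line above returns `req_xyz789abc123`, which can then be passed to `leap-bundle download` once processing completes.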

**Validation**
@@ -588,7 +603,7 @@ This command supports two modes of operation:

#### Mode 1: Bundle Request Download

- Download the bundle file for a completed request.
+ Download the model files for a completed request.

```sh
leap-bundle download <request-id> [--output-path <path>]
@@ -600,26 +615,27 @@ leap-bundle download <request-id> [--output-path <path>]

**Options**

- - `--output-path <path>`: Directory to save the downloaded file (default: current directory)
+ - `--output-path <path>`: Directory to save the downloaded files (default: current directory)

**Behavior**

- - Requests a signed download URL from the LEAP platform
- - Downloads the bundle file using the signed URL
- - Saves the file with a default name or to the specified output path
+ - Requests signed download URLs from the LEAP platform
+ - Downloads the model files using the signed URLs
+ - Saves files with default names or to the specified output path
+ - GGUF requests may produce multiple `.gguf` files; ExecuTorch requests produce a single `.bundle` file
- **Requires authentication** via `leap-bundle login`
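
Because a GGUF request may yield several files while an ExecuTorch request yields one, downstream scripts should glob the output directory rather than assume a single path. A sketch, assuming the file extensions described above (the helper name is illustrative):

```python
from pathlib import Path

def downloaded_model_files(output_dir):
    """List downloaded model files: .gguf for GGUF requests, .bundle for ExecuTorch."""
    out = Path(output_dir)
    gguf = sorted(str(p) for p in out.glob("*.gguf"))
    # ExecuTorch (deprecated) requests produce a single .bundle file instead
    return gguf if gguf else sorted(str(p) for p in out.glob("*.bundle"))
```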

**Examples**

```sh
- # Download bundle request to current directory
+ # Download GGUF bundle request to current directory
leap-bundle download 18734

# Example output
ℹ Requesting download for bundle request 18734...
✓ Download URL obtained for request 18734
Downloading bundle output... ✓
- ✓ Download completed successfully! File saved to: input-8da4w_output_8da8w-seq_8196.bundle
+ ✓ Download completed successfully! File saved to: model-Q4_K_M.gguf

# Download to specific directory
leap-bundle download 18734 --output-path ./downloads/
@@ -628,7 +644,7 @@ leap-bundle download 18734 --output-path ./downloads/
ℹ Requesting download for bundle request 18734...
✓ Download URL obtained for request 18734
Downloading bundle output... ✓
- ✓ Download completed successfully! File saved to: downloads/input-8da4w_output_8da8w-seq_8196.bundle
+ ✓ Download completed successfully! File saved to: downloads/model-Q4_K_M.gguf
```

**Error Cases**
28 changes: 12 additions & 16 deletions leap/leap-bundle/quick-start.mdx
@@ -6,11 +6,10 @@ sidebar_position: 1

The Bundling Service helps users create and manage model bundles for the Liquid Edge AI Platform (LEAP). Currently, users interact with it through `leap-bundle`, a command-line interface (CLI).

- Here is a typical user workflow:
+ The CLI supports two inference engines for model bundling:

- - Download an open source base model.
- - Customize the base model with your own dataset e.g. by finetuning.
- - Create a model bundle using the `leap-bundle` CLI for LEAP SDK.
+ - **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
+ - **ExecuTorch** (deprecated): Generates `.bundle` files for ExecuTorch inference (use `--executorch` flag). This option may be removed in a future version.

The CLI also supports downloading GGUF models directly from JSON manifest files.

@@ -52,13 +51,7 @@ Manifest downloads don't require authentication with `leap-bundle login`. They w
the model architecture comes from a base model that is part of the LEAP model library.
:::

- If you have a custom-trained or fine-tuned model, you can create a model bundle for use with LEAP SDK.
-
- Here is a typical user workflow:
-
- - Download an open source base model.
- - Customize the base model with your own dataset e.g. by finetuning.
- - Create a model bundle using the `leap-bundle` CLI for LEAP SDK.
+ If you have a custom-trained or fine-tuned model, you can create a model bundle for use with LEAP SDK. By default, the CLI generates GGUF files for llama.cpp inference. Use the `--executorch` flag to generate ExecuTorch bundles instead.

### Authenticate

@@ -151,10 +144,10 @@ Example output:
ℹ Requesting download for bundle request 1...
✓ Download URL obtained for request 1
Downloading bundle output... ✓
- ✓ Download completed successfully! File saved to: input-8da4w_output_8da8w-seq_8196.bundle
+ ✓ Download completed successfully! File saved to: model-Q4_K_M.gguf
```

- The model bundle file will be saved in the current directory with a `.bundle` extension.
+ The model files will be saved in the current directory. GGUF bundling produces `.gguf` files, while ExecuTorch bundling produces `.bundle` files.

### Complete Example

@@ -166,17 +159,20 @@ pip install leap-bundle
leap-bundle login <api-key>
leap-bundle whoami

- # 2. Create a bundle request
+ # 2. Create a bundle request (GGUF by default)
leap-bundle create <model-directory>

+ # Or create an ExecuTorch bundle
+ leap-bundle create <model-directory> --executorch

# 3. Monitor the request (repeat until completed)
leap-bundle list

# 4. Download when ready
leap-bundle download <request-id>

- # 5. Your bundle file is now ready to use!
- ls -la <downloaded-bundle-file>
+ # 5. Your model files are now ready to use!
+ ls -la <downloaded-model-files>
```
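
Step 3 above is a manual poll; in automation it can be wrapped in a loop. A sketch with an injectable status check — how you obtain the status string (e.g. by parsing `leap-bundle list` output) is left as an assumption:

```python
import time

def wait_until_completed(get_status, request_id, poll_seconds=30, timeout_seconds=3600):
    """Poll get_status(request_id) until it reports 'completed' or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if get_status(request_id) == "completed":
            return True
        time.sleep(poll_seconds)  # avoid hammering the service between checks
    return False
```

Once this returns `True`, run `leap-bundle download <request-id>` as in step 4.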

### Managing Requests