Liquid VL for DPhi Space
This repo builds containerized Liquid vision-language (VL) models for DPhi Space.
> [!NOTE]
> This repo includes options to build both Q4_0 and Q8_0 quantized versions of the Liquid VL models for Jetson Orin devices. However, only the Q4_0 versions have been confirmed to work. The Q8_0 versions are built on the larger `l4t-ml` base image and were never tested.
```bash
bin/build-orin.sh
```

This command builds two images:

```
liquidai/lfm2-vl-3b-gguf:orin-q4-latest
liquidai/lfm2-vl-1p6b-gguf:orin-q4-latest
```
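To confirm both images exist after a build, you can list them (standard docker usage):

```bash
docker images | grep lfm2-vl
```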
> [!NOTE]
> The images must be built natively on a Jetson Orin device.
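If you are unsure whether the current machine is a Jetson device, one quick check is the L4T release file that JetPack installs (assuming a standard JetPack setup):

```bash
# Prints the L4T release string on Jetson devices; the file is absent elsewhere
cat /etc/nv_tegra_release
```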
```bash
# 3b
docker run --runtime nvidia --rm --network host \
  liquidai/lfm2-vl-3b-gguf:orin-q4-latest

# 1.6b
docker run --runtime nvidia --rm --network host \
  liquidai/lfm2-vl-1p6b-gguf:orin-q4-latest
```

The entrypoint script supports the following Docker environment variables for configuring the llama-cpp server:
| Docker env flag | Corresponding llama-cpp server param | Description | Default |
|---|---|---|---|
| `-e HOST` | `--host` | Server host IP | `0.0.0.0` |
| `-e PORT` | `--port` | Server port | `8080` |
| `-e N_GPU_LAYERS` | `-ngl` | Max number of layers to store in VRAM | `999` |
| `-e N_PARALLEL` | `-np` | Number of parallel requests | `1` |
| `-e CTX_SIZE` | `-c` | Context size in tokens | `4096` |
| `-e BATCH_SIZE` | `-b` | Logical maximum batch size | `512` |
| `-e UBATCH_SIZE` | `-ub` | Physical maximum batch size | `128` |
Reference: link
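For illustration, the mapping from environment variables to server flags could look roughly like this (a minimal sketch of a hypothetical entrypoint, not the actual script in this repo; the model and projector paths are assumptions):

```bash
#!/usr/bin/env bash
# Hypothetical entrypoint sketch: forwards the env vars from the table above
# to llama-server, falling back to the documented defaults.
# /models/model.gguf and /models/mmproj.gguf are assumed paths.
set -euo pipefail

exec llama-server \
  -m /models/model.gguf \
  --mmproj /models/mmproj.gguf \
  --host "${HOST:-0.0.0.0}" \
  --port "${PORT:-8080}" \
  -ngl "${N_GPU_LAYERS:-999}" \
  -np "${N_PARALLEL:-1}" \
  -c "${CTX_SIZE:-4096}" \
  -b "${BATCH_SIZE:-512}" \
  -ub "${UBATCH_SIZE:-128}"
```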
Example:
```bash
docker run --runtime nvidia --rm --network host \
  -e CTX_SIZE=2048 \
  -e BATCH_SIZE=64 \
  -e UBATCH_SIZE=32 \
  liquidai/lfm2-vl-3b-gguf:orin-q4-latest
```

Query the running server with the client script:

```bash
python vlm_infer.py \
  --server http://127.0.0.1:8080 \
  --image ./images/example.png \
  --prompt "Describe what you see in the image"
```

The images are optimized for:
- Jetson Orin 16GB
- JetPack 6.2.1 (L4T 36.4.4, CUDA 12.6)
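The llama-cpp server also exposes an OpenAI-compatible chat endpoint, so you can query it without the client script shown above; a sketch assuming the server from the example above and the same sample image (payload shape follows upstream llama.cpp's multimodal API):

```bash
# Encode the image as a base64 data URI (OpenAI-style multimodal message)
IMG_B64=$(base64 -w0 ./images/example.png)

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe what you see in the image"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$IMG_B64"'"}}
      ]
    }]
  }'
```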
| Model Size | Quantization | Model VRAM usage (GB) | Peak memory utilization (%) | Tokens per sec |
|---|---|---|---|---|
| 1.6B | Q4_0 | 2.2 | 16% | 700 |
| 3B | Q4_0 | 3.0 | 20% | 400 |
| 3B | Q8_0 | 4.2 | 30% | 340 |
| 1.6B | Q8_0 | 2.8 | 25% | 610 |
All numbers were measured on a GH200 with the following command:

```bash
nvidia-smi --query-gpu=timestamp,name,utilization.gpu,utilization.memory,memory.used,memory.total,temperature.gpu --format=csv -l 1
```

GPU utilization can reach 100% during inference.
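To pull the peak memory figure out of that CSV stream, one option is to capture a window of samples and scan them with awk (a sketch; field positions assume the exact query order above):

```bash
# Sample for 60 seconds, then report the peak of memory.used (field 5, e.g. "2254 MiB")
timeout 60 nvidia-smi \
  --query-gpu=timestamp,name,utilization.gpu,utilization.memory,memory.used,memory.total,temperature.gpu \
  --format=csv,noheader -l 1 > gpu.log
awk -F', ' '{ gsub(/ MiB/, "", $5); if ($5+0 > max) max = $5 } END { print "peak memory.used (MiB):", max }' gpu.log
```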
Development runs on GH200 on Lambda Labs, as we don't currently have Jetson Orin hardware for local testing.
To build all variants at once:
```bash
./build_all.sh
```

Or build individual variants:
```bash
# Orin builds
uv run build-orin-1p6b-q8
uv run build-orin-1p6b-q4
uv run build-orin-3b-q8
uv run build-orin-3b-q4

# GH200 builds (for development testing)
uv run build-gh200-1p6b-q8
uv run build-gh200-1p6b-q4
uv run build-gh200-3b-q8
uv run build-gh200-3b-q4
```

Launch the server:
```bash
# Run 3B model
bin/run-vl.sh liquidai/lfm2-vl-3b-gguf:gh200-q8-latest
bin/run-vl.sh liquidai/lfm2-vl-3b-gguf:gh200-q4-latest

# Run 1.6B model
bin/run-vl.sh liquidai/lfm2-vl-1p6b-gguf:gh200-q8-latest
bin/run-vl.sh liquidai/lfm2-vl-1p6b-gguf:gh200-q4-latest
```

> [!NOTE]
> Running the Orin images on GH200 results in the following error:
>
> ```
> /usr/local/bin/llama-server: error while loading shared libraries: libnvrm_gpu.so: cannot open shared object file: No such file or directory
> ```
>
> This is expected because `libnvrm_gpu.so` is only available on Jetson devices.
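To confirm whether the library is present on a given host, a quick check (standard ldconfig usage):

```bash
# Empty output means libnvrm_gpu.so is not in the linker cache, as on GH200
ldconfig -p | grep libnvrm_gpu
```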
Run inference
```bash
bin/test-vl.sh
```

Pushing the Orin images to Docker Hub is generally unnecessary, since they must be built natively on Jetson Orin devices.
Push Orin images:
```bash
docker push liquidai/lfm2-vl-1p6b-gguf:orin-q4-latest
docker push liquidai/lfm2-vl-1p6b-gguf:orin-q4-<commit-hash>
docker push liquidai/lfm2-vl-3b-gguf:orin-q4-latest
docker push liquidai/lfm2-vl-3b-gguf:orin-q4-<commit-hash>
```

Push GH200 images:
```bash
docker push liquidai/lfm2-vl-1p6b-gguf:gh200-q4-latest
docker push liquidai/lfm2-vl-1p6b-gguf:gh200-q4-<commit-hash>
docker push liquidai/lfm2-vl-3b-gguf:gh200-q4-latest
docker push liquidai/lfm2-vl-3b-gguf:gh200-q4-<commit-hash>
```

Known issues:

| Category | Description |
|---|---|
| Configuration | `l4t-pytorch:r36.4.0` + Q4, built on GH200 |
| Error | `double free or corruption (out)` |
| Root Cause | Architecture mismatch (GH200 armv9 → Orin armv8) |
| Resolution | Build locally on Jetson Orin with matching architecture |
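Both GH200 and Orin report aarch64, so the ISA-level mismatch is easiest to spot from the CPU core model (standard lscpu output; core-to-ISA mapping from public NVIDIA specs: Grace uses Armv9 Neoverse-V2 cores, Orin uses Armv8.2 Cortex-A78AE):

```bash
# Run on both the build machine and the target device and compare
lscpu | grep -E 'Architecture|Model name'
```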
| Category | Description |
|---|---|
| Configuration | `l4t-ml:r36.4.0` + Q8 |
| Error | `failed to unpack loaded image: failed to extract layer sha256:...` |
| Root Cause | Docker image exceeds EM system's btrfs overlay capacity |
| Key Finding | Works on devkit (overlay2), fails on EM (btrfs) |
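To see which storage driver a given device is using (relevant to the key finding above), standard docker info usage:

```bash
# Prints e.g. overlay2 on the devkit, btrfs on the EM system
docker info --format '{{.Driver}}'
```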
MIT + LFM Open License v1.0. See LICENSE for details.