Liquid VL for DPhi Space

This repo builds containerized Liquid vision-language (VL) models for DPhi Space.

Note

This repo includes options to build both Q4_0 and Q8_0 quantized versions of the Liquid VL models for Jetson Orin devices. However, only the Q4_0 versions have been confirmed to work; the Q8_0 versions use the larger l4t-ml base image and have not been tested.

For DPhi Space

Build images

bin/build-orin.sh

This command builds two images:

  • liquidai/lfm2-vl-3b-gguf:orin-q4-latest
  • liquidai/lfm2-vl-1p6b-gguf:orin-q4-latest
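
Once the build finishes, you can check that both tags exist locally, for example:

# Optional sanity check: list the locally built images
docker images "liquidai/lfm2-vl-*"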

Note

The images need to be built natively on a Jetson Orin device.

Launch the server

# 3b
docker run --runtime nvidia --rm --network host \
  liquidai/lfm2-vl-3b-gguf:orin-q4-latest

# 1.6b
docker run --runtime nvidia --rm --network host \
  liquidai/lfm2-vl-1p6b-gguf:orin-q4-latest
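
Once the container is up, a quick way to confirm the server is responding (assuming the default port and llama.cpp's standard /health endpoint):

# Returns a small OK/JSON response once the model has finished loading
curl http://127.0.0.1:8080/health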

The entrypoint script supports the following Docker environment variables for configuring the llama-cpp server:

| Docker env flag | Corresponding llama-cpp server param | Description | Default |
|---|---|---|---|
| -e HOST | --host | Server host IP | 0.0.0.0 |
| -e PORT | --port | Server port | 8080 |
| -e N_GPU_LAYERS | -ngl | Max number of layers to store in VRAM | 999 |
| -e N_PARALLEL | -np | Number of parallel requests | 1 |
| -e CTX_SIZE | -c | Context size in tokens | 4096 |
| -e BATCH_SIZE | -b | Logical maximum batch size | 512 |
| -e UBATCH_SIZE | -ub | Physical maximum batch size | 128 |

Reference: link

Example:

docker run --runtime nvidia --rm --network host \
  -e CTX_SIZE=2048 \
  -e BATCH_SIZE=64 \
  -e UBATCH_SIZE=32 \
  liquidai/lfm2-vl-3b-gguf:orin-q4-latest
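
For reference, the entrypoint translates these environment variables into llama-server arguments along these lines. This is only a sketch of the mapping based on the table above; the model and projector paths are placeholders, and the actual entrypoint script in this repo may differ:

#!/usr/bin/env bash
# Sketch of the env-var-to-flag mapping; /models/*.gguf paths are placeholders, not the real image layout.
exec llama-server \
  --host "${HOST:-0.0.0.0}" \
  --port "${PORT:-8080}" \
  -ngl "${N_GPU_LAYERS:-999}" \
  -np "${N_PARALLEL:-1}" \
  -c "${CTX_SIZE:-4096}" \
  -b "${BATCH_SIZE:-512}" \
  -ub "${UBATCH_SIZE:-128}" \
  -m /models/model.gguf \
  --mmproj /models/mmproj.gguf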

Run inference

python vlm_infer.py \
  --server http://127.0.0.1:8080 \
  --image ./images/example.png \
  --prompt "Describe what you see in the image"
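
vlm_infer.py talks to the server over HTTP; you can also query it directly with curl, assuming the server exposes llama.cpp's OpenAI-compatible /v1/chat/completions endpoint and the image fits inline as a base64 data URL (a minimal sketch):

# Minimal sketch: send the image inline as a base64 data URL
IMG_B64=$(base64 -w0 ./images/example.png)
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe what you see in the image"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,${IMG_B64}"}}
      ]
    }
  ]
}
EOF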

Environment Details

These images are optimized for:

  • Jetson Orin 16GB
  • JetPack 6.2.1 (L4T 36.4.4, CUDA 12.6)

Performance and Resource Consumption

| Model size | Quantization | Model VRAM usage (GB) | Peak memory utilization (%) | Tokens per sec |
|---|---|---|---|---|
| 1.6B | Q4_0 | 2.2 | 16% | 700 |
| 3B | Q4_0 | 3.0 | 20% | 400 |
| 3B | Q8_0 | 4.2 | 30% | 340 |
| 1.6B | Q8_0 | 2.8 | 25% | 610 |

All numbers are measured on GH200 with the following command:

nvidia-smi --query-gpu=timestamp,name,utilization.gpu,utilization.memory,memory.used,memory.total,temperature.gpu --format=csv -l 1
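
To pull the peak memory usage out of the resulting log, something along these lines works (a sketch, assuming the CSV output above was redirected to gpu.csv):

# memory.used is the 5th CSV field, e.g. "2201 MiB"; print the maximum value seen (assumes log saved to gpu.csv)
awk -F', ' 'NR > 1 { gsub(/ MiB/, "", $5); if ($5 + 0 > max) max = $5 + 0 } END { print max " MiB" }' gpu.csv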

GPU utilization can reach 100% during inference.

For Development


Development happens on a GH200 instance on Lambda Labs, as we don't currently have Jetson Orin hardware for local testing.

Build All Images

To build all variants at once:

./build_all.sh

Or build individual variants:

# Orin builds
uv run build-orin-1p6b-q8
uv run build-orin-1p6b-q4
uv run build-orin-3b-q8
uv run build-orin-3b-q4

# GH200 builds (for development testing)
uv run build-gh200-1p6b-q8
uv run build-gh200-1p6b-q4
uv run build-gh200-3b-q8
uv run build-gh200-3b-q4

Test on GH200

Launch the server

# Run 3B model
bin/run-vl.sh liquidai/lfm2-vl-3b-gguf:gh200-q8-latest
bin/run-vl.sh liquidai/lfm2-vl-3b-gguf:gh200-q4-latest

# Run 1.6B model
bin/run-vl.sh liquidai/lfm2-vl-1p6b-gguf:gh200-q8-latest
bin/run-vl.sh liquidai/lfm2-vl-1p6b-gguf:gh200-q4-latest

Note

Running the Orin images on GH200 will result in the following error:

/usr/local/bin/llama-server: error while loading shared libraries: libnvrm_gpu.so: cannot open shared object file: No such file or directory

This is expected because libnvrm_gpu.so is only available on Jetson devices.

Run inference

bin/test-vl.sh

Publish to Docker Hub

Pushing the Orin images to Docker Hub is normally unnecessary, since they must be built natively on a Jetson Orin device.

Push Orin images:

docker push liquidai/lfm2-vl-1p6b-gguf:orin-q4-latest
docker push liquidai/lfm2-vl-1p6b-gguf:orin-q4-<commit-hash>
docker push liquidai/lfm2-vl-3b-gguf:orin-q4-latest
docker push liquidai/lfm2-vl-3b-gguf:orin-q4-<commit-hash>

Push GH200 images:

docker push liquidai/lfm2-vl-1p6b-gguf:gh200-q4-latest
docker push liquidai/lfm2-vl-1p6b-gguf:gh200-q4-<commit-hash>
docker push liquidai/lfm2-vl-3b-gguf:gh200-q4-latest
docker push liquidai/lfm2-vl-3b-gguf:gh200-q4-<commit-hash>

Known issues and resolutions

Cross-compilation issue

| Category | Description |
|---|---|
| Configuration | l4t-pytorch:r36.4.0 + Q4, built on GH200 |
| Error | double free or corruption (out) |
| Root Cause | Architecture mismatch (GH200 armv9 → Orin armv8) |
| Resolution | Build locally on Jetson Orin with matching architecture |

Image size overflow with l4t-ml

| Category | Description |
|---|---|
| Configuration | l4t-ml:r36.4.0 + Q8 |
| Error | failed to unpack loaded image: failed to extract layer sha256:... |
| Root Cause | Docker image exceeds the EM system's btrfs overlay capacity |
| Key Finding | Works on devkit (overlay2), fails on EM (btrfs) |

License

MIT + LFM Open License v1.0. See LICENSE for details.
