Releases: Liquid4All/on-prem-stack
`liquidai-cli@0.0.1b0`
LFM-1B-6GB@v0.0.2
Summary
This is the stack for LFM-1B that can run with 6GB GPU memory.
How to run for the first time
- Download Source code (zip) below.
- Unzip the file into an empty folder.
- Run `launch.sh`.
How to upgrade
In `.env`, make these updates:

| Variable | Value |
|---|---|
| `STACK_VERSION` | `2685ff757d-0312` |
| `MODEL_IMAGE` | `liquidai/lfm-1b-e:0.0.1` |
In `docker-compose.yaml`, make these changes:

| Argument | Value |
|---|---|
| `--max-model-len` | `"2048"` |
| `--max-seq-len-to-capture` | `"2048"` |
| `--max-num-seqs` | `"100"` |

Then run `launch.sh`.
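Put together, the two files should contain entries along these lines. This is a sketch: the rest of `.env` stays as it is, and the exact layout of the vLLM arguments in `docker-compose.yaml` (shown here as a `command` list) may differ in your copy.

```bash
# .env — stack and model versions for this release
STACK_VERSION=2685ff757d-0312
MODEL_IMAGE=liquidai/lfm-1b-e:0.0.1
```

```yaml
# docker-compose.yaml (model service) — vLLM launch arguments for a 6GB GPU
command:
  - --max-model-len
  - "2048"
  - --max-seq-len-to-capture
  - "2048"
  - --max-num-seqs
  - "100"
```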
If the model container throws an out-of-memory error, decrease these arguments further, keep `--max-seq-len-to-capture` equal to `--max-model-len`, and run `launch.sh` to retry.
How to test
- After running `launch.sh`, wait up to 2 minutes for model initialization, and run `test-api.sh`.
  - This script will trigger a smoke test to verify that the inference server is running correctly.
- Visit `0.0.0.0:3000` and chat with the model in a web UI.
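Besides `test-api.sh`, the API can be probed manually. The sketch below assumes the inference server exposes an OpenAI-compatible `/v1/chat/completions` endpoint on localhost port 8000 and that the model is registered as `lfm-1b`; the port, model name, and API key are assumptions, so take the real values from `test-api.sh` and your `.env`.

```bash
# Hypothetical manual smoke test against the OpenAI-compatible endpoint.
# Port, model name, and $API_KEY are placeholders — check .env / test-api.sh for the actual values.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
        "model": "lfm-1b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```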
Full Changelog: https://github.com/Liquid4All/on-prem-stack/compare/1b-6gb@0.0.1...1b-6gb@0.0.2
v0.2.0
Summary
Important
This version has a breaking change. The web, python-api, and vllm services can now each use a different Docker image version. The stack automatically upgrades web and python-api to the latest version, while vllm remains upgradable only manually.
What's Changed
- Support VLM by @tuliren in #18
- Add local image as default vlm input by @tuliren in #19
- Mount local files for checkpoint by @tuliren in #20
- Use separate image versions by @tuliren in #21
Full Changelog: 0.1.0...0.2.0
v0.1.0
v0.0.4
What's Changed
- Add script to run vLLM for any HF model by @tuliren in #5
- Run local checkpoint by @tuliren in #6
- Pass in HF token for gated repository by @tuliren in #7
- Launch model checkpoint with one parameter by @tuliren in #8
- Add more vLLM launch parameters by @tuliren in #9
- Support 7B models by @tuliren in #11
- Use fixed database password and api key by @tuliren in #12
- Update service dependencies by @tuliren in #13
Full Changelog: 0.0.3...0.0.4
LFM-3B-JP v0.0.3
Summary
This is the stack for LFM-3B-JP. Two models are available: lfm-3b-jp and lfm-3b-ichikara.
How to run for the first time
- Download Source code (zip) below.
- Unzip the file into an empty folder.
- Run `launch.sh`.
Models
Currently, each on-prem stack can only run one model at a time. The launch script runs `lfm-3b-jp` by default. To switch models, run `./switch-model.sh` and select the desired model. The script will then stop the current model and start the newly chosen model.
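For example, switching to the Ichikara model might look like the following; the interactive prompt described here is illustrative, not the script's exact output.

```bash
# Run from the deployment folder; the script lists the available models and asks for a choice.
./switch-model.sh
# Select lfm-3b-ichikara when prompted — the current model is stopped and the chosen one is started.
```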
Update
To update the stack, change STACK_VERSION and MODEL_IMAGE in the .env file and run the launch script again.
How to test
- After running `launch.sh`, wait up to 2 minutes for model initialization, and run `test-api.sh`.
  - This script will trigger a smoke test to verify that the inference server is running correctly.
- Visit `0.0.0.0:3000` and chat with the model in a web UI.
LFM-3B-JP v0.0.2
Summary
This is the stack for LFM-3B-JP.
How to run for the first time
- Download Source code (zip) below.
- Unzip the file into an empty folder.
- Run `launch.sh`.
Models
Currently, each on-prem stack can only run one model at a time. The launch script runs `lfm-3b-jp` by default. To switch models, change `MODEL_IMAGE` in the `.env` file according to the table below, and run `./launch.sh` again.
| Model Image |
|---|
| `liquidai/lfm-3b-jp:0.0.1-e` |
| `liquidai/lfm-3b-ichikara:0.0.1-e` |
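For example, to run the Ichikara model, the relevant `.env` entry would look like this (a minimal excerpt; the rest of `.env` stays unchanged), followed by `./launch.sh`:

```bash
# .env — pick the model image the stack should run
MODEL_IMAGE=liquidai/lfm-3b-ichikara:0.0.1-e
```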
Update
To update the stack, change STACK_VERSION and MODEL_IMAGE in the .env file and run the launch script again.
How to test
- After running `launch.sh`, wait up to 2 minutes for model initialization, and run `test-api.sh`.
  - This script will trigger a smoke test to verify that the inference server is running correctly.
- Visit `0.0.0.0:3000` and chat with the model in a web UI.
LFM-3B-JP v0.0.1
Summary
This is the stack for LFM-3B-JP.
How to run for the first time
- Download Source code (zip) below.
- Unzip the file into an empty folder.
- Run `launch.sh`.
Models
Currently, each on-prem stack can only run one model at a time. The launch script runs `lfm-3b-jp` by default. To switch models, change `MODEL_NAME` and `MODEL_IMAGE` in the `.env` file according to the table below, and run `./launch.sh` again.
| Model Name | Model Image |
|---|---|
| `lfm-3b-jp` | `liquidai/lfm-3b-jp:0.0.1-e` |
| `lfm-3b-ichikara` | `liquidai/lfm-3b-ichikara:0.0.1-e` |
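For example, to run the Ichikara model, the two `.env` entries would be (a minimal excerpt; both variables must refer to the same model):

```bash
# .env — model selection for this stack
MODEL_NAME=lfm-3b-ichikara
MODEL_IMAGE=liquidai/lfm-3b-ichikara:0.0.1-e
```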
Update
To update the stack, change STACK_VERSION and MODEL_IMAGE in the .env file and run the launch script again.
How to test
- After running `launch.sh`, wait up to 2 minutes for model initialization, and run `test-api.sh`.
  - This script will trigger a smoke test to verify that the inference server is running correctly.
- Visit `0.0.0.0:3000` and chat with the model in a web UI.
LFM-1B-6GB @ v0.0.1
Summary
This is the stack for LFM-1B that can run with 6GB GPU memory.
How to run for the first time
- Download Source code (zip) below.
- Unzip the file into an empty folder.
- Run `launch.sh`.
How to upgrade
In `.env`, make these updates:

| Variable | Value |
|---|---|
| `STACK_VERSION` | `2685ff757d` |
| `MODEL_IMAGE` | `liquidai/lfm-1be:0.0.1` |
In `docker-compose.yaml`, make these changes:

| Argument | Value |
|---|---|
| `--max-model-len` | `"2048"` |
| `--max-seq-len-to-capture` | `"2048"` |
| `--max-num-seqs` | `"100"` |

Then run `launch.sh`.
If the model container throws an out-of-memory error, decrease these arguments further, keep `--max-seq-len-to-capture` equal to `--max-model-len`, and run `launch.sh` to retry.
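If further reduction is needed, the vLLM arguments in `docker-compose.yaml` could be lowered along these lines. This is a hypothetical excerpt: it assumes the model service passes the flags as a `command` list, and 1024/50 are illustrative values, not recommendations.

```yaml
# docker-compose.yaml (model service) — example of further reduced limits after an OOM
command:
  - --max-model-len
  - "1024"              # keep equal to --max-seq-len-to-capture
  - --max-seq-len-to-capture
  - "1024"
  - --max-num-seqs
  - "50"
```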
How to test
- After running `launch.sh`, wait up to 2 minutes for model initialization, and run `test-api.sh`.
  - This script will trigger a smoke test to verify that the inference server is running correctly.
- Visit `0.0.0.0:3000` and chat with the model in a web UI.
v0.0.3
How to run for the first time
- Download Source code (zip) below.
- Unzip the file into an empty folder.
- Run `launch.sh`.
How to upgrade
- Download Source code (zip) below.
- Unzip the file into the current deployment folder, overwriting all existing files.
- Make sure to keep the existing `.env` file.
- In the `.env` file, update `STACK_VERSION` to `2b3f969864` and `MODEL_IMAGE` to `liquidai/lfm-3be:0.0.6` (see the excerpt after this list).
  - Please note that all previous versions have been removed.
- Run `launch.sh`.
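The updated `.env` entries should look like this (other variables stay as they are):

```bash
# .env — values for upgrading to this release
STACK_VERSION=2b3f969864
MODEL_IMAGE=liquidai/lfm-3be:0.0.6
```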
How to test
- After running `launch.sh`, wait up to 2 minutes for model initialization, and run `test-api.sh`.
  - This script will trigger a smoke test to verify that the inference server is running correctly.
- Visit `0.0.0.0:3000` and chat with the model in a web UI.
What's Changed
- More robust Python backend.
- Updated 3B model.
Full Changelog: 0.0.2...0.0.3