
Use CPU-only inference and enable hot reload #7

Open
CesarPetrescu wants to merge 1 commit into main from codex/remove-gpu-support-and-fix-dependencies-rjgp22

Conversation

@CesarPetrescu (Owner)

Summary

  • stub out flash_attn in its own module so CPU installs skip the GPU extras (see the stub sketch after this list)
  • drop accelerate to avoid pulling in GPU tooling
  • force CPU model loading and drop the CUDA handling (see the loading sketch below)
  • install CPU-only torch wheels and enable uvicorn --reload (see the dev-server sketch below)
  • mount the source code for live reload in docker-compose
  • symlink the HuggingFace cache directories so rednote-hilab/dots.ocr loads correctly (see the cache sketch below)
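
For context, a minimal sketch of the stub idea, assuming the module fakes flash_attn's presence via sys.modules; the attribute names below are illustrative, not the actual contents of flash_attn_stub.py:

```python
# Hypothetical sketch of a flash_attn stub; the repo's flash_attn_stub.py may differ.
# Registering a dummy module under the "flash_attn" name before transformers
# imports it lets CPU installs skip the GPU-only wheel entirely.
import sys
import types

def _unavailable(*args, **kwargs):
    raise RuntimeError("flash_attn is stubbed out; this build is CPU-only")

_stub = types.ModuleType("flash_attn")
_stub.__version__ = "0.0.0"            # some callers probe the version string
_stub.flash_attn_func = _unavailable   # symbol the real flash_attn exports
sys.modules["flash_attn"] = _stub
```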
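Forced CPU loading could look something like the sketch below; the model class and keyword arguments are assumptions, not the exact code in app/main.py:

```python
# Hypothetical CPU-only loading path; kwargs are assumptions about app/main.py.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "rednote-hilab/dots.ocr",
    torch_dtype=torch.float32,  # half-precision kernels are spotty on CPU
    trust_remote_code=True,     # the model ships custom modeling code
).to("cpu")                     # no device_map: accelerate was dropped
model.eval()
```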
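The hot-reload side is just uvicorn's reloader; a programmatic equivalent of the --reload flag, with host and port as assumptions:

```python
# Hypothetical dev entry point equivalent to `uvicorn app.main:app --reload`.
import uvicorn

if __name__ == "__main__":
    # reload=True requires the app as an import string, not an object.
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=True)
```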
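And one way the cache symlinking could work; the exact directories the PR links together are assumptions:

```python
# Hypothetical cache-directory symlink so lookups of rednote-hilab/dots.ocr
# resolve to one shared snapshot location. Paths are assumptions.
from pathlib import Path

hf_home = Path.home() / ".cache" / "huggingface"
hub_cache = hf_home / "hub"
legacy_cache = hf_home / "transformers"  # older transformers cache location

hub_cache.mkdir(parents=True, exist_ok=True)
if not legacy_cache.exists():
    legacy_cache.symlink_to(hub_cache, target_is_directory=True)
```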

Testing

  • python -m py_compile invoice-extract/app/main.py invoice-extract/app/flash_attn_stub.py
  • docker compose build --no-cache (fails: command not found)

https://chatgpt.com/codex/tasks/task_e_68a27c69d57883308e44b2149aef889c
