
Use CPU-only inference and enable hot reload #5

Draft
CesarPetrescu wants to merge 1 commit into main from codex/remove-gpu-support-and-fix-dependencies

Conversation

@CesarPetrescu (Owner)

Summary

  • stub out flash_attn in its own module so CPU installs skip the GPU extras (a sketch follows this list)
  • drop accelerate to avoid pulling in GPU tooling
  • force CPU model loading and drop the CUDA handling (see the loading sketch below)
  • install CPU-only torch wheels and enable uvicorn --reload
  • mount the source code into the container in docker-compose for live reload
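
A minimal sketch of the flash_attn stub, assuming the module lives at invoice-extract/app/flash_attn_stub.py (the path exercised under Testing) and that callers only need flash_attn_func; the real flash_attn API surface is larger, so treat the names here as placeholders:

    # flash_attn_stub.py -- sketch: register a fake flash_attn module so
    # "import flash_attn" succeeds on CPU-only installs where the GPU
    # wheel is never built.
    import sys
    import types

    def _unavailable(*args, **kwargs):
        raise RuntimeError("flash_attn is stubbed out; this build is CPU-only")

    _stub = types.ModuleType("flash_attn")
    _stub.flash_attn_func = _unavailable  # assumed entry point, for illustration
    sys.modules.setdefault("flash_attn", _stub)

Importing the stub early (e.g. at the top of app/main.py) keeps downstream libraries from failing at import time while still raising a clear error if a code path actually calls into flash attention.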

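A sketch of the CPU-forcing change, under the assumption that the app loads a Hugging Face transformers model; the loader class and model id are placeholders, not code from this PR:

    # hypothetical loading sketch: pin to CPU unconditionally instead of
    # branching on torch.cuda.is_available()
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "some/model-id",            # placeholder model id
        torch_dtype=torch.float32,  # fp32; half-precision kernels assume CUDA
    )
    model.to("cpu")                 # unconditional CPU placement
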
Testing

  • python -m py_compile invoice-extract/app/flash_attn_stub.py invoice-extract/app/main.py
  • docker compose build --no-cache (fails with "command not found"; the Docker CLI appears to be unavailable in the test environment, so the image build is unverified)

https://chatgpt.com/codex/tasks/task_e_68a27c69d57883308e44b2149aef889c

