
Use CPU-only inference and enable hot reload #5

Draft
CesarPetrescu wants to merge 1 commit into main from codex/remove-gpu-support-and-fix-dependencies

Conversation

@CesarPetrescu (Owner)

Summary

  • stub out flash_attn in its own module so CPU installs skip the GPU extras (a sketch follows this list)
  • drop accelerate to avoid pulling in GPU tooling
  • force CPU model loading and drop the CUDA handling (see the loading sketch below)
  • install CPU-only torch wheels and enable uvicorn --reload
  • mount the source code into the container in docker-compose for live reload
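
A minimal sketch of the flash_attn stub, assuming the module lives at invoice-extract/app/flash_attn_stub.py (the path exercised under Testing) and that callers only need flash_attn_func; the real flash_attn API surface is larger, so treat the names here as placeholders:

    # flash_attn_stub.py -- sketch: register a fake flash_attn module so
    # "import flash_attn" succeeds on CPU-only installs where the GPU
    # wheel is never built.
    import sys
    import types

    def _unavailable(*args, **kwargs):
        raise RuntimeError("flash_attn is stubbed out; this build is CPU-only")

    _stub = types.ModuleType("flash_attn")
    _stub.flash_attn_func = _unavailable  # assumed entry point, for illustration
    sys.modules.setdefault("flash_attn", _stub)

Importing the stub early (e.g. at the top of app/main.py) keeps downstream libraries from failing at import time while still raising a clear error if a code path actually calls into flash attention.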

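A sketch of the CPU-forcing change, under the assumption that the app loads a Hugging Face transformers model; the loader class and model id are placeholders, not code from this PR:

    # hypothetical loading sketch: pin to CPU unconditionally instead of
    # branching on torch.cuda.is_available()
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "some/model-id",            # placeholder model id
        torch_dtype=torch.float32,  # fp32; half-precision kernels assume CUDA
    )
    model.to("cpu")                 # unconditional CPU placement
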
Testing

  • python -m py_compile invoice-extract/app/flash_attn_stub.py invoice-extract/app/main.py
  • docker compose build --no-cache (fails with "command not found"; the Docker CLI appears to be unavailable in the test environment, so the image build is unverified)

https://chatgpt.com/codex/tasks/task_e_68a27c69d57883308e44b2149aef889c

