
Use CPU-only inference and enable hot reload #7

Open
CesarPetrescu wants to merge 1 commit into main from codex/remove-gpu-support-and-fix-dependencies-rjgp22

Conversation

@CesarPetrescu (Owner)

Summary

  • stub out flash_attn in its own module so CPU installs skip the GPU extras (see the stub sketch after this list)
  • drop accelerate to avoid pulling in GPU tooling
  • force CPU model loading and drop the CUDA handling (see the loading sketch below)
  • install CPU-only torch wheels and enable uvicorn --reload (see the dev-server sketch below)
  • mount the source code for live reload in docker-compose
  • symlink the HuggingFace cache directories so rednote-hilab/dots.ocr loads correctly (see the cache sketch below)
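
For context, a minimal sketch of the stub idea, assuming the module fakes flash_attn's presence via sys.modules; the attribute names below are illustrative, not the actual contents of flash_attn_stub.py:

```python
# Hypothetical sketch of a flash_attn stub; the repo's flash_attn_stub.py may differ.
# Registering a dummy module under the "flash_attn" name before transformers
# imports it lets CPU installs skip the GPU-only wheel entirely.
import sys
import types

def _unavailable(*args, **kwargs):
    raise RuntimeError("flash_attn is stubbed out; this build is CPU-only")

_stub = types.ModuleType("flash_attn")
_stub.__version__ = "0.0.0"            # some callers probe the version string
_stub.flash_attn_func = _unavailable   # symbol the real flash_attn exports
sys.modules["flash_attn"] = _stub
```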
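Forced CPU loading could look something like the sketch below; the model class and keyword arguments are assumptions, not the exact code in app/main.py:

```python
# Hypothetical CPU-only loading path; kwargs are assumptions about app/main.py.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "rednote-hilab/dots.ocr",
    torch_dtype=torch.float32,  # half-precision kernels are spotty on CPU
    trust_remote_code=True,     # the model ships custom modeling code
).to("cpu")                     # no device_map: accelerate was dropped
model.eval()
```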
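The hot-reload side is just uvicorn's reloader; a programmatic equivalent of the --reload flag, with host and port as assumptions:

```python
# Hypothetical dev entry point equivalent to `uvicorn app.main:app --reload`.
import uvicorn

if __name__ == "__main__":
    # reload=True requires the app as an import string, not an object.
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=True)
```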
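And one way the cache symlinking could work; the exact directories the PR links together are assumptions:

```python
# Hypothetical cache-directory symlink so lookups of rednote-hilab/dots.ocr
# resolve to one shared snapshot location. Paths are assumptions.
from pathlib import Path

hf_home = Path.home() / ".cache" / "huggingface"
hub_cache = hf_home / "hub"
legacy_cache = hf_home / "transformers"  # older transformers cache location

hub_cache.mkdir(parents=True, exist_ok=True)
if not legacy_cache.exists():
    legacy_cache.symlink_to(hub_cache, target_is_directory=True)
```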

Testing

  • python -m py_compile invoice-extract/app/main.py invoice-extract/app/flash_attn_stub.py
  • docker compose build --no-cache (fails: command not found)

https://chatgpt.com/codex/tasks/task_e_68a27c69d57883308e44b2149aef889c
