Open
Labels: enhancement (New feature or request)
Description
Running even a minimal EKS cluster with nodes for:
- Frontend
- Backend
- Milvus Vector DB
- Instruct LLM NIM
- Embedding model NIM

is unfathomably expensive, even if we stoop to using spot instances for certain nodes (#22) and Karpenter to scale down as far as possible and scale back up on demand as fast as possible.
So let's create a local version:
- LLMs will be served locally by Ollama or another inference solution such as vLLM, SGLang, or TensorRT-LLM
- The GPUs will be NVIDIA
- The embedding model will also be served by Ollama
- The vector DB could still be Milvus, because it has a "lite" version that runs in-process (https://superlinked.com/vector-db-comparison), similar to in-process ChromaDB: you only need to install the Python package and that takes care of everything, no Docker container needed. Milvus Lite is capable of hybrid search, unlike ChromaDB; however, it cannot do BM25. If that doesn't pan out, we could try pgvector or Weaviate.
- Use Rancher Desktop or similar solution for container management.