Open
Labels: enhancement (New feature or request)
Description
Running even a minimal EKS cluster with nodes for:
- Frontend
- Backend
- Milvus Vector DB
- Instruct LLM NIM
- Embedding model NIM

is unfathomably expensive, even if we stoop to using spot instances for certain nodes (#22) and Karpenter to scale down as far as possible and scale back up on demand as fast as possible.
So let's create a local version:
- LLMs will be served locally by Ollama or another inference solution such as vLLM, SGLang, or TensorRT-LLM
- The GPUs will be NVIDIA
- The embedding model will also be served by Ollama
- The vector DB could still be Milvus, because it has a "lite" version that runs in-process (https://superlinked.com/vector-db-comparison), similar to in-process ChromaDB: you only need to install the Python package and that takes care of everything, no Docker container needed. Milvus Lite is capable of hybrid search, unlike ChromaDB; however, it cannot do BM25. If that doesn't pan out, we could try pgvector or Weaviate.
- Use Rancher Desktop or similar solution for container management.