kacperx0m/mini-RAG

Installation:

Download Ollama from: https://ollama.com/

Run the downloaded app or type:
ollama serve

Download your model, but keep in mind that the app was tested with llama3.2:1b.
ollama pull llama3.2:1b
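
To check that the model responds outside of the app, you can query the local Ollama REST endpoint directly. The sketch below is an assumption-based example (default port 11434, placeholder prompt), not code from this repository:

# Minimal sketch: query the local Ollama server directly (default port 11434 assumed).
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:1b",
        "prompt": "Say hello in one sentence.",
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=120,
)
print(response.json()["response"])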

Install dependencies:
pip install -r requirements.txt

Put your PDF documents inside the documents folder.
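
As a rough illustration of what reading that folder can look like (not necessarily how main.py does it), text can be extracted from each PDF with a library such as pypdf:

# Illustrative sketch only: extract raw text from every PDF in the documents folder.
# The actual loading logic in main.py may differ; pypdf is just one possible library.
from pathlib import Path
from pypdf import PdfReader

for pdf_path in Path("documents").glob("*.pdf"):
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    print(pdf_path.name, len(text), "characters")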

Activate the virtual environment using the command:
./mini-rag/Scripts/activate

Run the app:
python main.py

Go to http://127.0.0.1:8000 and, if needed, wait for the system to load. Refresh the page from time to time or check the console to see whether loading has completed.
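
If you prefer to wait from a script instead of refreshing the browser, a small polling sketch like the one below works; it only assumes that the root page answers once loading is done:

# Sketch: poll the local server until it responds (checking the root path is an assumption).
import time
import requests

while True:
    try:
        if requests.get("http://127.0.0.1:8000", timeout=5).ok:
            print("Server is up.")
            break
    except requests.exceptions.ConnectionError:
        pass
    time.sleep(2)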

Now you should be able to ask your questions freely.

Questions:

What restrictions does this app have?

  • It uses the CPU for calculations, which makes processing slow,
  • Embeddings and retrieval are sensitive to the exact words, queries, and sentences used,
  • Answers are non-deterministic,
  • The number of embeddings is limited by available RAM (see the sketch after this list).
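
As a back-of-the-envelope check for the last point, the in-memory footprint of the embedding matrix can be estimated from chunk count, embedding dimensionality, and float size; the numbers below are placeholders, not measurements:

# Rough estimate of embedding memory usage; all numbers are placeholders.
num_chunks = 10_000        # how many text chunks were embedded
embedding_dim = 768        # dimensionality of the embedding model
bytes_per_float = 4        # float32

total_bytes = num_chunks * embedding_dim * bytes_per_float
print(f"~{total_bytes / 1024 ** 2:.1f} MiB of RAM for the embedding matrix alone")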

What would you correct in the app if you had more time?

  • Improve the embedding calculations,
  • Move heavy tasks to the GPU,
  • Add a vector database (see the sketch after this list),
  • Make the app more responsive.
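
As an illustration of the vector-database point above, an in-process index such as FAISS could replace brute-force similarity search. This is a hedged sketch with made-up data, not code from the app:

# Sketch: replace brute-force retrieval with a FAISS index (data and dimensions are made up).
import numpy as np
import faiss

dim = 768
embeddings = np.random.rand(1000, dim).astype("float32")  # stand-in for document embeddings

index = faiss.IndexFlatL2(dim)   # exact L2 search; other index types trade accuracy for speed
index.add(embeddings)

query = np.random.rand(1, dim).astype("float32")          # stand-in for a query embedding
distances, ids = index.search(query, 5)
print(ids[0])                    # indices of the 5 nearest chunks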

How would you prepare this system for production in terms of scaling and monitoring?

  • Use better hardware,
  • Find and use better ways of chunking and retrieval,
  • Find and use better embedding models,
  • Find and use better LLMs,
  • Find and switch to better prompts,
  • Allow the LLM to use external knowledge or scrape data if needed,
  • Use a vector database,
  • Add chat history,
  • Return more detailed sources,
  • Add more tests.

Testing:

To run the tests:
pytest -v test_rag.py
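
As an example of the kind of extra test mentioned earlier, a small self-contained check (it does not import anything from the app itself) could look like this:

# Self-contained example of an additional test one could add; it does not touch app code.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def test_cosine_similarity_prefers_identical_vectors():
    v = np.array([1.0, 2.0, 3.0])
    other = np.array([3.0, 2.0, 1.0])
    assert cosine_similarity(v, v) > cosine_similarity(v, other)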

About

This is a mini, semi-local RAG.
