Blazing-fast embedding and cross-encoder reranker service optimized for Retrieval-Augmented Generation (RAG), KG‑RAG (Knowledge-Graph RAG), and Graph‑RAG workflows. Self-host it on a small GPU.
- POST /api/v1/embedding with your documents/nodes → get embeddings.
- Index the embeddings into a vector store (FAISS, PostgreSQL with pgvector, etc.).
- On user query: retrieve candidates from the vector store.
- POST /api/v1/ce/reranker with query + documents → get the final top‑N (see the sketch below).
- Send the top‑N to your LLM as context for generation.
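A minimal end-to-end sketch of the five steps above, assuming the service runs at http://localhost:8000 (the default PORT from the configuration below), that docIndex is a 0-based index into the submitted documents, and using a plain NumPy dot product as a stand-in for a real vector store such as FAISS or pgvector:

```python
import numpy as np
import requests

BASE = "http://localhost:8000"  # assumes the default PORT = 8000


def embed(texts: list[str]) -> np.ndarray:
    """Step 1: get embeddings for documents/nodes."""
    r = requests.post(f"{BASE}/api/v1/embedding", json={"texts": texts})
    r.raise_for_status()
    return np.array(r.json()["embeddings"], dtype=np.float32)


docs = ["candidate A", "candidate B", "candidate C"]

# Step 2: "index" the embeddings -- a normalized NumPy matrix stands in
# for FAISS / pgvector here.
doc_vecs = embed(docs)
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

# Step 3: retrieve candidates for a user query by cosine similarity.
query = "Find causes of X"
q_vec = embed([query])[0]
q_vec /= np.linalg.norm(q_vec)
candidates = [docs[i] for i in np.argsort(doc_vecs @ q_vec)[::-1][:50]]

# Step 4: rerank the candidates with the cross-encoder, keep the top N.
r = requests.post(
    f"{BASE}/api/v1/ce/reranker",
    json={"query": query, "documents": candidates,
          "returnDocuments": False, "topN": 10},
)
r.raise_for_status()
# docIndex is assumed to index into the submitted documents list.
top_n = [candidates[hit["docIndex"]] for hit in r.json()["results"]]

# Step 5: send top_n to your LLM as context for generation.
print(top_n)
```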
POST /api/v1/embedding

Request:

```json
{ "texts": ["doc1 text", "doc2 text"] }
```

Response:

```json
{ "embeddings": [[...], [...]], "dimensions": 1024 }
```
POST /api/v1/ce/reranker

Request:

```json
{
  "query": "Find causes of X",
  "documents": ["candidate A", "candidate B", "..."],
  "returnDocuments": false,
  "topN": 10
}
```

Response:

```json
{
  "results": [
    {
      "docIndex": 17,
      "doctext": "",
      "score": 0.99951171875
    }, ...
  ],
  "query": "Find causes of X"
}
```
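A minimal client call (localhost assumed). With returnDocuments set to false the doctext fields come back empty, so map docIndex back to your own candidate list:

```python
import requests

candidates = ["candidate A", "candidate B", "candidate C"]
resp = requests.post(
    "http://localhost:8000/api/v1/ce/reranker",
    json={
        "query": "Find causes of X",
        "documents": candidates,
        "returnDocuments": False,
        "topN": 2,
    },
)
resp.raise_for_status()
for hit in resp.json()["results"]:
    # Results are assumed to arrive sorted by descending score.
    print(hit["score"], candidates[hit["docIndex"]])
```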
```
PORT = 8000
MAX_TOKEN_LIMIT_PER_TEXT = 500
EMBEDDING_MODEL_NAME = thenlper/gte-large
MAX_EMBEDDING_TEXTS_PER_REQUEST = 100
MAX_EMBEDDING_BATCH_REQUEST_DELAY = 5
MAX_EMBEDDING_BATCH_SIZE = 50
CROSS_ENCODER_MODEL_NAME = cross-encoder/ms-marco-MiniLM-L6-v2
MAX_CE_RE_RANKER_PAIRS = 200
MAX_CE_RE_RANKER_BACTH_SIZE = 100
MAX_CE_RE_RANKER_BACTH_REQUEST_DELAY = 5
```

- Embeddings: 20 texts × 100 tokens ≈ 200 ms; 100 texts × 400 tokens ≈ 700 ms
- Reranker: 100 docs × 300 tokens ≲ 300 ms
- Throughput: 100 req/s (6,000 req/min), observed
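Since a single embedding request accepts at most MAX_EMBEDDING_TEXTS_PER_REQUEST texts, chunk larger corpora client-side. A hypothetical helper sketch (the function name and constant are illustrative, not part of the service):

```python
import requests

BASE = "http://localhost:8000"
MAX_TEXTS = 100  # mirrors MAX_EMBEDDING_TEXTS_PER_REQUEST


def embed_all(texts: list[str]) -> list[list[float]]:
    """Embed an arbitrarily long list of texts in server-sized chunks."""
    out: list[list[float]] = []
    for i in range(0, len(texts), MAX_TEXTS):
        r = requests.post(f"{BASE}/api/v1/embedding",
                          json={"texts": texts[i:i + MAX_TEXTS]})
        r.raise_for_status()
        out.extend(r.json()["embeddings"])
    return out
```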
```bash
git clone https://github.com/railtelai/embedhub.git
cd embedhub
pip install -r requirements.txt
python main.py
```
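Once the server is up, a one-request smoke test (default port assumed) confirms the models loaded:

```python
import requests

r = requests.post("http://localhost:8000/api/v1/embedding",
                  json={"texts": ["hello"]})
print(r.status_code, r.json()["dimensions"])  # expect: 200 1024
```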