Author: Roger Camara
Convector is a tiny, practical toolkit to turn .csv datasets into newline‑delimited JSON (output.jsonl) with 384‑dim sentence embeddings and the original row as payload. It pairs with a simple importer to load the file into a local Qdrant vector DB (Docker).
- convector.py – reads your CSV, auto-detects columns, builds one text per row, generates 384-dim embeddings, and writes output.jsonl, one record per line:
  {"id":"<uuid>", "text":"<row-as-text>", "vector":[...384 floats...], "payload":{...original row...}}
- qdrantimport.py – asks for output.jsonl, lists Qdrant collections, and imports in batches with a progress bar.
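To make the record shape concrete, here is a minimal sketch of how one CSV row could become one line of output.jsonl. It is not the code in convector.py: the `row_to_text` format, the `EXAMPLE_NAMESPACE` value, and the zero-filled placeholder vector are all assumptions made for illustration.

```python
import json
import uuid

# Hypothetical namespace for ID generation; the real tool may use a different one.
EXAMPLE_NAMESPACE = uuid.NAMESPACE_URL


def row_to_text(row: dict) -> str:
    """Join non-empty columns as 'name: value' pairs (an assumed text format)."""
    return " | ".join(f"{k}: {v}" for k, v in row.items() if v not in (None, ""))


def make_record(row: dict, vector: list) -> dict:
    """Build one record with the four keys described above."""
    text = row_to_text(row)
    return {
        "id": str(uuid.uuid5(EXAMPLE_NAMESPACE, text)),  # same text -> same ID
        "text": text,
        "vector": vector,
        "payload": row,
    }


row = {"name": "Ada", "city": "London", "note": ""}
placeholder_vector = [0.0] * 384  # stand-in for a real 384-dim embedding
record = make_record(row, placeholder_vector)

# One JSON object per line is what makes the file newline-delimited JSON.
line = json.dumps(record, ensure_ascii=False)
print(line[:80])
```

Because the ID in this sketch is derived from the row text, running it twice on the same row yields the same ID.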
We use a free embedding model (paraphrase-multilingual-MiniLM-L12-v2) that outputs 384 dimensions. If you switch to another provider (e.g., OpenAI), you can use larger vectors; just make sure your collection size matches.
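A size mismatch between the vectors and the collection is easy to catch before importing. The sketch below is a hypothetical helper (not part of the toolkit) that scans JSONL lines and reports any record whose vector length differs from the expected size; the name `find_dimension_mismatches` is an assumption.

```python
import json

EXPECTED_SIZE = 384  # must equal "size" in the Qdrant collection config


def find_dimension_mismatches(lines, expected_size=EXPECTED_SIZE):
    """Return (line_number, actual_size) for each record with the wrong vector length."""
    mismatches = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        size = len(json.loads(line)["vector"])
        if size != expected_size:
            mismatches.append((number, size))
    return mismatches


# Two in-memory records: one matching, one produced by a larger model.
sample = [
    json.dumps({"id": "a", "vector": [0.0] * 384}),
    json.dumps({"id": "b", "vector": [0.0] * 1536}),
]
print(find_dimension_mismatches(sample))  # [(2, 1536)]
```

With a real file, pass the open file object instead of the sample list, for example `with open("output.jsonl", encoding="utf-8") as f: find_dimension_mismatches(f)`.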
- Python 3.9+
- Install deps:
  pip install -r requirements.txt

- Put your .csv next to convector.py.
- Run:
  python convector.py
- Paste/drag your CSV path and confirm the detected columns.
- You'll get output.jsonl in the current folder.
Start Qdrant in Docker:
docker run -p 6333:6333 --name qdrant --rm qdrant/qdrant

Create a collection sized for 384-dim vectors:
curl -X PUT "http://localhost:6333/collections/my_collection" -H "Content-Type: application/json" -d '{"vectors": {"size": 384, "distance": "Cosine"}}'

Run the importer:
python qdrantimport.py

- Enter the output.jsonl path.
- Press Enter to keep the default Qdrant URL (http://localhost:6333).
- Select my_collection.
- Watch the progress bar until ✅ Done.
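For readers curious what a batched import can look like, here is a rough sketch using only the standard library and Qdrant's REST upsert endpoint (PUT /collections/{name}/points). It is not the code in qdrantimport.py. The helper names, the batch size, and the payload layout are assumptions; in particular, nesting the original row under a "payload" key is a guess based on the payload.column_name key used in the filter example.

```python
import json
import urllib.request


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def build_upsert_request(base_url, collection, records):
    """Build (but do not send) a PUT request that upserts one batch of points."""
    points = [
        {
            "id": r["id"],
            "vector": r["vector"],
            # Assumed layout: row text plus the original row nested under "payload".
            "payload": {"text": r["text"], "payload": r["payload"]},
        }
        for r in records
    ]
    body = json.dumps({"points": points}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/collections/{collection}/points?wait=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Five dummy records split into batches of two -> three requests.
records = [
    {"id": i, "text": f"row {i}", "vector": [0.0] * 384, "payload": {"n": i}}
    for i in range(5)
]
requests_to_send = [
    build_upsert_request("http://localhost:6333", "my_collection", batch)
    for batch in batched(records, 2)
]
print(len(requests_to_send))  # 3
```

Nothing is sent by this sketch. With Qdrant running, each request could be passed to `urllib.request.urlopen`, and a progress bar would simply advance once per batch.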
Search by vector similarity (replace the placeholder with a real 384-float query vector):
curl -X POST "http://localhost:6333/collections/my_collection/points/search" -H "Content-Type: application/json" -d '{
"vector": [0.1, 0.2, ... 384 floats ...],
"limit": 3
}'

List points that match a payload filter:
curl -X POST "http://localhost:6333/collections/my_collection/points/scroll" -H "Content-Type: application/json" -d '{
"filter": {
"must": [
{"key": "payload.column_name", "match": {"value": "some_value"}}
]
},
"limit": 3
}'

- Output is always output.jsonl in the current folder.
- IDs are deterministic (UUID5) for reproducible imports.
- Free model = 384 dimensions. Change model & collection size together.