You can run the system locally from scratch by following the steps below, or go directly to our pre-deployed search and visualization system at: 🔗 http://27.106.99.149:9300/
To run the system from scratch:
First, download the qwen3-8b-emb model and place it under the llm_model/ directory:
llm_model/
└── qwen3-8b-emb/
    ├── config.json
    ├── model.safetensors
    ├── tokenizer.json
    └── ...
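Before moving on, it can help to confirm that the model files landed where the later steps expect them. The sketch below is a hypothetical helper (`check_model_dir` is not part of the repository) that only checks for the three files named in the tree above. If your checkpoint is split into several .safetensors shards, adjust the list accordingly.

```python
from pathlib import Path

# File names taken from the directory tree above; the "..." entries are not checked.
EXPECTED_FILES = ("config.json", "model.safetensors", "tokenizer.json")


def check_model_dir(model_dir="llm_model/qwen3-8b-emb"):
    """Return the expected files that are missing from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = check_model_dir()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Model directory looks complete.")
```

Run it from the repository root so that the relative path llm_model/qwen3-8b-emb resolves correctly.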
Next, run the embedding module to generate entity embeddings from the knowledge triples.
- Load triples from data/triples.jsonl
- Extract entity objects
- Compute embeddings using the Qwen model
- Save embeddings to disk for reuse
python emb/QwenTripletEmbeddingSearch.py

The script writes its output to:

llm_model/
└── qwen_embeddings.pkl
This file contains the cached entity embeddings and will be automatically loaded in later runs.
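The load, extract, embed, and cache flow of this step can be sketched roughly as follows. This is an illustration, not the code in emb/QwenTripletEmbeddingSearch.py: the JSONL field names (`head`, `tail`), the `embed_fn` callback, and the toy embedding function are all assumptions. In the real module the vectors come from the Qwen model.

```python
import json
import pickle
from pathlib import Path


def load_entities(triples_path):
    """Collect unique entity strings from a JSONL file of triples.

    Assumes each line is a JSON object with "head" and "tail" keys;
    the real schema of data/triples.jsonl may differ.
    """
    entities = set()
    with open(triples_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            triple = json.loads(line)
            entities.update((triple["head"], triple["tail"]))
    return sorted(entities)


def build_or_load_embeddings(triples_path, cache_path, embed_fn):
    """Load cached embeddings if present, otherwise compute and cache them."""
    cache = Path(cache_path)
    if cache.exists():
        with cache.open("rb") as fh:
            return pickle.load(fh)
    entities = load_entities(triples_path)
    embeddings = {entity: embed_fn(entity) for entity in entities}
    cache.parent.mkdir(parents=True, exist_ok=True)
    with cache.open("wb") as fh:
        pickle.dump(embeddings, fh)
    return embeddings


def toy_embed(text):
    """Stand-in for the Qwen model: a 2-dimensional vector, for demonstration only."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]
```

Calling `build_or_load_embeddings("data/triples.jsonl", "llm_model/qwen_embeddings.pkl", toy_embed)` a second time returns the pickled dictionary without invoking `embed_fn` again, which mirrors the caching behavior described above.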
Once embeddings are ready, run the main pipeline.
python main.py

Or, with explicit arguments:

python main.py \
    --query "climate change" \
    --top_k 100 \
    --rec_topk 5 \
    --api_url https://api.openai.com/v1 \
    --model_id gpt-4o \
    --api_key YOUR_API_KEY

The pipeline will:
- Perform semantic entity matching using embeddings
- Retrieve candidate datasets via multi-hop diffusion
- Build and visualize the dataset–entity graph
- Generate dataset recommendations using the LLM
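To make the first two steps concrete, here is a minimal sketch of cosine-similarity matching followed by a breadth-first expansion over an entity graph. This README does not spell out how main.py implements matching or diffusion, so the function names, the adjacency-dict graph, and the hop limit are assumptions, and plain breadth-first search is used only as a simple stand-in for multi-hop diffusion.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_entities(query_vec, entity_vecs, k):
    """Return the k entity names whose vectors are most similar to query_vec."""
    ranked = sorted(
        entity_vecs,
        key=lambda name: cosine(query_vec, entity_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


def expand(seeds, graph, max_hops):
    """Map every node reachable within max_hops of a seed to its hop distance."""
    hops = {seed: 0 for seed in seeds}
    queue = deque(hops)
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    entity_vecs = {"climate": [1.0, 0.0], "weather": [0.9, 0.1], "finance": [0.0, 1.0]}
    graph = {"climate": ["dataset_A"], "dataset_A": ["temperature"], "temperature": ["dataset_B"]}
    seeds = top_k_entities([1.0, 0.0], entity_vecs, k=2)
    print(seeds)
    print(expand(seeds, graph, max_hops=2))
```

In this toy setup, `--top_k` would correspond to `k`; how the real pipeline scores and ranks datasets reached through the graph is not covered by this sketch.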
Check the generated results and interactive visualizations:
- Dataset rankings and statistics.
- graph.html: open this file in your browser to explore dataset–entity relationships.
If qwen_embeddings.pkl already exists, the embeddings are loaded automatically. Regenerate them only when:
- The triples data changes.
- A different embedding model is used.
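One way to detect both conditions automatically is to store a fingerprint of the triples file and the model name next to the cache, then compare it on startup. Nothing in this README indicates that the repository does this; the sidecar file name and the function names below are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(triples_path, model_name):
    """Hash the triples file contents together with the embedding model name."""
    digest = hashlib.sha256()
    digest.update(model_name.encode("utf-8"))
    digest.update(b"\0")
    digest.update(Path(triples_path).read_bytes())
    return digest.hexdigest()


def _meta_path(cache_path):
    """Hypothetical sidecar file, e.g. qwen_embeddings.pkl.meta.json."""
    cache = Path(cache_path)
    return cache.with_name(cache.name + ".meta.json")


def needs_regeneration(cache_path, triples_path, model_name):
    """True if the cache is missing or was built from different triples or model."""
    meta = _meta_path(cache_path)
    if not Path(cache_path).exists() or not meta.exists():
        return True
    stored = json.loads(meta.read_text(encoding="utf-8"))
    return stored.get("fingerprint") != fingerprint(triples_path, model_name)


def record_fingerprint(cache_path, triples_path, model_name):
    """Write the sidecar file after the embeddings have been regenerated."""
    payload = {"fingerprint": fingerprint(triples_path, model_name)}
    _meta_path(cache_path).write_text(json.dumps(payload), encoding="utf-8")
```

A wrapper script could call `needs_regeneration(...)` first, delete qwen_embeddings.pkl and re-run the embedding step when it returns `True`, and then call `record_fingerprint(...)`.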