Multilingual, knowledge-graph grounded benchmark for evaluating factuality and knowledge injection methods for LLMs.
Tested with Python 3.11 on a MacBook Pro M3 (18 GB). Optionally, you can use a CUDA-compatible GPU for faster translation and sentence-transformers inference.
- Create a Python 3.11 virtual environment, either with `venv` or `conda`.
- Install the requirements:

  ```
  python -m pip install -r requirements.txt
  ```

- Create an `.env` file in the root directory with the following content (a minimal loading sketch follows this list):

  ```
  HF_TOKEN=<YOUR_HF_TOKEN>
  WANDB_API_KEY=<YOUR_WANDB_API_KEY>
  OPEN_ROUTER_API_KEY=<YOUR_OPEN_ROUTER_API_KEY>
  ```

- Execute the code with:

  ```
  python main.py --config config/default.yaml
  ```
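For reference, here is a minimal sketch of how these credentials can be read at runtime, assuming the project loads them with `python-dotenv` (the package choice and loading code are an assumption; only the variable names come from the `.env` example above):

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

# Populate os.environ with the key=value pairs from the .env file.
load_dotenv()

hf_token = os.environ["HF_TOKEN"]               # Hugging Face access token
wandb_key = os.environ["WANDB_API_KEY"]         # Weights & Biases API key
router_key = os.environ["OPEN_ROUTER_API_KEY"]  # OpenRouter API key
```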
The `default.yaml` file contains the default configuration to run the whole pipeline from start to finish; you can also create your own configuration file and pass it to the script to process separate stages or datasets. The default configuration arguments and their documentation are defined in `src.utils.config.GlobalConfig.get_default_args()`. Parameters specified in `config/default.yaml` override these defaults, so you can create your own configuration files to run different experiments.
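To illustrate the override behaviour, here is a minimal sketch; the default keys shown are hypothetical placeholders, and in the project the real defaults come from `src.utils.config.GlobalConfig.get_default_args()` rather than a hard-coded dict:

```python
import yaml  # PyYAML

# Hypothetical defaults, standing in for
# src.utils.config.GlobalConfig.get_default_args().
DEFAULTS = {
    "stages": ["all"],
    "datasets": ["all"],
    "output_dir": "output",
}

def load_config(path: str) -> dict:
    """Overlay a user-supplied YAML config on top of the defaults."""
    with open(path) as f:
        overrides = yaml.safe_load(f) or {}
    # Keys present in the YAML file take precedence over the defaults.
    return {**DEFAULTS, **overrides}

config = load_config("config/default.yaml")
```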
- Output is written to the `output/` directory, in a timestamped folder.
- Alternatively, you can run the code by building a Docker image:
```
docker build . -t multihal:latest
docker run --rm multihal:latest
# or, to run the code in interactive mode
docker run --rm -it --entrypoint sh multihal:latest
```

We also supply our raw results in the `results/` directory. To generate the tables and figures from the original paper, run the `results/generate_results.ipynb` notebook. Output will be written to the `results/output/` directory.
We perform our evaluation based on semantic similarity computed using sentence-transformers. The figure below compares semantic similarity scores between ground-truth and predicted answers for vanilla QA and for KG-RAG (KG path labels included as part of the context). The results show consistent improvements in semantic similarity when using our mined KG paths.
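As a minimal sketch of this kind of scoring (the model checkpoint named below is an example, not necessarily the one used in the paper):

```python
from sentence_transformers import SentenceTransformer, util

# Example checkpoint; the paper's exact model may differ.
model = SentenceTransformer("all-MiniLM-L6-v2")

references = ["Paris is the capital of France."]
predictions = ["The capital of France is Paris."]

# Embed both sides and score each (reference, prediction) pair
# by the cosine similarity of their sentence embeddings.
ref_emb = model.encode(references, convert_to_tensor=True)
pred_emb = model.encode(predictions, convert_to_tensor=True)
scores = util.cos_sim(ref_emb, pred_emb).diagonal()

print(scores)  # values near 1.0 indicate semantically equivalent answers
```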
This project is licensed under the CC-BY-4.0 license.
