This work shows the existence of high-certainty hallucinations in LLMs, where models confidently generate incorrect answers despite possessing the correct knowledge.

- Clone the repository:

```bash
git clone git@github.com:technion-cs-nlp/Trust_me_Im_wrong.git
```

- Create and activate the conda environment:

```bash
conda env create -f environment.yml
conda activate TrustMeImWrong
```

To create a knowledge dataset for a given model using either the natural_question or triviaqa dataset, run the following command, replacing model_name with the desired model and dataset_name with either natural_question or triviaqa:

```bash
python run_all_steps.py --create_knowledge_dataset True --model_name model_name --path_to_datasets datasets/ --dataset_name dataset_name
```

The saved files will be in the datasets folder, as three JSON files:
- one ending in knowledge_dataset.json: all the examples the model has the knowledge to answer correctly,
- one ending in non_knowledge_dataset.json: all the examples the model does not have the knowledge to answer correctly,
- one ending in else_dataset.json: all the examples the model has only partial knowledge to answer correctly.
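A minimal sketch for loading the three splits back into Python, assuming each split is a JSON file whose name ends with the suffixes above (the `load_splits` helper and the dictionary keys are illustrative, not part of the repository):

```python
import json
import os

def load_splits(datasets_dir):
    """Collect the three dataset splits by their file-name suffixes.

    Returns a dict mapping a short split name to the parsed JSON content.
    The check order matters: "non_knowledge_dataset.json" also ends with
    "knowledge_dataset.json", so it must be matched first.
    """
    splits = {}
    for name in os.listdir(datasets_dir):
        if not name.endswith(".json"):
            continue
        if name.endswith("non_knowledge_dataset.json"):
            key = "non_knowledge"
        elif name.endswith("knowledge_dataset.json"):
            key = "knowledge"
        elif name.endswith("else_dataset.json"):
            key = "else"
        else:
            continue
        with open(os.path.join(datasets_dir, name)) as f:
            splits[key] = json.load(f)
    return splits
```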
To run the uncertainty calculations on the knowledge dataset, run the following command, replacing method_k_positive with the desired method (e.g., prompt_7):

```bash
python run_all_steps.py --uncertainty_calculation True --model_name model_name --path_to_datasets datasets/ --dataset_name dataset_name --method_k_positive prompt_7
```

At the end of this step you will have the following files in the results folder:
- model_name/dataset_name/method_k_positive/factuality_stats.json
- model_name/dataset_name/method_k_positive/hallucinations_stats.json
factuality_stats.json contains all the examples for which the model generated the correct answer under the given method_k_positive setting; hallucinations_stats.json contains all the examples for which the model generated a wrong answer. Each example records the following fields:
- prob: the probability of the first answer token,
- generated: the generated text under the given method_k_positive,
- true_answer: the true answer,
- prob_diff: the difference between the probabilities of the most likely and second most likely next token,
- semantic_entropy: the semantic entropy of the generated text,
- mean_entropy: the predictive entropy,
- most_likely_tokens: the top 5 most likely tokens,
- temp_generations: the generations using a temperature of 1,
- semantic_entropy_temp_0.5: the semantic entropy of the generated text using a temperature of 0.5 instead of 1,
- mean_entropy_temp_0.5: the predictive entropy using a temperature of 0.5 instead of 1,
- temp_generations_temp_0.5: the generations using a temperature of 0.5 instead of 1,
- prompt: the prompt used to generate the text.
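Since the paper's focus is hallucinations the model is confident about, a natural post-processing step is to rank the entries of hallucinations_stats.json by certainty. The sketch below uses the prob field documented above; the assumption that the stats file holds a list of per-example dicts, as well as the helper name and threshold, are illustrative:

```python
import json

def high_certainty_hallucinations(records, prob_threshold=0.9):
    """Return hallucinated examples the model was most confident about,
    sorted by descending first-answer-token probability."""
    confident = [r for r in records if r.get("prob", 0.0) >= prob_threshold]
    return sorted(confident, key=lambda r: r["prob"], reverse=True)

# Usage sketch (the path is illustrative):
# with open("results/model_name/dataset_name/prompt_7/hallucinations_stats.json") as f:
#     records = json.load(f)
# for r in high_certainty_hallucinations(records)[:10]:
#     print(r["prob"], r["generated"], "| true:", r["true_answer"])
```

The same idea works with the entropy fields instead: sorting ascending by semantic_entropy surfaces hallucinations whose sampled generations all agree with each other.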
To generate the graphs and tables for the results of the uncertainty calculations, run the following command:

```bash
python run_all_steps.py --run_results True --path_to_datasets datasets/
```

If you use CHOKE in your research, please cite our paper:
```bibtex
@article{simhi2025trust,
  title={Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs},
  author={Simhi, Adi and Itzhak, Itay and Barez, Fazl and Stanovsky, Gabriel and Belinkov, Yonatan},
  journal={arXiv preprint arXiv:2502.12964},
  year={2025}
}
```

The repository is organized as follows:
- README.md: This README file.
- run_all_steps.py: Main script to run knowledge dataset creation, uncertainty calculation, and results generation.
- environment.yml: Conda environment file for setting up dependencies.
- semantic_entropy: Module for calculating semantic entropy (Kuhn et al., 2023).
- calc_semantic_entropy.py: Script to calculate semantic entropy.
- uncertainty_calculations.py: Module for uncertainty calculation methods.
- knowledge_detection.py: Module for knowledge detection step.
- results_sub.py: Module for generating results (graphs and tables).
- datasets/: Directory to store datasets.
- results/: Directory to store results.