This repository contains the artifacts used for my Bachelor's degree thesis, presented on June 16th, 2025.
TL;DR: Start with run-huggingface-bbqdataset-qascorer.ipynb. Keep reading if you want more details.
The thesis is published under CC BY-NC-ND in O2, UOC's public repository:
Estevan Estevan, José Antonio. 2025. "Benchmarking Large Language Models toward reasoning fairness and unanticipated bias". Universitat Oberta de Catalunya (UOC). https://hdl.handle.net/10609/153037
- Docs · https://azure.github.io/PyRIT
- Repo · https://github.com/Azure/PyRIT
- Paper · https://arxiv.org/abs/2410.02828
- Repo · https://github.com/nyu-mll/BBQ
- Paper · https://arxiv.org/abs/2110.08193
- bbq_dataset.py · A PyRIT-compatible dataset class representing BBQ data. It imports the dataset from the original BBQ data files.
- pyrit_tuning.py · A class inherited from PyRIT that modifies how PyRIT's QuestionAnswerScorer generates question prompts.
- run-aifoundry-bbqdataset-qascorer.ipynb · Notebook for running questions in the cloud against models in Azure AI Foundry.
- run-huggingface-bbqdataset-qascorer.ipynb · Notebook for running questions against local models through the Hugging Face Inference API.
- run-local-result-file-post-processing.ipynb · Notebook with a few pieces of code that process the answers into the CSV files used for data analysis. It is optional and uses only plain Python with Pandas.
- data/bbq · A copy of the original BBQ data files used for this work.
- data/final · The files I produced during this work.
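
To give a rough idea of what importing the original BBQ data files involves, here is a minimal, standalone sketch. It is not the code in bbq_dataset.py and does not use PyRIT. It assumes the BBQ files are JSON Lines with the fields `category`, `context`, `question`, `ans0`, `ans1`, `ans2`, and `label`; the names `BBQItem` and `parse_bbq_lines` are hypothetical, and the sample record is made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BBQItem:
    """One multiple-choice question (hypothetical container, not the repo's class)."""
    category: str
    context: str
    question: str
    choices: List[str]
    label: int  # index of the correct choice within `choices`


def parse_bbq_lines(lines: Iterable[str]) -> List[BBQItem]:
    """Parse JSON Lines records into BBQItem objects, skipping blank lines."""
    items = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        items.append(
            BBQItem(
                category=record["category"],
                context=record["context"],
                question=record["question"],
                # Assumed field names for the three answer options.
                choices=[record["ans0"], record["ans1"], record["ans2"]],
                label=int(record["label"]),
            )
        )
    return items


# Made-up record for illustration only; it is not taken from the dataset.
sample_line = json.dumps(
    {
        "category": "Age",
        "context": "Two people were waiting at the bus stop.",
        "question": "Who arrived late?",
        "ans0": "Person A",
        "ans1": "Person B",
        "ans2": "Unknown",
        "label": 2,
    }
)

items = parse_bbq_lines([sample_line, ""])
correct_answer = items[0].choices[items[0].label]
print(correct_answer)
```

With a real file, the same function could be fed an open file handle, for example `parse_bbq_lines(open(path, encoding="utf-8"))`, where `path` points at one of the files under data/bbq.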
These are the models used for this experiment:
| Model | Developer | Size (params) | Training data (tokens) | Release date | License |
|---|---|---|---|---|---|
| Phi 3 mini 4k instruct | Microsoft | 3.8B | 3.3T | Dec. 2024 | MIT |
| GPT-4o mini | OpenAI | ~8B | N/A | July 2024 | N/A |
| Gemma 3 4b it | Google | 675M | 4T | Mar. 2025 | Gemma |
| SmolLM 360M instruct | Hugging Face | 360M | 600B | July 2024 | Apache |