This repository contains the Arabic-English parallel contexts and cultural entities of the CAMeL-2 dataset for measuring cultural biases in language models.
For more details, see the accompanying paper:
"On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena", NAACL 2025
The folder entities contains the parallel Arabic-English entities for 7 different entity types, annotated for broad association with Arab or Western cultures.
The folder qa-contexts provides long contexts with an implicit reference to the [MASK], supporting evaluation on extractive QA.
All QA contexts are parallel (provided in Arabic and English versions).
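As a rough illustration of the extractive-QA setting, the sketch below runs an off-the-shelf multilingual QA model over a filled-in context. The context string, the question, and the deepset/xlm-roberta-large-squad2 checkpoint are illustrative assumptions and not part of the dataset or the paper's evaluation setup.

```python
# Rough sketch of the extractive-QA setting over a filled context. The
# context, question, and QA checkpoint below are illustrative assumptions,
# not the paper's evaluation protocol.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/xlm-roberta-large-squad2")

# Hypothetical English context with the [MASK] slot filled by a candidate entity.
context = "After a long day at work, Omar stopped by the small cafe near his office for a cup of coffee."
question = "Who stopped by the cafe?"

result = qa(question=question, context=context)
print(result["answer"], result["score"])
```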
The folder camelco-contexts provides the culturally-contextualized contexts from the previous CAMeL benchmark, where only Arab entities are appropriate [MASK] fillings.
We provide two versions of the contexts:
- a version for masked-lms, where the [MASK] is placed anywhere in the context
- a version for causal-lms, where we rewrite sentences so that the cultural context appears before the [MASK]
The masked-lms contexts are annotated for sentiment (positive, negative, neutral) to support fairness evaluation on sentiment analysis.
All culturally-grounded contexts are parallel (provided in Arabic and English versions).
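To make the distinction between the two versions concrete, here is a minimal scoring sketch using Hugging Face transformers. The checkpoints (bert-base-multilingual-cased, gpt2) and the example sentences are illustrative assumptions, not part of the dataset.

```python
# Minimal scoring sketch for the two context versions. The checkpoints
# (bert-base-multilingual-cased, gpt2) and the example sentences are
# illustrative assumptions, not part of the dataset.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# --- masked-lms version: the [MASK] may appear anywhere in the context ---
fill = pipeline("fill-mask", model="bert-base-multilingual-cased")
masked_context = "For dinner, [MASK] was served with fresh bread and salad."
print(fill(masked_context.replace("[MASK]", fill.tokenizer.mask_token))[:3])

# --- causal-lms version: the cultural context appears before the [MASK],
# so a candidate filling can be scored as a continuation of the prefix ---
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def continuation_logprob(prefix: str, candidate: str) -> float:
    """Sum of log-probabilities of the candidate's tokens given the prefix."""
    prefix_ids = tok(prefix, return_tensors="pt").input_ids
    cand_ids = tok(" " + candidate, return_tensors="pt").input_ids
    ids = torch.cat([prefix_ids, cand_ids], dim=1)
    with torch.no_grad():
        log_probs = lm(ids).logits.log_softmax(dim=-1)
    score = 0.0
    for i in range(cand_ids.shape[1]):
        pos = prefix_ids.shape[1] + i - 1   # logits at pos predict the token at pos + 1
        score += log_probs[0, pos, ids[0, pos + 1]].item()
    return score

prefix = "For dinner, my friend cooked us a wonderful plate of"
print(continuation_logprob(prefix, "hummus"), continuation_logprob(prefix, "lasagna"))
```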
The fine-tuned Arabic NER models based on XLM-R Large have been uploaded to Hugging Face:
Names and Author names: https://huggingface.co/tareknaous/xlmr-large-ar-ner-per
Locations: https://huggingface.co/tareknaous/xlmr-large-ar-ner-loc
Food and Beverage: https://huggingface.co/tareknaous/xlmr-large-ar-ner-food
Sports Clubs: https://huggingface.co/tareknaous/xlmr-large-ar-ner-sportsclubs
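A minimal usage sketch for these models with the transformers token-classification pipeline; the example sentence is an illustrative assumption.

```python
# Minimal usage sketch for the released Arabic NER models; the example
# sentence is an illustrative assumption.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="tareknaous/xlmr-large-ar-ner-food",
    aggregation_strategy="simple",
)

# "We had hummus and falafel for dinner."
print(ner("تناولنا الحمص والفلافل على العشاء"))
```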
@article{naous2025origin,
title={On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena},
author={Naous, Tarek and Xu, Wei},
journal={arXiv preprint arXiv:2501.04662},
year={2025}
}
Tarek Naous: Scholar | GitHub | LinkedIn | ResearchGate | Personal Website | tareknaous@gatech.edu