Currently, in src/ml_filter/analysis/utils.py:
for file_path in input_file_paths:
    # Extract relevant metadata from the filename
    # TODO: This is a bit fragile, consider using a more robust method to extract metadata
    prompt, prompt_lang, annotator = file_path.stem.split("__")[1:4]
Instead, we should fix the prompt_lang metadata and set the above fields from the data itself, e.g. from the meta_information of the first data point in each file, rather than parsing the filename.
Example sample:
... "meta_information": {"prompt_name": "pii_content_filter", "prompt_lang": "deu", "model_name": "google/gemma-3-27b-it",
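
A minimal sketch of what reading the fields from the first data point could look like. It assumes each input file is JSONL (one JSON object per line) and that the annotator corresponds to the "model_name" key; neither is confirmed here. The helper name read_metadata_from_first_record is hypothetical.

```python
import json
from pathlib import Path


def read_metadata_from_first_record(file_path: Path) -> tuple[str, str, str]:
    """Return (prompt, prompt_lang, annotator) from the first data point of a file.

    Assumption: the file is JSONL and every record has a "meta_information" dict.
    """
    with file_path.open("r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines before the first record
            meta = json.loads(line)["meta_information"]
            # Assumption: the annotator is identified by "model_name".
            return meta["prompt_name"], meta["prompt_lang"], meta["model_name"]
    raise ValueError(f"No data points found in {file_path}")
```

Inside the loop, this could replace the filename split, e.g. `prompt, prompt_lang, annotator = read_metadata_from_first_record(file_path)`. A missing key then raises a KeyError instead of silently yielding wrong fields, which may be preferable to the current behavior.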