- Kseniia Rydannykh
- Manuela Posso Baena
- Anh Minh Do
- Huu Dang (Tony) Hoang
- Maria Paula Gonzalez Vasquez
This repository contains the solution developed during the AIMS Hackathon – AI Against Modern Slavery.
It was created by Team Code4Freedom Australia for Challenge 2: AI Model Optimisation & Explainability, and includes the results, code, and experiments produced as part of the hackathon.
Under the Modern Slavery Acts, large companies are required to submit annual modern slavery statements.
These statements are reviewed by expert organisations, and each company receives a rating/grade.
Accuracy in grading is critical:
- A lower grade can lead to negative consequences for the company, including investor pressure, reputational risks, and regulatory scrutiny.
- Reviewing organisations must grade without errors to maintain trust.
- To ensure reliability, each statement is currently reviewed by three independent reviewers.
At present, only statements from large companies are being assessed. However, to scale the process to more organisations, AI-assisted solutions are required.
Existing platforms such as AIMS.au and AIMSCheck provide strong technical foundations — sentence-level annotations, cross-jurisdictional evaluations, and token-level explainability — but two challenges remain:
- Models sometimes hallucinate explanations, making predictions that do not follow human-like logic.
- Reviewing organisations need visibility into how confident the model is in its predictions.
To address these issues, our project:
- Designs a rationale-based pipeline that makes model reasoning more transparent and aligned with human logic.
- Introduces an analysis of confidence scores, giving reviewers an additional layer of trust in automated assessments.
To improve explainability and trust in the model, we designed and implemented a solution with two main components: a rationale-based pipeline and a confidence analysis pipeline, both described in detail below.
The baseline models for modern slavery statements sometimes produced hallucinated explanations, highlighting tokens that did not match human logic. To address this, we designed a two-head transformer model: one head predicts the sentence category, while the second highlights the words that justify the prediction.
The rationale head was supervised with pseudo-rationales (automatically generated keywords) to teach the model not only what category to assign, but also why.
This allows predictions to be explainable and aligned with human reviewers’ reasoning.
- Pseudo-rationale generation – extract compact rationales for supervision.
Notebook: 1-create-pseudo-rationales.ipynb
Rationales were generated using a pseudo-rationale pipeline:
- Extracted category-specific keywords with TF-IDF,
- Kept only high-coverage words across examples,
- Limited to ≤5 tokens per category and removed duplicates.
During the project, we explored several approaches for rationale generation, including LLM-based methods and TF-IDF. The latter proved faster and more cost-efficient for processing a large dataset while still producing strong results (see the sketch below). A detailed description and the outcomes of these experiments can be found in generate-rationales-eperiments.
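As an illustration, here is a minimal sketch of the TF-IDF keyword step, assuming a list of (sentence, category) pairs. The function name, the stop-word handling, and the coverage threshold are placeholder assumptions rather than the exact settings used in 1-create-pseudo-rationales.ipynb.

```python
# Minimal sketch of pseudo-rationale keyword extraction (assumed settings,
# not the exact configuration of 1-create-pseudo-rationales.ipynb).
from collections import defaultdict
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_pseudo_rationales(sentences, categories, top_k=5, min_coverage=0.05):
    """Return up to `top_k` keyword rationales per category."""
    # Group sentences by category so TF-IDF is computed per class.
    by_category = defaultdict(list)
    for text, cat in zip(sentences, categories):
        by_category[cat].append(text)

    rationales = {}
    for cat, texts in by_category.items():
        vectorizer = TfidfVectorizer(stop_words="english", lowercase=True)
        tfidf = vectorizer.fit_transform(texts)
        vocab = vectorizer.get_feature_names_out()

        # Coverage = fraction of the category's sentences containing the word.
        coverage = np.asarray((tfidf > 0).mean(axis=0)).ravel()
        mean_score = np.asarray(tfidf.mean(axis=0)).ravel()

        # Keep only high-coverage words, rank by mean TF-IDF, cap at top_k tokens.
        candidates = [
            (vocab[i], mean_score[i])
            for i in range(len(vocab))
            if coverage[i] >= min_coverage
        ]
        candidates.sort(key=lambda x: x[1], reverse=True)
        rationales[cat] = [word for word, _ in candidates[:top_k]]
    return rationales
```

Presumably, these category keywords are then matched back against each training sentence to obtain the token-level rationale labels used to supervise the second head.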
- Two-head model training – fine-tune microsoft/deberta-v3-xsmall with two heads.
Notebook: 2-train-main-classifier.ipynb
Here we trained a two-head transformer model for multi-label classification of modern slavery statements.
- Backbone: DeBERTa, fine-tuned (2 unfrozen layers + classifier)
- Head 1 – Categories: predicts sentence category.
- Head 2 – Rationales: highlights the words supporting the prediction.
During training, the model optimizes a combined loss that adds the rationale loss, weighted by α, to the category classification loss, where α controls the trade-off between accurate category predictions and meaningful rationale extraction.
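For readers who prefer code, here is a minimal sketch of how such a two-head model can be wired up. The class name TwoHeadClassifier, the pooled [CLS]-style representation, the BCE losses, the label count, and the value of α are illustrative assumptions, not the exact setup of 2-train-main-classifier.ipynb.

```python
# Illustrative two-head model: sentence-level category head + token-level rationale head.
# Hyperparameters (alpha, number of labels) are assumptions, not the trained values.
import torch
import torch.nn as nn
from transformers import AutoModel

class TwoHeadClassifier(nn.Module):
    def __init__(self, backbone="microsoft/deberta-v3-xsmall",
                 num_categories=11, alpha=0.5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        hidden = self.encoder.config.hidden_size
        self.category_head = nn.Linear(hidden, num_categories)    # sentence-level labels
        self.rationale_head = nn.Linear(hidden, 1)                 # per-token relevance
        self.alpha = alpha

    def forward(self, input_ids, attention_mask,
                category_labels=None, rationale_labels=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        tokens = out.last_hidden_state                  # (batch, seq_len, hidden)
        sentence = tokens[:, 0]                         # first-token pooled representation

        category_logits = self.category_head(sentence)              # (batch, num_categories)
        rationale_logits = self.rationale_head(tokens).squeeze(-1)  # (batch, seq_len)

        loss = None
        if category_labels is not None and rationale_labels is not None:
            bce = nn.BCEWithLogitsLoss()
            # Combined loss: classification + alpha-weighted rationale supervision.
            # NOTE: padding positions are not masked here, for brevity.
            loss = bce(category_logits, category_labels.float()) + \
                   self.alpha * bce(rationale_logits, rationale_labels.float())
        return {"loss": loss,
                "category_logits": category_logits,
                "rationale_logits": rationale_logits}
```

Freezing all but the top two encoder layers, as described above, would be applied on top of this sketch.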
To ensure computational feasibility, the training dataset was downsampled to 100,000 sentences using stratified sampling across all categories. The model was trained for only 3 epochs due to resource limitations, but it showed promising results with steadily improving training performance.
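The downsampling step can be sketched as follows, assuming a pandas DataFrame with a single category column per sentence; the notebook's actual multi-label stratification logic may differ.

```python
# Rough sketch of stratified downsampling to ~100k sentences (assumed column names).
import pandas as pd

def stratified_downsample(df: pd.DataFrame, target_size=100_000,
                          label_col="category", seed=42) -> pd.DataFrame:
    frac = min(1.0, target_size / len(df))
    # Sample the same fraction from every category to preserve label proportions.
    sampled = (df.groupby(label_col, group_keys=False)
                 .apply(lambda g: g.sample(frac=frac, random_state=seed)))
    return sampled.reset_index(drop=True)
```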
The notebook below runs the trained model on test data and shows its predictions.
Notebook: 3-classifier-model-inference.ipynb
Below you can see examples where the model not only predicts categories but also provides highlighted rationales directly in the text.
This approach makes predictions more interpretable by:
- Highlighting the exact words in a sentence that support the decision,
- Providing human-readable rationales aligned with categories,
- Improving the trust and transparency of the model.
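To make the highlighting concrete, here is a hedged sketch of how categories and rationale highlights could be read out of the hypothetical TwoHeadClassifier above; the 0.5 thresholds and the bold-marker formatting are assumptions, not necessarily what 3-classifier-model-inference.ipynb produces.

```python
# Illustrative inference: predict categories and mark rationale tokens in the text.
# Thresholds and formatting are assumptions for demonstration purposes.
import torch
from transformers import AutoTokenizer

def predict_with_rationales(model, text, category_names,
                            backbone="microsoft/deberta-v3-xsmall",
                            cat_threshold=0.5, rat_threshold=0.5):
    tokenizer = AutoTokenizer.from_pretrained(backbone)
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    model.eval()
    with torch.no_grad():
        out = model(enc["input_ids"], enc["attention_mask"])

    cat_probs = torch.sigmoid(out["category_logits"])[0]
    rat_probs = torch.sigmoid(out["rationale_logits"])[0]

    predicted = [name for name, p in zip(category_names, cat_probs)
                 if p >= cat_threshold]

    # Wrap tokens whose rationale score passes the threshold in **...** markers.
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    highlighted = []
    for tok, score in zip(tokens, rat_probs):
        if tok in tokenizer.all_special_tokens:
            continue
        word = tokenizer.convert_tokens_to_string([tok]).strip()
        highlighted.append(f"**{word}**" if score >= rat_threshold else word)
    return predicted, " ".join(highlighted)
```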
Notebook: 4-explore-confidence.ipynb
The confidence pipeline is designed to give reviewers a measure of how certain the model is about its predictions.
Instead of returning only a binary decision (e.g., “Policies covered” vs. “not covered”), the model also outputs a confidence score between 0 and 1.
These scores allow reviewers to distinguish between high-certainty outputs (trustworthy, can be automated) and low-certainty outputs (flagged for manual review).
- Prediction: The classifier (based on DistilBERT/AimsDistilModel) outputs raw logits for 11 compliance labels.
- Probability Conversion: Logits are transformed into probabilities using sigmoid/softmax functions.
- Calibration: Confidence scores are calibrated (e.g., Platt scaling or isotonic regression) to improve reliability (see the sketch after this list).
- Aggregation: Scores are aggregated at sentence, section, and document levels.
- Export: Results are output in JSON format, including both the prediction and the associated confidence score.
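The sketch below illustrates the probability conversion, calibration, and JSON export steps under simple assumptions (sigmoid activation and per-label Platt scaling fitted on held-out validation logits). Aggregation to section and document level is omitted, and the actual notebook may use different calibration settings.

```python
# Illustrative confidence pipeline: logits -> probabilities -> calibrated scores -> JSON.
# Per-label Platt scaling via scikit-learn is an assumption for demonstration.
import json
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_platt_calibrators(val_logits, val_labels):
    """Fit one Platt-scaling calibrator per label on held-out validation logits."""
    calibrators = []
    for j in range(val_logits.shape[1]):
        lr = LogisticRegression()
        # val_labels[:, j] must contain both positive and negative examples.
        lr.fit(val_logits[:, j].reshape(-1, 1), val_labels[:, j])
        calibrators.append(lr)
    return calibrators

def export_confidences(logits, label_names, calibrators, threshold=0.5):
    """Return JSON with the prediction and calibrated confidence score per label."""
    raw_probs = sigmoid(logits)
    results = []
    for i, row in enumerate(logits):
        entry = {"sentence_id": i, "labels": {}}
        for j, name in enumerate(label_names):
            conf = float(calibrators[j].predict_proba(np.array([[row[j]]]))[0, 1])
            entry["labels"][name] = {
                "predicted": bool(raw_probs[i, j] >= threshold),
                "confidence": round(conf, 3),
            }
        results.append(entry)
    return json.dumps(results, indent=2)
```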
Distribution Example: Confidence scores across sections (e.g., Remediation) show how certain the model is about missing vs. covered reporting elements.
Sentence-level Example:

In practice:
- High-confidence (≥0.8) → reviewers can trust the output.
- Low-confidence (<0.5) → flagged for manual review.
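A minimal sketch of how these thresholds could drive a triage step; the 0.8 and 0.5 cut-offs come from the list above, while the handling of the 0.5–0.8 range is an assumption.

```python
# Route predictions by calibrated confidence (cut-offs taken from the list above).
def triage(confidence: float) -> str:
    if confidence >= 0.8:
        return "auto-accept"      # high confidence: reviewers can trust the output
    if confidence < 0.5:
        return "manual-review"    # low confidence: flag for a human reviewer
    return "spot-check"           # mid-range handling is an assumption
```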
This pipeline improves trust and transparency, helping reviewers prioritize their time, reduce errors, and scale assessments without compromising reliability.
This project introduces two complementary pipelines:
- Rationale-based pipeline – improves explainability and aligns predictions with human logic.
- Confidence analysis pipeline – provides transparency on model reliability.
Together, these approaches form a foundation for scaling automated grading of modern slavery statements to a larger set of organisations while maintaining trust.
YouTube pitch video: https://youtu.be/KoE29AXD-js
Presentation: Presentation-CODE4FREEDOM-AUS.pdf
Location: /datasets
Location: /project-code
Location: /docs
Presentation-CODE4FREEDOM-AUS.pdf
This project builds on the open research of Project AIMS (AI against Modern Slavery) by Mila and QUT.
GitHub repository: ai4h_aims-au.
- Describe here the resources used in developing your solution (e.g. GPUs, etc).
This repository and its accompanying models, datasets, metrics, dashboards, and comparative analyses are provided strictly for research and demonstration purposes.
Any comparisons, rankings, or assessments of companies or organizations are exploratory in nature. They may be affected by incomplete data, modeling limitations, or methodological choices. These results must not be used to make factual, legal, or reputational claims about any entity without independent expert review and validation.
Do not use this repository’s contents to make public statements or claims about specific companies, organizations, or individuals.
By submitting this solution to the AIMS Hackathon, our team acknowledges and agrees to abide by the Event’s Terms and Conditions.
