# Project CLAIRE: A Benchmark Study on the Trade-off Between Factual Retention and Linguistic Coherence in Continual Learning
This repository contains the code, methodology, and results of Project CLAIRE, an independent research initiative that empirically demonstrates a fundamental and previously under-documented trade-off between memory and coherence in continually trained Large Language Models.
Our central discovery is that for long-term, multi-task continual learning, structural preservation methods (PEFT/LoRA) are significantly more effective at memory retention than even the most sophisticated active rehearsal strategies.
In a grueling five-task "gauntlet" benchmark, the `PEFT-Regularization Only` method, a strategy with no memory replay whatsoever, emerged as the undisputed champion of memory retention.
Our calibrated, multi-stage benchmark produced several non-obvious, quantitative results:
- **Structural Preservation Dominates:** `PEFT-Regularization Only` was the most effective method, retaining 23.70% of its factual knowledge after training on four subsequent tasks and significantly outperforming every replay-based full fine-tuning method. (A minimal configuration sketch follows this list.)
- **Random Replay is Actively Harmful:** Standard `Random Replay` was the worst-performing strategy, retaining only 15.00% of facts and causing a catastrophic ~25% drop in linguistic coherence relative to the other methods, showing that naive rehearsal can be more damaging than no rehearsal at all.
- **Intelligent Curation is Effective but Insufficient:** Our novel Interference-Aware Replay (CLAIRE), which selects the most at-risk memories for rehearsal, systematically outperformed `Random Replay` (19.20% vs. 15.00%) but still trailed `PEFT-Regularization Only`: even a superior replay strategy cannot, by itself, overcome the fundamental instability of full fine-tuning.
- **The Coherence Collapse:** Every method involving full fine-tuning suffered severe degradation of linguistic coherence over the full sequence. Only the PEFT-based methods (`PEFT-Only` and `CLAIRE+`) maintained a high degree of linguistic stability, indicating that the PEFT architecture is the primary guardian of a model's ability to generate sane, structured language.
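For concreteness, here is a minimal sketch of what a `PEFT-Regularization Only` setup looks like, assuming the Hugging Face `transformers` and `peft` libraries. The rank, scaling factor, and target modules below are illustrative assumptions, not the benchmark's exact hyperparameters.

```python
# Minimal LoRA setup: freeze the base model and train only small adapters.
# Hyperparameter values here are illustrative, not the benchmark's settings.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

lora_config = LoraConfig(
    r=16,                                 # low-rank adapter dimension (assumed)
    lora_alpha=32,                        # adapter scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Only the adapter matrices receive gradients; the frozen base weights are
# what structurally preserves prior facts and linguistic coherence.
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```

Because the 8B base weights never change, each new task can only perturb the low-rank adapters, which is the structural mechanism behind both the retention and the coherence results above.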
Experimental setup:

- Model: `meta-llama/Llama-3.1-8B-Instruct`
- Benchmark: A five-task sequential learning "gauntlet" using a controlled, synthetic corpus of 500 unique facts.
- Novel Components:
  - Interference-Aware Teacher: An AI curator that uses semantic embeddings to identify and prioritize the most "at-risk" memories for rehearsal (see the selection sketch after this list).
  - LLM-as-a-Judge Evaluator: A calibrated AI judge that scores each model's memory of all past tasks on both Factual Correctness (semantic equivalence) and Linguistic Coherence (see the scoring sketch after this list).
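The Interference-Aware Teacher's selection step can be sketched roughly as follows. This is an illustrative reconstruction, not the repository's exact implementation: the `sentence-transformers` encoder, the model name, and the scoring rule are all assumptions.

```python
# Illustrative interference-aware replay selection: rank stored facts by
# embedding similarity to the incoming task, and rehearse the most at-risk.
import numpy as np
from sentence_transformers import SentenceTransformer

# Encoder choice is an assumption; any semantic embedding model would do.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def select_at_risk_facts(old_facts, new_task_texts, k=50):
    """Return the k old facts most semantically entangled with the new task."""
    old_emb = encoder.encode(old_facts, normalize_embeddings=True)
    new_emb = encoder.encode(new_task_texts, normalize_embeddings=True)
    # Interference score: highest cosine similarity to any new-task example.
    # High similarity means new gradients are likely to overwrite the fact.
    scores = (old_emb @ new_emb.T).max(axis=1)
    top = np.argsort(scores)[::-1][:k]
    return [old_facts[i] for i in top]
```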
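Similarly, the LLM-as-a-Judge evaluation follows a standard judging pattern; the prompt wording, score scales, and `query_judge` helper below are hypothetical placeholders, not the calibrated judge shipped in this repository.

```python
# Hypothetical LLM-as-a-Judge rubric: the judge receives a reference fact and
# the model's answer, and returns a factual score and a coherence score.
import json

JUDGE_PROMPT = """You are a strict evaluator. Given a reference fact and a
model's answer, reply with JSON containing two fields:
  "factual": 1 if the answer is semantically equivalent to the fact, else 0.
  "coherence": an integer 1-5 rating how grammatical and well-structured the answer is.

Reference fact: {fact}
Model answer: {answer}
JSON:"""

def score_answer(fact, answer, query_judge):
    """`query_judge` is any callable that sends a prompt string to the judge LLM."""
    raw = query_judge(JUDGE_PROMPT.format(fact=fact, answer=answer))
    result = json.loads(raw)
    return result["factual"], result["coherence"]
```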
We propose a new framework for understanding continual learning: it is a battle against two distinct problems, Factual Forgetting and Coherence Collapse.
Our results conclusively show that PEFT/LoRA is the most powerful tool for solving Coherence Collapse and serves as the most robust baseline for Factual Forgetting. Active rehearsal strategies like our novel CLAIRE method, while demonstrably better than naive approaches, should be viewed as a secondary tool.
Future research should not focus on replay as a primary solution, but rather on finding the optimal, synergistic balance between the structural armor of PEFT and the targeted, active reminders of an intelligent rehearsal system.
To reproduce the benchmark:

- Clone this repository: `git clone https://github.com/Yash3561/Project_CLAIRE.git`
- Create a Python virtual environment and run `pip install -r requirements.txt`.
- Create a `.env` file and add your `HF_TOKEN='your_token_here'`.
- For running on an HPC with a Slurm scheduler, copy the provided template: `cp run_gauntlet.slurm.template run_gauntlet.slurm`
- Edit `run_gauntlet.slurm` and replace the placeholder values (`YOUR_ACCOUNT_NAME`, etc.) with your specific HPC configuration.
- Submit the job to the scheduler: `sbatch run_gauntlet.slurm`
