This repository contains the implementation and experiments for the Two-stage Multi-teacher Knowledge Distillation (TMKD) method for improving the efficiency and performance of web-based Question Answering (QA) systems. TMKD compresses large, complex models such as BERT into much smaller student models with minimal loss in accuracy and a significant improvement in inference speed.
The TMKD method uses a two-stage process in which multiple teacher models jointly guide the distillation of knowledge into a single student. Learning from several teachers transfers a broader range of knowledge than any single teacher provides, which keeps the student accurate and robust enough for practical web applications that require real-time responses.
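As a rough sketch of the idea (not the exact code from the notebooks), stage 1 can be expressed as matching the student's softened output distribution to the average of the teachers' soft labels while still fitting the ground-truth labels; the temperature and weighting values below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def multi_teacher_distillation_loss(student_logits, teacher_logits_list, labels,
                                    temperature=2.0, alpha=0.5):
    """Sketch of a stage-1 multi-teacher distillation objective.

    `temperature` and `alpha` are illustrative hyperparameters, not values
    taken from this repository's notebooks.
    """
    # Average the teachers' soft predictions at the chosen temperature.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)

    # Soft-label loss: KL divergence between the student and the averaged teachers.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        teacher_probs,
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard-label loss on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

Stage 2 then fine-tunes the distilled student on the target task, as in the many-to-one notebook described below.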
- Two-stage Knowledge Distillation: Involves an initial stage with multiple teachers followed by fine-tuning to ensure high-quality knowledge transfer.
- Multi-teacher Approach: Utilizes multiple teacher models to capture a wider range of knowledge, resulting in a more accurate and reliable student model.
- Enhanced Efficiency: The student model is significantly smaller and faster, making it suitable for deployment in resource-constrained environments (see the configuration sketch after this list).
- High Performance: Achieves results comparable to the original teacher models, outperforming baseline models in various metrics.
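To make the efficiency claim concrete, here is a minimal sketch of how a compact student can be instantiated next to a BERT-base teacher with the Hugging Face Transformers library; the 3-layer depth and label count are assumptions, not necessarily the configuration used in this repository.

```python
from transformers import BertConfig, BertForSequenceClassification

# Teacher: a standard 12-layer BERT-base classifier.
teacher = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Student: a much shallower BERT; 3 layers is an illustrative choice.
student_config = BertConfig(
    num_hidden_layers=3,
    hidden_size=768,
    num_attention_heads=12,
    num_labels=2,
)
student = BertForSequenceClassification(student_config)

print(f"Teacher parameters: {sum(p.numel() for p in teacher.parameters()):,}")
print(f"Student parameters: {sum(p.numel() for p in student.parameters()):,}")
```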
```
Project
│
├── Code
│   ├── Many_to_One_RTE_to_QNLI.ipynb
│   ├── MNLI_Experiments.ipynb
│   ├── QNLI_Experiments.ipynb
│   ├── RTE_Experiments.ipynb
│   └── SNLI_Experiments.ipynb
│
├── Research Papers
│   └── Paper.pdf
│
├── LICENSE
├── PPT.pdf
├── Requirements.txt
├── Report.pdf
└── README.md
```
- MNLI_Experiments.ipynb: Contains the initial one-to-one (1-o-1) and many-to-many (m-o-m) distillation experiments on the student model, using the MNLI dataset.
- SNLI_Experiments.ipynb: Details 1-o-1 and m-o-m experimentation on the SNLI dataset.
- QNLI_Experiments.ipynb: Documents 1-o-1 and m-o-m experimentation on the QNLI dataset.
- RTE_Experiments.ipynb: Includes 1-o-1 and m-o-m experimentation on the RTE dataset.
- Many_to_One_RTE_to_QNLI.ipynb: Implements many-to-one (m-o-1) distillation on the RTE dataset, followed by fine-tuning on the QNLI dataset (a dataset-loading sketch follows this list).
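All of these notebooks operate on standard GLUE tasks. A minimal way to pull the same data, assuming the Hugging Face `datasets` package (not listed in Requirements.txt) is available, looks like this:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# RTE for stage-1 distillation, QNLI for stage-2 fine-tuning,
# mirroring Many_to_One_RTE_to_QNLI.ipynb.
rte = load_dataset("glue", "rte")
qnli = load_dataset("glue", "qnli")

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def encode_rte(batch):
    # RTE examples are (sentence1, sentence2) pairs.
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, padding="max_length", max_length=128)

rte_encoded = rte.map(encode_rte, batched=True)
print(rte_encoded["train"][0].keys())
```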
Create a folder named `checkpoints` and organize subfolders to store the various model checkpoints referenced in the code (a minimal setup sketch follows the download links below).
- Stage 1 student model trained on RTE dataset: Download here
- Stage 2 finetuned student model on QNLI dataset: Download here
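A minimal setup sketch, assuming the checkpoints are plain PyTorch `state_dict` files and using hypothetical subfolder and file names that you should adapt to match the notebooks:

```python
import os
import torch
from transformers import BertConfig, BertForSequenceClassification

# Hypothetical layout; adjust the subfolder names to match the notebooks.
for subdir in ("stage1_rte_student", "stage2_qnli_student"):
    os.makedirs(os.path.join("checkpoints", subdir), exist_ok=True)

# Loading a downloaded checkpoint, assuming it was saved with
# torch.save(model.state_dict(), ...) and that the student is a
# reduced-depth BERT classifier (an assumption, not the repo's exact config).
config = BertConfig(num_hidden_layers=3, num_labels=2)
student = BertForSequenceClassification(config)
state_dict = torch.load("checkpoints/stage1_rte_student/pytorch_model.bin",
                        map_location="cpu")
student.load_state_dict(state_dict)
```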
- Python 3.7 or later
- PyTorch
- Transformers (Hugging Face)
- Other dependencies listed in `Requirements.txt`
- Clone this repository:

  ```bash
  git clone git@github.com:sharmamht19/Model-Compression-with-Knowledge-Distillation.git
  cd Model-Compression-with-Knowledge-Distillation
  ```

- Install the required dependencies:

  ```bash
  pip install -r Requirements.txt
  ```
- Prepare your dataset and pre-trained models as specified.
- Use the provided notebooks (`.ipynb` files) to replicate the experiments and train your student model using the TMKD method.
- Evaluate the performance of the student model using the provided evaluation scripts (a generic accuracy sketch follows this list).
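The evaluation code itself lives in the notebooks; as a generic illustration, accuracy over a tokenized development split can be computed along these lines (the model, dataset, and batch size are placeholders):

```python
import torch
from torch.utils.data import DataLoader

def evaluate_accuracy(student, eval_dataset, batch_size=32, device="cpu"):
    """Sketch of a plain accuracy evaluation over a tokenized dataset.

    Assumes each batch provides input_ids, attention_mask, and labels tensors.
    """
    student.to(device).eval()
    loader = DataLoader(eval_dataset, batch_size=batch_size)
    correct, total = 0, 0
    with torch.no_grad():
        for batch in loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            logits = student(input_ids=batch["input_ids"],
                             attention_mask=batch["attention_mask"]).logits
            preds = logits.argmax(dim=-1)
            correct += (preds == batch["labels"]).sum().item()
            total += batch["labels"].size(0)
    return correct / total
```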
- Experimentation: Use `MNLI_Experiments.ipynb`, `SNLI_Experiments.ipynb`, `QNLI_Experiments.ipynb`, and `RTE_Experiments.ipynb` for one-to-one and many-to-many knowledge distillation experiments.
- Many-to-One Distillation and Fine-Tuning: Follow `Many_to_One_RTE_to_QNLI.ipynb` to run many-to-one distillation on the RTE dataset and then fine-tune on the QNLI dataset.
- Checkpoint Management: Store model checkpoints in the `checkpoints` directory as specified in the notebooks (a stage-2 fine-tuning and checkpoint-saving sketch follows this list).
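As a hedged sketch of how the stage-1 checkpoint can be reloaded for stage-2 fine-tuning and then saved back into `checkpoints` (the paths, student depth, and hyperparameters are illustrative, not taken from the notebooks):

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import BertConfig, BertForSequenceClassification

def stage2_finetune(train_loader: DataLoader,
                    stage1_path: str = "checkpoints/stage1_rte_student/pytorch_model.bin",
                    stage2_path: str = "checkpoints/stage2_qnli_student/pytorch_model.bin",
                    epochs: int = 3,
                    device: str = "cpu"):
    """Reload the stage-1 student and fine-tune it on QNLI batches.

    The 3-layer student depth and checkpoint paths are illustrative assumptions.
    """
    student = BertForSequenceClassification(
        BertConfig(num_hidden_layers=3, num_labels=2)
    )
    student.load_state_dict(torch.load(stage1_path, map_location=device))
    student.to(device).train()

    optimizer = AdamW(student.parameters(), lr=2e-5)
    for _ in range(epochs):
        for batch in train_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            optimizer.zero_grad()
            # BertForSequenceClassification computes cross-entropy when labels are given.
            loss = student(**batch).loss
            loss.backward()
            optimizer.step()

    torch.save(student.state_dict(), stage2_path)
    return student
```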
We welcome contributions to enhance the functionality and performance of TMKD. Feel free to open issues and submit pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.
We would like to thank the developers of the BERT model and the Hugging Face Transformers library for their invaluable contributions to the field of Natural Language Processing.