This repository contains the implementation and experiments for the Two-stage Multi-teacher Knowledge Distillation (TMKD) method for improving the efficiency and performance of web-based Question Answering (QA) systems. TMKD compresses large, complex models such as BERT into much smaller student models with minimal loss in accuracy and a significant improvement in inference speed.
The TMKD method uses a two-stage process in which multiple teacher models jointly guide the distillation of knowledge into a single student. Learning from several teachers transfers a broader range of knowledge than any single teacher provides, which keeps the student accurate and robust enough for practical web applications that require real-time responses.
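As a rough sketch of the idea (not the exact code from the notebooks), stage 1 can be expressed as matching the student's softened output distribution to the average of the teachers' soft labels while still fitting the ground-truth labels; the temperature and weighting values below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def multi_teacher_distillation_loss(student_logits, teacher_logits_list, labels,
                                    temperature=2.0, alpha=0.5):
    """Sketch of a stage-1 multi-teacher distillation objective.

    `temperature` and `alpha` are illustrative hyperparameters, not values
    taken from this repository's notebooks.
    """
    # Average the teachers' soft predictions at the chosen temperature.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)

    # Soft-label loss: KL divergence between the student and the averaged teachers.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        teacher_probs,
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard-label loss on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

Stage 2 then fine-tunes the distilled student on the target task, as in the many-to-one notebook described below.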
- Two-stage Knowledge Distillation: Involves an initial stage with multiple teachers followed by fine-tuning to ensure high-quality knowledge transfer.
- Multi-teacher Approach: Utilizes multiple teacher models to capture a wider range of knowledge, resulting in a more accurate and reliable student model.
- Enhanced Efficiency: The student model is significantly smaller and faster, making it suitable for deployment in resource-constrained environments (see the configuration sketch after this list).
- High Performance: Achieves results comparable to the original teacher models, outperforming baseline models in various metrics.
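To make the efficiency claim concrete, here is a minimal sketch of how a compact student can be instantiated next to a BERT-base teacher with the Hugging Face Transformers library; the 3-layer depth and label count are assumptions, not necessarily the configuration used in this repository.

```python
from transformers import BertConfig, BertForSequenceClassification

# Teacher: a standard 12-layer BERT-base classifier.
teacher = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Student: a much shallower BERT; 3 layers is an illustrative choice.
student_config = BertConfig(
    num_hidden_layers=3,
    hidden_size=768,
    num_attention_heads=12,
    num_labels=2,
)
student = BertForSequenceClassification(student_config)

print(f"Teacher parameters: {sum(p.numel() for p in teacher.parameters()):,}")
print(f"Student parameters: {sum(p.numel() for p in student.parameters()):,}")
```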
```
Project
│
├── Code
│   ├── Many_to_One_RTE_to_QNLI.ipynb
│   ├── MNLI_Experiments.ipynb
│   ├── QNLI_Experiments.ipynb
│   ├── RTE_Experiments.ipynb
│   └── SNLI_Experiments.ipynb
│
├── Research Papers
│   └── Paper.pdf
│
├── LICENSE
├── PPT.pdf
├── Requirements.txt
├── Report.pdf
└── README.md
```
- MNLI_Experiments.ipynb: Contains the initial one-to-one (1-o-1) and many-to-many (m-o-m) distillation experiments on the student model, using the MNLI dataset.
- SNLI_Experiments.ipynb: Details 1-o-1 and m-o-m experimentation on the SNLI dataset.
- QNLI_Experiments.ipynb: Documents 1-o-1 and m-o-m experimentation on the QNLI dataset.
- RTE_Experiments.ipynb: Includes 1-o-1 and m-o-m experimentation on the RTE dataset.
- Many_to_One_RTE_to_QNLI.ipynb: Implements many-to-one (m-o-1) distillation on the RTE dataset, followed by fine-tuning on the QNLI dataset (a dataset-loading sketch follows this list).
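All of these notebooks operate on standard GLUE tasks. A minimal way to pull the same data, assuming the Hugging Face `datasets` package (not listed in Requirements.txt) is available, looks like this:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# RTE for stage-1 distillation, QNLI for stage-2 fine-tuning,
# mirroring Many_to_One_RTE_to_QNLI.ipynb.
rte = load_dataset("glue", "rte")
qnli = load_dataset("glue", "qnli")

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def encode_rte(batch):
    # RTE examples are (sentence1, sentence2) pairs.
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, padding="max_length", max_length=128)

rte_encoded = rte.map(encode_rte, batched=True)
print(rte_encoded["train"][0].keys())
```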
Create a folder named `checkpoints` and organize subfolders to store the various model checkpoints referenced in the code (a minimal setup sketch follows the download links below).
- Stage 1 student model trained on RTE dataset: Download here
- Stage 2 finetuned student model on QNLI dataset: Download here
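A minimal setup sketch, assuming the checkpoints are plain PyTorch `state_dict` files and using hypothetical subfolder and file names that you should adapt to match the notebooks:

```python
import os
import torch
from transformers import BertConfig, BertForSequenceClassification

# Hypothetical layout; adjust the subfolder names to match the notebooks.
for subdir in ("stage1_rte_student", "stage2_qnli_student"):
    os.makedirs(os.path.join("checkpoints", subdir), exist_ok=True)

# Loading a downloaded checkpoint, assuming it was saved with
# torch.save(model.state_dict(), ...) and that the student is a
# reduced-depth BERT classifier (an assumption, not the repo's exact config).
config = BertConfig(num_hidden_layers=3, num_labels=2)
student = BertForSequenceClassification(config)
state_dict = torch.load("checkpoints/stage1_rte_student/pytorch_model.bin",
                        map_location="cpu")
student.load_state_dict(state_dict)
```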
- Python 3.7 or later
- PyTorch
- Transformers (Hugging Face)
- Other dependencies listed in `Requirements.txt`
- Clone this repository:

  ```bash
  git clone git@github.com:sharmamht19/Model-Compression-with-Knowledge-Distillation.git
  cd Model-Compression-with-Knowledge-Distillation
  ```

- Install the required dependencies:

  ```bash
  pip install -r Requirements.txt
  ```
- Prepare your dataset and pre-trained models as specified.
- Use the provided notebooks (`.ipynb` files) to replicate the experiments and train your student model using the TMKD method.
- Evaluate the performance of the student model using the provided evaluation scripts (a generic accuracy sketch follows this list).
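The evaluation code itself lives in the notebooks; as a generic illustration, accuracy over a tokenized development split can be computed along these lines (the model, dataset, and batch size are placeholders):

```python
import torch
from torch.utils.data import DataLoader

def evaluate_accuracy(student, eval_dataset, batch_size=32, device="cpu"):
    """Sketch of a plain accuracy evaluation over a tokenized dataset.

    Assumes each batch provides input_ids, attention_mask, and labels tensors.
    """
    student.to(device).eval()
    loader = DataLoader(eval_dataset, batch_size=batch_size)
    correct, total = 0, 0
    with torch.no_grad():
        for batch in loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            logits = student(input_ids=batch["input_ids"],
                             attention_mask=batch["attention_mask"]).logits
            preds = logits.argmax(dim=-1)
            correct += (preds == batch["labels"]).sum().item()
            total += batch["labels"].size(0)
    return correct / total
```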
- Experimentation: Use `MNLI_Experiments.ipynb`, `SNLI_Experiments.ipynb`, `QNLI_Experiments.ipynb`, and `RTE_Experiments.ipynb` for one-to-one and many-to-many knowledge distillation experiments.
- Many-to-One Distillation and Fine-Tuning: Follow `Many_to_One_RTE_to_QNLI.ipynb` to run many-to-one distillation on the RTE dataset and then fine-tune on the QNLI dataset.
- Checkpoint Management: Store model checkpoints in the `checkpoints` directory as specified in the notebooks (a stage-2 fine-tuning and checkpoint-saving sketch follows this list).
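As a hedged sketch of how the stage-1 checkpoint can be reloaded for stage-2 fine-tuning and then saved back into `checkpoints` (the paths, student depth, and hyperparameters are illustrative, not taken from the notebooks):

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import BertConfig, BertForSequenceClassification

def stage2_finetune(train_loader: DataLoader,
                    stage1_path: str = "checkpoints/stage1_rte_student/pytorch_model.bin",
                    stage2_path: str = "checkpoints/stage2_qnli_student/pytorch_model.bin",
                    epochs: int = 3,
                    device: str = "cpu"):
    """Reload the stage-1 student and fine-tune it on QNLI batches.

    The 3-layer student depth and checkpoint paths are illustrative assumptions.
    """
    student = BertForSequenceClassification(
        BertConfig(num_hidden_layers=3, num_labels=2)
    )
    student.load_state_dict(torch.load(stage1_path, map_location=device))
    student.to(device).train()

    optimizer = AdamW(student.parameters(), lr=2e-5)
    for _ in range(epochs):
        for batch in train_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            optimizer.zero_grad()
            # BertForSequenceClassification computes cross-entropy when labels are given.
            loss = student(**batch).loss
            loss.backward()
            optimizer.step()

    torch.save(student.state_dict(), stage2_path)
    return student
```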
We welcome contributions to enhance the functionality and performance of TMKD. Feel free to open issues and submit pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.
We would like to thank the developers of the BERT model and the Hugging Face Transformers library for their invaluable contributions to the field of Natural Language Processing.