CoLLT is a contrastive learning framework for training BERT and its variants to classify long input sequences.
- Download and unzip the code archive
- Run main.py to execute the experiments
The code is written in a modular way, so implementing a new contrastive loss, augmentation technique, or data encoder is straightforward. It is organized into four main files:
- augmenters.py: contains the augmentation techniques used for view construction
- models.py: contains code for different data encoders
- contrast_models.py: contains the preprocessing steps (e.g., sampling positive and negative pairs) applied before the contrastive loss
- losses.py: contains the Barlow Twins loss
- main.py: handles training of the end-to-end model pipeline
- Bert_baseline.ipynb: contains code to run the BERT baseline model
- Data_filter.ipynb: contains code for data preprocessing
- baselines.py: contains the baseline models
- baselines.ipynb: used to run baselines.py
- data.pickle, data_val.pickle, data_test.pickle: contain the training, validation, and test data
- data_visualization.ipynb: contains data visualization tools and techniques
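As an illustration of the kind of augmentation augmenters.py implements for view construction, here is a minimal sketch of random contiguous cropping over a token sequence. The function name and parameters are assumptions for illustration, not the repository's actual API:

```python
import random

def random_token_crop(tokens, crop_ratio=0.8, seed=None):
    """Build one view by keeping a random contiguous span of tokens.

    Hypothetical example of a view-construction augmentation; the real
    augmenters in augmenters.py may differ.
    """
    rng = random.Random(seed)
    keep = max(1, int(len(tokens) * crop_ratio))  # number of tokens to retain
    start = rng.randint(0, len(tokens) - keep)    # random start of the span
    return tokens[start:start + keep]
```

Applying the augmenter twice with different seeds yields two correlated views of the same input, which form a positive pair for the contrastive objective.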
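For reference, the Barlow Twins loss computed in losses.py pushes the cross-correlation matrix between two views' embeddings toward the identity. A minimal NumPy sketch of the standard formulation (the repository presumably uses a PyTorch version; this is not its exact code):

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins loss for two batches of view embeddings of shape (N, D)."""
    # Normalize each embedding dimension to zero mean / unit variance over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / z_a.std(axis=0)
    z_b = (z_b - z_b.mean(axis=0)) / z_b.std(axis=0)
    n = z_a.shape[0]
    # Cross-correlation matrix between the dimensions of the two views.
    c = (z_a.T @ z_b) / n
    on_diag = np.sum((1.0 - np.diag(c)) ** 2)             # diagonal -> 1
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)   # off-diagonal -> 0
    return on_diag + lam * off_diag
```

The `lam` weight trades off invariance (diagonal term) against redundancy reduction (off-diagonal term).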