
CoLLT: Contrastive Learning for Long-document Transformers

CoLLT is a contrastive learning framework for training BERT and its variants to classify long input sequences.

How to run the code

  1. Download and extract the zip archive of the code
  2. Run main.py to execute the experiments

About the code

The code is organized in a modular way, so implementing a new contrastive loss, augmentation technique, or data encoder is straightforward. It is divided into 4 main files:

  1. augmenters.py: contains the augmentation techniques used for view construction
  2. models.py: contains code for different data encoders
  3. contrast_models.py: contains the preprocessing steps (such as sampling positive and negative examples) applied before the contrastive loss
  4. losses.py: contains the Barlow Twins loss. The main.py file handles training of the end-to-end model pipeline.
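The Barlow Twins objective used in losses.py pushes the cross-correlation matrix of the embeddings of two augmented views toward the identity: diagonal entries toward 1 (invariance across views) and off-diagonal entries toward 0 (redundancy reduction). Below is a minimal NumPy sketch of that loss; the function name, the batch-normalization details, and the `lambd` default are illustrative assumptions, not the repository's actual code.

```python
import numpy as np

def barlow_twins_loss(z1, z2, lambd=5e-3):
    """Illustrative Barlow Twins loss (not the code from losses.py).

    z1, z2: (N, D) embeddings of two augmented views of the same batch.
    lambd: weight on the off-diagonal (redundancy-reduction) term.
    """
    n = z1.shape[0]
    # Normalize each embedding dimension across the batch.
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + 1e-8)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + 1e-8)
    # Cross-correlation matrix between the two views (D x D).
    c = (z1.T @ z2) / n
    # Invariance term: diagonal entries should be 1.
    on_diag = np.sum((np.diagonal(c) - 1.0) ** 2)
    # Redundancy term: off-diagonal entries should be 0.
    off_diag = np.sum(c ** 2) - np.sum(np.diagonal(c) ** 2)
    return on_diag + lambd * off_diag

# Two identical views are perfectly correlated, so the loss is small.
rng = np.random.default_rng(0)
z = rng.standard_normal((8, 4))
print(barlow_twins_loss(z, z))  # near 0 for identical views
```

In the actual pipeline the two views would come from the augmenters.py techniques applied to the same long document, with embeddings produced by an encoder from models.py.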

Other files:

  1. Bert_baseline.ipynb: contains code to run the BERT baseline model
  2. Data_filter.ipynb: contains code for data preprocessing
  3. baselines.py: contains the baseline models
  4. baselines.ipynb: used to run baselines.py
  5. data.pickle, data_val.pickle, data_test.pickle: contain the train, validation, and test data
  6. data_visualization.ipynb: contains data visualization tools and techniques
