DeDisCo at the DISRPT 2025 Shared Task: A System for Discourse Relation Classification


Georgetown University Team for DISRPT 2025 Shared Task

🚀 Try our demo on Hugging Face Spaces:
👉 DeDisCo Demo


Setup

We use conda to manage a Python development environment and requirements.txt to catalogue the dependencies.

  1. Create (or activate) the environment:
conda create -n disrpt python==3.10
conda activate disrpt
  2. Install the dependencies:
python -m pip install -r requirements.txt
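
Optionally, sanity-check the fresh environment before moving on (pip check is a standard pip subcommand that reports broken or conflicting requirements):

python --version        # should print Python 3.10.x
python -m pip check     # flags any dependency conflicts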

Preparing Task Data

  1. Original data processing: Clone the shared task repo (or add it as a submodule), navigate to sharedtask2025/util, and run python process_underscore.py. Then copy the output data folder into this repo as data/.
  2. Augmented data preparation: Copy the existing augmented_data/ directory into data/ to consolidate all datasets (see the sketch after this list).
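
A minimal sketch of both steps, assuming the shared task repo lives under the disrpt GitHub organization and that process_underscore.py writes its output to a data/ folder inside that repo (adjust the URL and paths if your layout differs):

git clone https://github.com/disrpt/sharedtask2025.git
cd sharedtask2025/util
python process_underscore.py               # processes the underscore-masked corpora
cd ../..
cp -r sharedtask2025/data ./data           # step 1: processed shared task data
cp -r augmented_data data/                 # step 2: merge in the augmented datasets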

Run

This project supports training and evaluation via the --mode flag; multi-GPU training is handled by torchrun, while evaluation typically runs on a single GPU.

Training

To train the model using multiple GPUs (e.g., 4 GPUs):

torchrun --nproc_per_node=4 decoder_w_aug.py --mode train --checkpoint_path output/
  • --nproc_per_node=4: Number of GPUs to use (a torchrun argument, not a script flag).
  • --mode train: Specifies training mode.
  • --checkpoint_path: Directory to save model checkpoints.
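
If the machine exposes more GPUs than you want to use, the standard CUDA_VISIBLE_DEVICES variable restricts which devices torchrun sees (illustrative; substitute your own device IDs):

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 decoder_w_aug.py --mode train --checkpoint_path output/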

Single-GPU Training

To train the model on a single GPU, you can run the script directly with python:

python decoder_w_aug.py --mode train --checkpoint_path output/

Training Configuration

Our experiments were conducted with an effective batch size of 64. This was achieved using 4 GPUs, a per_device_batch_size of 1, and gradient_accumulation_steps set to 16 (4 GPUs * 1 batch/GPU * 16 steps = 64).

For single-GPU training, set gradient_accumulation_steps to 64 to match the effective batch size used in our experiments (1 GPU * 1 batch/GPU * 64 steps = 64).
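
As a quick sanity check, the same arithmetic in shell:

echo $((4 * 1 * 16))    # multi-GPU: 4 GPUs x 1 batch/GPU x 16 accumulation steps = 64
echo $((1 * 1 * 64))    # single-GPU: 1 GPU x 1 batch/GPU x 64 accumulation steps = 64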


Evaluation

To evaluate the model (usually on a single GPU):

python decoder_w_aug.py --mode eval --checkpoint_path output/checkpoint-3827 --res_path res/
  • --mode eval: Specifies evaluation mode.
  • --checkpoint_path: Directory where the checkpoint is stored, from which the model will be loaded.
    • As training is configured to save only the final model, there will be a single checkpoint folder. This path should point directly to that folder (e.g., output/checkpoint-3827).
  • --res_path: Directory to save the prediction results.
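
Because the checkpoint folder name encodes the final training step, list the output directory first to find it (checkpoint-3827 below is just the example from above):

ls output/              # e.g., checkpoint-3827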

To evaluate our final model directly from the Hugging Face Hub:

python decoder_w_aug.py --mode eval --checkpoint_path JuNymphea/Georgetown-qwen3-4B-finetuned-for-disrpt2025 --res_path res/

Model Checkpoints

Our final fine-tuned checkpoint is hosted on the Hugging Face Hub as JuNymphea/Georgetown-qwen3-4B-finetuned-for-disrpt2025; it can be passed directly to --checkpoint_path, as shown in the evaluation command above.
