We want to build a model that translates continuous American Sign Language (ASL) signing into English text. For this use case, we conducted fine-tuning experiments with two large vision-language models:
- LLaVA-NeXT-Video
- Video-LLaVA
We use the How2Sign dataset, which consists of ASL video footage aligned with English sentences. It includes RGB videos (green-screen frontal and side views) and 3D keypoints for the hands, body, and face. To manage computational constraints, we focused on the RGB frontal-view videos for fine-tuning.
The repository is organized as follows:

- `data`: Contains the cleaned CSV file used as the source dataset.
- `data_profiling`: Includes the code for data cleaning.
- `llava-next-video`: Scripts for fine-tuning the LLaVA-NeXT-Video model on the How2Sign dataset, along with quantitative analysis of the trained model.
- `video-llava`: Scripts for fine-tuning the Video-LLaVA model on the How2Sign dataset, as well as inference scripts for the trained model.
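For orientation, the cleaned CSV in `data` can be inspected with pandas. A minimal sketch, assuming the file is named `valid_clips.csv` (the name referenced by the evaluation output below); the actual file name and column layout may differ:

```python
import pandas as pd

# Load the cleaned source CSV (path and schema are assumptions based on the
# repository description; adjust to the actual file).
clips = pd.read_csv("data/valid_clips.csv")

# Each row is expected to pair a video clip identifier with its aligned
# English sentence.
print(clips.columns.tolist())
print(clips.head())
```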
Install the dependencies:

```bash
pip install -r requirements.txt
```

Navigate to the `huggingface_trainer` directory within `llava-next-video` and execute the following commands:

```bash
cd llava-next-video/huggingface_trainer
sbatch train.sh
```

We used Slurm jobs to trigger the training runs.
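Under the hood, the preprocessing for fine-tuning boils down to turning each (clip, sentence) pair into model inputs via the Hugging Face processor. A minimal sketch with dummy frames; the checkpoint name, prompt wording, and frame sampling are assumptions, not the repository's exact code:

```python
import numpy as np
from transformers import LlavaNextVideoProcessor

# Processor for the base checkpoint (assumed to be the HF-converted weights).
processor = LlavaNextVideoProcessor.from_pretrained("llava-hf/LLaVA-NeXT-Video-7B-hf")

# Hypothetical instruction; the actual fine-tuning prompt may differ.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Translate the ASL signing in this video into English."},
            {"type": "video"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# Placeholder clip: 8 RGB frames. In practice, frames are sampled from the
# How2Sign frontal-view video for the given video_id.
frames = np.zeros((8, 336, 336, 3), dtype=np.uint8)

inputs = processor(text=prompt, videos=frames, return_tensors="pt")
print({k: tuple(v.shape) for k, v in inputs.items()})
```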
Training produces the following artifacts:

- A `logs` folder will be created to store training logs.
- An `output` directory will be generated to store checkpoints from training.
- A `generated_texts.csv` file will be created for validation purposes, with the following columns:
- `id`: Incremental ID for each data item.
- `video_id`: Unique identifier for the video clip, also present in `valid_clips.csv`.
- `generated`: The text generated by the model for the specific clip.
- `true`: The expected text for the specific clip.
- `epoch`: The epoch at which the evaluation occurred.
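For example, the latest epoch's generations can be compared with the reference sentences using pandas; a small sketch (the path assumes the file is written next to the training script):

```python
import pandas as pd

# Path is an assumption; adjust if the file is written elsewhere.
generated = pd.read_csv("llava-next-video/huggingface_trainer/generated_texts.csv")

# Inspect the most recent epoch's outputs next to the reference sentences.
last_epoch = generated["epoch"].max()
latest = generated[generated["epoch"] == last_epoch]
print(latest[["video_id", "generated", "true"]].head())
```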
Run the evaluation script to calculate validation scores:
```bash
cd llava-next-video/huggingface_trainer
python llava_next_video_eval.py
```

A `validation_scores.csv` file will be generated containing the following metrics after every epoch (see the sketch after this list for one way to compute them):

- ROUGE-1
- ROUGE-2
- ROUGE-L
- BLEU
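These scores can also be reproduced outside the provided script. A hedged sketch using the Hugging Face `evaluate` library; the repository's `llava_next_video_eval.py` may use a different implementation and different file paths:

```python
import evaluate
import pandas as pd

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

# Path is an assumption (relative to huggingface_trainer).
df = pd.read_csv("generated_texts.csv")

rows = []
for epoch, group in df.groupby("epoch"):
    preds = group["generated"].fillna("").tolist()
    refs = group["true"].fillna("").tolist()
    r = rouge.compute(predictions=preds, references=refs)
    b = bleu.compute(predictions=preds, references=[[t] for t in refs])
    rows.append({
        "epoch": epoch,
        "rouge1": r["rouge1"],
        "rouge2": r["rouge2"],
        "rougeL": r["rougeL"],
        "bleu": b["bleu"],
    })

pd.DataFrame(rows).to_csv("validation_scores.csv", index=False)
```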
Navigate to the `video-llava` directory and execute the following commands:

```bash
cd video-llava
sbatch train.sh
```

Inference for the Video-LLaVA model can be performed using the Jupyter notebook located at:

```
video-llava/inference.ipynb
```
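If you prefer a script to the notebook, Video-LLaVA inference follows the standard Hugging Face pattern. A minimal sketch with dummy frames and the base checkpoint; the notebook presumably loads the fine-tuned weights instead, and the prompt wording here is an assumption:

```python
import numpy as np
import torch
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration

# Swap in the path of a fine-tuned checkpoint to evaluate the trained model.
model_id = "LanguageBind/Video-LLaVA-7B-hf"
processor = VideoLlavaProcessor.from_pretrained(model_id)
model = VideoLlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder clip: 8 RGB frames; replace with frames sampled from a real
# How2Sign clip.
frames = np.zeros((8, 224, 224, 3), dtype=np.uint8)

# Video-LLaVA expects a USER/ASSISTANT prompt with a <video> placeholder.
prompt = "USER: <video>\nTranslate the ASL signing in this video into English. ASSISTANT:"

inputs = processor(text=prompt, videos=frames, return_tensors="pt").to(model.device, torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=100)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```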