aashrith-madasu/Reproduce-Paper-TemAdapter-VideoQA

Reproducing the results of the paper Tem-Adapter (Video Question Answering)

This repository explores the reproduction and improvement of the Tem-Adapter architecture for Video Question Answering (VideoQA) on the SUTD-TrafficQA dataset. The project covers three parts: replicating the published results with the released checkpoint, training from scratch, and extending the architecture with a custom cross-attention layer.
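The custom cross-attention layer itself is not shown in this README. As an illustration only, a minimal PyTorch sketch of a typical cross-attention fusion block, in which text tokens (queries) attend to per-frame video features (keys/values), might look like the following. The class name `CrossAttentionFusion`, the 512-dimensional features, the head count, and the residual + LayerNorm arrangement are all assumptions, not the repository's implementation.

```python
from typing import Optional, Tuple

import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Hypothetical cross-attention block (not the repository's layer).

    Text tokens act as queries; per-frame video features act as keys/values.
    """

    def __init__(self, dim: int = 512, num_heads: int = 8, dropout: float = 0.1) -> None:
        super().__init__()
        # batch_first=True -> all tensors are (batch, seq_len, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, dropout=dropout, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(
        self,
        text: torch.Tensor,                                 # (B, L_text, dim), queries
        video: torch.Tensor,                                # (B, T_frames, dim), keys/values
        video_padding_mask: Optional[torch.Tensor] = None,  # (B, T_frames) bool, True = padded frame
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        attended, weights = self.attn(
            text, video, video, key_padding_mask=video_padding_mask
        )
        # Residual connection + LayerNorm preserves the original text signal
        return self.norm(text + attended), weights


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = CrossAttentionFusion(dim=512, num_heads=8).eval()  # eval() disables dropout
    text = torch.randn(2, 16, 512)   # 2 questions, 16 text tokens each
    video = torch.randn(2, 8, 512)   # 2 clips, 8 frame features each
    fused, attn = layer(text, video)
    print(fused.shape)  # torch.Size([2, 16, 512])
    print(attn.shape)   # torch.Size([2, 16, 8]), averaged over heads
```

Using `batch_first=True` keeps tensors in `(batch, sequence, dim)` layout, and the residual connection lets the block stay close to the unmodified text features when the attended video context adds little.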


Setup


Results

Replication with Released Checkpoint

Source | Validation Accuracy
--- | ---
Original (paper) | 46.00%
Reproduced (released checkpoint) | 46.00%

✔️ Exact match with the published result when evaluating the official checkpoint.
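The validation accuracies above are the percentage of questions answered correctly. A minimal sketch of how such a multiple-choice accuracy is typically computed, assuming the model assigns one score to each candidate answer; the function name `multiple_choice_accuracy` and the input format are hypothetical, not the repository's evaluation code.

```python
from typing import Sequence


def multiple_choice_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Percentage of questions whose highest-scoring candidate is the labelled one.

    scores[i][j]: hypothetical model score for candidate answer j of question i.
    labels[i]:    index of the correct candidate for question i.
    Questions may have different numbers of candidates.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one question")
    correct = 0
    for cand_scores, label in zip(scores, labels):
        # argmax over candidates; ties resolve to the first maximum
        pred = max(range(len(cand_scores)), key=cand_scores.__getitem__)
        correct += int(pred == label)
    return 100.0 * correct / len(labels)


# Toy example: 4 questions, 4 candidate answers each
scores = [
    [0.1, 0.7, 0.1, 0.1],  # predicts 1
    [0.4, 0.3, 0.2, 0.1],  # predicts 0
    [0.2, 0.2, 0.5, 0.1],  # predicts 2
    [0.3, 0.3, 0.1, 0.3],  # tie between 0, 1, 3 -> predicts 0
]
labels = [1, 2, 2, 3]
print(f"{multiple_choice_accuracy(scores, labels):.2f}%")  # 50.00%
```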


Training from Scratch

Metric | Value
--- | ---
Sum loss | 0.127
Avg loss | 0.34
CE (cross-entropy) loss | 33.28
Recon (reconstruction) loss | 0.0067
Average accuracy | 98.20%
Validation accuracy | 45.37%

⚠️ A minor drop of 0.63 percentage points (45.37% vs. 46.00%) from the original, likely due to a smaller batch size and a different GPU.
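To make the size of the gap explicit, here it is in absolute and relative terms (simple arithmetic on the two validation accuracies reported above):

```latex
\Delta_{\text{abs}} = 46.00\% - 45.37\% = 0.63\ \text{percentage points},
\qquad
\Delta_{\text{rel}} = \frac{0.63}{46.00} \approx 1.37\%
```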

About

Reproducing the results of the research paper "Tem-Adapter" for the Video Question Answering task using PyTorch.
