Request for Instructions on Downloading Evaluation Datasets and Audio/Visual File Structure #7

@Alextien40816291

Hi, thank you for your great work on GRAM!

I’m currently trying to reproduce the evaluation results reported in the paper using the provided pretrained models, but I ran into some issues with the evaluation datasets. I would appreciate some clarification on the following points:

  1. Datasets for Evaluation
    According to the paper and repository, GRAM is evaluated on datasets such as MSR-VTT, DiDeMo, ActivityNet, and VATEX.

Could you kindly share:

Where can I download the processed version of these datasets (or any instructions to process the raw versions)?

Did you use any specific preprocessing scripts to make the evaluation data compatible with your code?

Is there a recommended directory structure for these datasets?

  2. Audio Files Requirement
    In the config JSON files, I see that both vision_path and audio_path are required for evaluation.

Are the audio features provided as part of the official datasets? If not, could you provide instructions or scripts to extract them?

For example, how should the audio_path directory be structured for the MSR-VTT or DiDeMo dataset?

What audio format and feature type (e.g., wav2vec2 or VGGish) does the model expect?
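For context, here is a minimal sketch of how I am currently checking the paths in my local copy of a config. It relies only on the key names `vision_path` and `audio_path` seen in the config JSON files. Everything else is an assumption on my side: that the values are directory paths, that the keys may sit at any nesting depth (hence the recursive search), and the helper names `find_path_entries` and `check_config`, which are hypothetical and not part of the repository.

```python
import os
from typing import Any, Iterator, List, Tuple

# Key names as they appear in the config JSON files.
PATH_KEYS = ("vision_path", "audio_path")


def find_path_entries(node: Any, trail: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (location, value) for each vision_path/audio_path entry at any depth.

    The nesting of the real configs is not assumed; the whole JSON tree is walked.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{trail}.{key}" if trail else key
            if key in PATH_KEYS and isinstance(value, str):
                yield here, value
            else:
                yield from find_path_entries(value, here)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_path_entries(item, f"{trail}[{i}]")


def check_config(config: Any) -> List[Tuple[str, str]]:
    """Return the (location, value) pairs whose directory does not exist on disk.

    Assumption: each value is meant to be a directory, not a single file.
    """
    return [(loc, path) for loc, path in find_path_entries(config)
            if not os.path.isdir(path)]
```

I load a config with `json.load` and pass the resulting dict to `check_config`. An empty list means every referenced directory exists, but it says nothing about whether the files inside are in the format the model expects, which is why I am asking.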

  3. Recommended Folder Structure
    To avoid misconfiguration, could you also share an example of a valid folder structure for one of the datasets (e.g., MSR-VTT), showing where the video, audio, and metadata (e.g., captions or annotations) should be located?

Looking forward to your guidance!
