Exact prompt for RefSeg dataset Reproducibility #27

Thanks, everyone, for sharing your great work!

May I know the exact command for VQA and grounding benchmarking?
More specifically, the prompts provided in your Gradio demo are for grounded captioning, e.g., "describe the image (with grounding)".
However, what should be the prompt if I want to evaluate on RefCOCO where I need to generate a bbox or a mask, given an input phrase?

Additionally, on VQA benchmarks such as VQAv2 and GQA, the model doesn't follow the instruction to answer concisely. For instance:
Q: What is the color of the man's shirt? Answer concisely using one word or phrase.
A: The color of the man's shirt is blue.
However, I am expecting a concise answer, e.g., "blue".
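
To show why this matters for benchmarking, here is a minimal sketch of how a verbose answer fails a string-match metric. It assumes a simplified exact-match score; `normalize` and `exact_match` are hypothetical helpers written for this illustration, not the official VQAv2 or GQA evaluation code, which applies more elaborate normalization.

```python
import string


def normalize(answer: str) -> str:
    """Lowercase the answer and strip surrounding whitespace and punctuation."""
    return answer.lower().strip().strip(string.punctuation).strip()


def exact_match(prediction: str, ground_truth: str) -> bool:
    """Simplified scoring (an assumption): correct only if the normalized strings match."""
    return normalize(prediction) == normalize(ground_truth)


verbose = "The color of the man's shirt is blue."
concise = "blue"

print(exact_match(verbose, "blue"))  # False: the full sentence does not equal "blue"
print(exact_match(concise, "blue"))  # True
```

Under this kind of scoring, the sentence-style answer is counted as wrong even though it contains the right word, which is why the prompt needs to elicit a one-word or short-phrase answer.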

Therefore, it would be highly appreciated if you could help us figure out the correct prompts that make the model work in both cases.

Thanks in advance! 
