Thanks, everyone, for sharing your great work!
Could you share the exact commands for benchmarking VQA and grounding?
More specifically, the prompts provided in your Gradio demo cover grounded captioning, e.g., describe the image (with grounding).
What prompt should I use to evaluate on RefCOCO, where the model must generate a bounding box or a mask for a given input phrase?
Additionally, on VQA benchmarks such as VQAv2 and GQA, the model doesn't follow instructions to answer concisely.
For instance:
Q: What is the color of the man's shirt? Answer concisely using one word or phrase.
A: The color of the man's shirt is blue.
However, I expected a concise answer, e.g., blue.
It would be highly appreciated if you could share the correct prompts that make the model work in both cases.
Thanks in advance!