Open
Description
What are the benchmarks used for evaluation? For example, in Tab. 3 and Tab. 4, which test datasets are used? In the paper, I noticed this statement:
The low-level visual abilities of MLLMs after low-level visual instruction tuning are quantitatively evaluated in three tasks defined by [57]
However, this is a little ambiguous to me. Did you use new data to create three tasks similar to those defined in [57], or did you directly use the same Q-Bench from [57] as the test dataset?
Thanks