What is the Evaluation Dataset #26

Which benchmarks are used for evaluation? For example, which test datasets are used in Tab. 3 and Tab. 4? In the paper, I noticed the following:

> The low-level visual abilities of MLLMs after low-level visual instruction tuning are quantitatively evaluated in three tasks defined by [57]

However, this is a little ambiguous to me. Did you use new data to build three tasks similar to those defined in [57], or did you directly use the Q-Bench from [57] as the test dataset?

Thanks
