Hi, thanks for open-sourcing the amazing Perception Encoder! Could you clarify two points about image preprocessing, especially regarding Table 33's description ("trained with dynamic tiling for different image sizes and aspect ratio; up to 4 image tiles of the encoder’s native resolution + a thumbnail"):
- When is the input resized to fixed native sizes (e.g., 336px for L-scale, 448px for G-scale)?
- When is dynamic tiling applied instead?
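For context, here is a minimal sketch of my current understanding of the dynamic tiling step, so you can correct me if it's wrong. All names and the grid-selection heuristic here are my assumptions, not your code: I assume the preprocessor picks a tile grid of up to 4 tiles whose aspect ratio best matches the input, resizes the image to fill that grid at the encoder's native resolution, crops the tiles, and appends a native-resolution thumbnail of the full image.

```python
def tile_grid(width: int, height: int, max_tiles: int = 4) -> tuple[int, int]:
    """Hypothetical grid selection: choose (cols, rows) with
    cols * rows <= max_tiles whose aspect ratio best matches the image,
    preferring more tiles on ties. This is my guess at the heuristic,
    not the actual implementation."""
    target = width / height
    best, best_diff = (1, 1), float("inf")
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles + 1):
            if cols * rows > max_tiles:
                continue
            diff = abs(cols / rows - target)
            if diff < best_diff or (diff == best_diff
                                    and cols * rows > best[0] * best[1]):
                best, best_diff = (cols, rows), diff
    return best


# Presumably the image is then resized to (cols * native, rows * native),
# cropped into cols * rows tiles, and a thumbnail at native resolution is
# appended, giving cols * rows + 1 crops total.
print(tile_grid(1024, 256))   # wide image
print(tile_grid(512, 512))    # square image
print(tile_grid(256, 1024))   # tall image
```

Is this roughly what happens at G-scale, and does the L-scale path skip tiling and just resize to the fixed native size?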