Clarification on Preprocessing for Images of Different Resolutions #118

@Neronjust2017

Hi, thanks for open-sourcing the amazing Perception Encoder! Could you clarify two points about image preprocessing, in particular with reference to Table 33's description ("trained with dynamic tiling for different image sizes and aspect ratio; up to 4 image tiles of the encoder’s native resolution + a thumbnail"):

  1. When is the input resized to fixed native sizes (e.g., 336px for L-scale, 448px for G-scale)?
  2. When is dynamic tiling applied instead?
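
To make the two cases concrete, here is a minimal sketch of one possible reading of the Table 33 description. It is not taken from the Perception Encoder code: the function names (`candidate_grids`, `pick_grid`, `plan_preprocessing`), the `use_tiling` switch, and the rule of picking the grid whose aspect ratio is closest to the image's are all assumptions made for illustration. Only the "up to 4 tiles of native resolution + a thumbnail" budget and the 336px/448px native sizes come from the text above.

```python
from typing import Dict, List, Tuple

Grid = Tuple[int, int]  # (columns, rows)


def candidate_grids(max_tiles: int) -> List[Grid]:
    """Enumerate every (cols, rows) grid that uses at most max_tiles tiles."""
    return [
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if cols * rows <= max_tiles
    ]


def pick_grid(width: int, height: int, max_tiles: int = 4) -> Grid:
    """ASSUMED rule: choose the grid whose aspect ratio is closest to the
    image's; on a tie, prefer the grid with more tiles."""
    target = width / height
    return min(
        candidate_grids(max_tiles),
        key=lambda g: (abs(g[0] / g[1] - target), -(g[0] * g[1])),
    )


def plan_preprocessing(
    width: int,
    height: int,
    native: int = 448,      # e.g. 336 for L-scale, 448 for G-scale
    max_tiles: int = 4,
    use_tiling: bool = True,  # hypothetical switch between the two cases
) -> Dict[str, object]:
    """Describe how an image would be resized or tiled (no pixels touched)."""
    if not use_tiling:
        # Case 1: a single resize to the encoder's native square resolution.
        return {"mode": "resize", "resize_to": (native, native)}

    # Case 2: resize to a cols x rows canvas, cut it into native-size tiles,
    # and add one native-size thumbnail of the whole image.
    cols, rows = pick_grid(width, height, max_tiles)
    boxes = [
        (c * native, r * native, (c + 1) * native, (r + 1) * native)
        for r in range(rows)
        for c in range(cols)
    ]
    return {
        "mode": "tiling",
        "grid": (cols, rows),
        "resize_to": (cols * native, rows * native),
        "tile_boxes": boxes,  # (left, top, right, bottom) crop boxes
        "thumbnail": (native, native),
    }


if __name__ == "__main__":
    print(plan_preprocessing(1000, 500, native=448))
    print(plan_preprocessing(1000, 500, native=336, use_tiling=False))
```

Under these assumptions a 1000x500 image with a 448px encoder would be resized to 896x448 and split into two tiles plus a thumbnail. Whether the released checkpoints expect this path or the single fixed-size resize is exactly what the two questions above are asking.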
