Open
Description
Great work! I'd like to know how to apply ChannelViT to image segmentation tasks. Because it tokenizes each channel separately, it produces a separate spatial grid of tokens for every channel. For example, given an input of shape (B, 3, 64, 64) with a patch size of 16, it produces a sequence of shape (B, 388), whereas a typical ViT would produce (B, 8*8), which can simply be reshaped into a feature map and fed into a decoder. If the sequence produced by ChannelViT contains per-channel tokens, how should I turn it into a feature map for a decoder? Looking forward to your reply!
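One possible way to do this, sketched here under assumptions rather than taken from the repository: fold the token sequence back into a per-channel grid, then merge the channel axis (by averaging or by concatenating along the embedding dimension) so the result looks like an ordinary ViT feature map. The function name `channel_tokens_to_feature_map`, the channel-major token order, the leading CLS token, and the embedding size of 32 are all hypothetical and should be checked against the model's actual patch embedding. Under these assumptions, a (B, 3, 64, 64) input with patch size 16 has a 4 × 4 grid per channel, so 3 × 16 = 48 patch tokens plus one CLS token. NumPy is used for brevity; the same reshape and mean steps carry over to PyTorch tensors.

```python
import numpy as np


def channel_tokens_to_feature_map(tokens, num_channels, grid_hw,
                                  has_cls=True, reduce="mean"):
    """Fold a ChannelViT-style token sequence into a 2D feature map.

    tokens: array of shape (B, [1 +] C * gh * gw, D).
    ASSUMPTION: tokens are channel-major (all patches of channel 0,
    then channel 1, ...) with an optional CLS token in front.
    """
    gh, gw = grid_hw
    if has_cls:
        tokens = tokens[:, 1:]  # drop the CLS token
    b, n, d = tokens.shape
    if n != num_channels * gh * gw:
        raise ValueError("token count does not match C * gh * gw")

    # (B, C * gh * gw, D) -> (B, C, gh, gw, D)
    x = tokens.reshape(b, num_channels, gh, gw, d)

    if reduce == "mean":
        # Average the channel tokens at each spatial location.
        x = x.mean(axis=1)  # (B, gh, gw, D)
    elif reduce == "concat":
        # Stack the channel tokens along the embedding dimension.
        x = x.transpose(0, 2, 3, 1, 4).reshape(b, gh, gw, num_channels * d)
    else:
        raise ValueError("reduce must be 'mean' or 'concat'")

    # Channels-first layout, as most convolutional decoders expect.
    return x.transpose(0, 3, 1, 2)


# Example: B=2, C=3, 64x64 input, patch size 16 -> 4x4 grid per channel.
tokens = np.random.rand(2, 1 + 3 * 4 * 4, 32)  # D=32 is a made-up size
fmap_mean = channel_tokens_to_feature_map(tokens, 3, (4, 4))
fmap_cat = channel_tokens_to_feature_map(tokens, 3, (4, 4), reduce="concat")
print(fmap_mean.shape, fmap_cat.shape)
```

Averaging keeps the decoder input at D channels, while concatenation preserves per-channel information at the cost of a C times wider input. Which one works better for segmentation is an open question here, not something the source confirms.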