chensy618/02456_Deep_Learning_Project

02456_Deep_Learning_Project

Automated Key Point Description for Vision Transformers using Vision-Language Models

Recent work has shown that features extracted by Vision Transformers (ViTs) trained with self-supervised learning can perform unsupervised key point matching between two images with high precision (https://arxiv.org/abs/2112.05814). However, because the key points are identified in an unsupervised manner, human evaluation is needed to describe the key points that are discovered. The goal of this project is to automatically generate textual descriptions of the key points using recent advances in vision-language modeling (https://proceedings.mlr.press/v139/radford21a/radford21a.pdf). The project focuses on fine-grained classification and has direct links to ongoing research on explainability.
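One way the description step could work is CLIP-style zero-shot scoring: embed an image crop around each discovered key point with CLIP's image encoder, embed a set of candidate part descriptions with its text encoder, and rank the candidates by cosine similarity. The sketch below implements only that scoring step with placeholder NumPy embeddings; the function name `describe_keypoint`, the prompts, and the temperature value are illustrative assumptions, not part of the project code.

```python
# Sketch of CLIP zero-shot scoring for one key point. Assumption: in the real
# pipeline, `patch_embedding` would come from CLIP's image encoder applied to a
# crop around a DINO-ViT key point, and `text_embeddings` from CLIP's text
# encoder applied to candidate part descriptions. Here we use plain NumPy
# vectors so the scoring logic is self-contained.
import numpy as np

def describe_keypoint(patch_embedding, text_embeddings, prompts, temperature=100.0):
    """Rank candidate textual descriptions for a single key point.

    patch_embedding : (d,)   image embedding of the crop around the key point
    text_embeddings : (n, d) text embeddings of the n candidate prompts
    prompts         : list of n candidate strings, e.g. "a photo of a bird's beak"
    """
    # Cosine similarity = dot product of L2-normalised embeddings
    img = patch_embedding / np.linalg.norm(patch_embedding)
    txt = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    logits = temperature * (txt @ img)
    # Softmax over candidates, as in CLIP's zero-shot classification
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(-probs)
    return [(prompts[i], float(probs[i])) for i in order]

# Toy usage: the patch embedding is closest to the first candidate,
# so "a photo of a bird's beak" should be ranked first.
ranked = describe_keypoint(
    np.array([1.0, 0.0, 0.0]),
    np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]),
    ["a photo of a bird's beak", "a photo of a bird's wing"],
)
```

The temperature of 100 mirrors the learned logit scale in the CLIP paper; with real CLIP embeddings one would use the model's own scale instead of a hard-coded constant.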

Data source (CUB-200-2011): https://www.kaggle.com/datasets/wenewone/cub2002011

Important paper links:

- https://proceedings.neurips.cc/paper_files/paper/2019/file/adf7ee2dcf142b0e11888e72b43fcb75-Paper.pdf
- https://openaccess.thecvf.com/content/CVPR2023/papers/Nauta_PIP-Net_Patch-Based_Intuitive_Prototypes_for_Interpretable_Image_Classification_CVPR_2023_paper.pdf
- https://arxiv.org/abs/2105.02968
- https://dino-vit-features.github.io/
- https://arxiv.org/abs/2103.00020
