ViT-MonoDepth

Monocular Depth Estimation using ViT

For more details, please refer to the original paper discussing the model, "Vision Transformers for Dense Prediction": https://arxiv.org/abs/2103.13413