Vijay Kumar BG

Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement

Apr 06, 2024
Zaid Khan, Vijay Kumar BG, Samuel Schulter, Yun Fu, Manmohan Chandraker

Exploring Question Decomposition for Zero-Shot VQA

Oct 25, 2023
Zaid Khan, Vijay Kumar BG, Samuel Schulter, Manmohan Chandraker, Yun Fu

Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!

Jun 06, 2023
Zaid Khan, Vijay Kumar BG, Samuel Schulter, Xiang Yu, Yun Fu, Manmohan Chandraker

Single-Stream Multi-Level Alignment for Vision-Language Pretraining

Mar 30, 2022
Zaid Khan, Vijay Kumar BG, Xiang Yu, Samuel Schulter, Manmohan Chandraker, Yun Fu

Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention

Nov 23, 2020
Varnith Chordia, Vijay Kumar BG

Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue

Jul 29, 2016
Ravi Garg, Vijay Kumar BG, Gustavo Carneiro, Ian Reid
