Bohan Zhai

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Mar 03, 2024
Haogeng Liu, Quanzeng You, Xiaotian Han, Yiqi Wang, Bohan Zhai, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

Via arXiv

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Jan 18, 2024
Yiqi Wang, Wentao Chen, Xiaotian Han, Xudong Lin, Haiteng Zhao, Yongfei Liu, Bohan Zhai, Jianbo Yuan, Quanzeng You, Hongxia Yang

Via arXiv

COCO is "ALL" You Need for Visual Instruction Fine-tuning

Jan 17, 2024
Xiaotian Han, Yiqi Wang, Bohan Zhai, Quanzeng You, Hongxia Yang

Via arXiv

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

Dec 04, 2023
Xiaotian Han, Quanzeng You, Yongfei Liu, Wentao Chen, Huangjie Zheng, Khalil Mrini, Xudong Lin, Yiqi Wang, Bohan Zhai, Jianbo Yuan, Heng Wang, Hongxia Yang

Via arXiv

HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption

Oct 03, 2023
Bohan Zhai, Shijia Yang, Xiangchen Zhao, Chenfeng Xu, Sheng Shen, Dongdi Zhao, Kurt Keutzer, Manling Li, Tan Yan, Xiangjun Fan

Via arXiv

Multitask Vision-Language Prompt Tuning

Dec 05, 2022
Sheng Shen, Shijia Yang, Tianjun Zhang, Bohan Zhai, Joseph E. Gonzalez, Kurt Keutzer, Trevor Darrell

Via arXiv

Image2Point: 3D Point-Cloud Understanding with Pretrained 2D ConvNets

Jun 08, 2021
Chenfeng Xu, Shijia Yang, Bohan Zhai, Bichen Wu, Xiangyu Yue, Wei Zhan, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

Via arXiv

Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition

Mar 31, 2021
Sehoon Kim, Amir Gholami, Zhewei Yao, Anirudda Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer

Via arXiv

You Only Group Once: Efficient Point-Cloud Processing with Token Representation and Relation Inference Module

Mar 24, 2021
Chenfeng Xu, Bohan Zhai, Bichen Wu, Tian Li, Wei Zhan, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

Via arXiv