Pan Zhang

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Jun 06, 2024
Via arXiv

Bootstrap3D: Improving 3D Content Creation with Synthetic Data

May 31, 2024
Via arXiv

Streaming Long Video Understanding with Large Language Models

May 25, 2024
Via arXiv

ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing

May 18, 2024
Via arXiv

Unified Scene Representation and Reconstruction for 3D Large Language Models

Apr 19, 2024
Via arXiv

Are We on the Right Way for Evaluating Large Vision-Language Models?

Apr 09, 2024
Via arXiv

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Apr 09, 2024
Via arXiv

InternLM2 Technical Report

Mar 26, 2024
Via arXiv

Long-CLIP: Unlocking the Long-Text Capability of CLIP

Mar 22, 2024
Via arXiv

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition

Mar 20, 2024
Via arXiv