Can Huang

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Apr 19, 2024
Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang

Via arXiv

Elysium: Exploring Object-level Perception in Videos via MLLM

Mar 29, 2024
Han Wang, Yanjie Wang, Yongjie Ye, Yuxiang Nie, Can Huang

Via arXiv

PURPLE: Making a Large Language Model a Better SQL Writer

Mar 29, 2024
Tonghui Ren, Yuankai Fan, Zhenying He, Ren Huang, Jiaqi Dai, Can Huang, Yinan Jing, Kai Zhang, Yifan Yang, X. Sean Wang

Via arXiv

Metasql: A Generate-then-Rank Framework for Natural Language to SQL Translation

Feb 27, 2024
Yuankai Fan, Zhenying He, Tonghui Ren, Can Huang, Yinan Jing, Kai Zhang, X. Sean Wang

Via arXiv

PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition

Feb 15, 2024
Jinghui Lu, Ziwei Yang, Yanjie Wang, Xuejing Liu, Brian Mac Namee, Can Huang

Via arXiv

GloTSFormer: Global Video Text Spotting Transformer

Jan 08, 2024
Han Wang, Yanjie Wang, Yang Li, Can Huang

Via arXiv

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding

Nov 30, 2023
Hao Feng, Qi Liu, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang

Via arXiv

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

Nov 23, 2023
Zhen Zhao, Jingqun Tang, Chunhui Lin, Binghong Wu, Hao Liu, Zhizhong Zhang, Xin Tan, Can Huang, Yuan Xie

Via arXiv

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding

Sep 02, 2023
Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wengang Zhou, Houqiang Li, Can Huang

Via arXiv

ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

Aug 20, 2023
Mingxin Huang, Jiaxin Zhang, Dezhi Peng, Hao Lu, Can Huang, Yuliang Liu, Xiang Bai, Lianwen Jin

Via arXiv