Zhihang Yuan

A Survey on Efficient Inference for Large Language Models

Apr 22, 2024
Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang

PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds

Apr 11, 2024
Weisheng Xu, Sifan Zhou, Zhihang Yuan

LLM Inference Unveiled: Survey and Roofline Model Insights

Mar 11, 2024
Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer

WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More

Feb 20, 2024
Yuxuan Yue, Zhihang Yuan, Haojie Duanmu, Sifan Zhou, Jianlong Wu, Liqiang Nie

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

Feb 13, 2024
Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Yan Yan

MIM4DD: Mutual Information Maximization for Dataset Distillation

Dec 27, 2023
Yuzhang Shang, Zhihang Yuan, Yan Yan

Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting

Dec 17, 2023
Dawei Yang, Ning He, Xing Hu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang

ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models

Dec 10, 2023
Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, Guangyu Sun

PB-LLM: Partially Binarized Large Language Models

Sep 29, 2023
Yuzhang Shang, Zhihang Yuan, Qiang Wu, Zhen Dong

Latency-aware Unified Dynamic Networks for Efficient Image Recognition

Sep 02, 2023
Yizeng Han, Zeyu Liu, Zhihang Yuan, Yifan Pu, Chaofei Wang, Shiji Song, Gao Huang
