Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization

Nov 28, 2023
Jinhao Li, Shiyao Li, Jiaming Xu, Shan Huang, Yaoxiu Lian, Jun Liu, Yu Wang, Guohao Dai
