Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction

Apr 12, 2024
Haoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Saurabh Jha, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. Iyer

View paper on arXiv
