LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism

Apr 15, 2024
Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin
