Toppings: CPU-Assisted, Rank-Aware Adapter Serving for LLM Inference
Published in the 2025 USENIX Annual Technical Conference (USENIX ATC '25), 2025
Suyi Li*, Hanfeng Lu*, Tianyuan Wu, Minchen Yu, Qizhen Weng, Xusheng Chen, Yizhou Shan, Binhang Yuan, Wei Wang (* Equal contribution).
Recommended citation: Suyi Li, Hanfeng Lu, Tianyuan Wu, Minchen Yu, Qizhen Weng, Xusheng Chen, Yizhou Shan, Binhang Yuan, and Wei Wang. "CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference." arXiv preprint arXiv:2401.11240 (2024).