Adaptra: Straggler-Resilient Hybrid-Parallel Training with Pipeline Adaptation
Published in Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI ’26), 2025
Tianyuan Wu*, Lunxi Cao*, Hanfeng Lu, Xiaoxiao Jiang, Yinghao Yu, Siran Yang, Guodong Yang, Jiamang Wang, Lin Qu, Liping Zhang, Wei Wang.
Recommended citation: Tianyuan Wu*, Lunxi Cao*, Hanfeng Lu, Xiaoxiao Jiang, Yinghao Yu, Siran Yang, Guodong Yang, Jiamang Wang, Lin Qu, Liping Zhang, and Wei Wang, "Attack of the Bubbles: Straggler-Resilient Pipeline Parallelism for Large Model Training," in the Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI ’26), Renton, WA, USA, May 2026. (*Equal contribution)