Abstract: Distributed training (DT) has emerged as a solution to address the growing computational resource demands of training large-scale machine learning models. To meet this need, major cloud ...