Coordinating Distributed Example Orders for Provably Accelerated
Training
- URL: http://arxiv.org/abs/2302.00845v5
- Date: Thu, 21 Dec 2023 19:41:57 GMT
- Title: Coordinating Distributed Example Orders for Provably Accelerated
Training
- Authors: A. Feder Cooper, Wentao Guo, Khiem Pham, Tiancheng Yuan, Charlie F.
Ruan, Yucheng Lu, Christopher De Sa
- Abstract summary: We propose Coordinated Distributed GraB (CD-GraB) to translate the benefits of permutation-based example ordering to distributed settings.
With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms distributed RR on a variety of benchmark tasks.
- Score: 39.05759866984658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research on online Gradient Balancing (GraB) has revealed that there
exist permutation-based example orderings for SGD that are guaranteed to
outperform random reshuffling (RR). Whereas RR arbitrarily permutes training
examples, GraB leverages stale gradients from prior epochs to order examples --
achieving a provably faster convergence rate than RR. However, GraB is limited
by design: while it demonstrates an impressive ability to scale up training on
centralized data, it does not naturally extend to modern distributed ML
workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which
uses insights from prior work on kernel thinning to translate the benefits of
provably faster permutation-based example ordering to distributed settings.
With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate
over centralized GraB and outperforms distributed RR on a variety of benchmark
tasks.
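To make the ordering idea concrete, here is a minimal single-worker sketch of the gradient-balancing step behind GraB: stale per-example gradients from the previous epoch are greedily assigned signs (herding) to keep the running gradient sum small, and the signs induce the next epoch's visit order. This is an illustrative sketch, not the authors' implementation; CD-GraB additionally coordinates these orders across distributed workers.

```python
import numpy as np

def grab_order(stale_grads):
    """Compute a GraB-style example order from stale gradients.

    stale_grads: (n, d) array of per-example gradients saved during
    the previous epoch. Returns a permutation of example indices.
    """
    n, d = stale_grads.shape
    # Center the gradients so the signed sums balance around zero.
    centered = stale_grads - stale_grads.mean(axis=0)
    running = np.zeros(d)
    front, back = [], []
    for i in range(n):
        g = centered[i]
        # Greedy herding: choose the sign that keeps the running
        # sum of signed gradients small.
        if np.linalg.norm(running + g) <= np.linalg.norm(running - g):
            running += g
            front.append(i)   # +1 examples go to the front, in order
        else:
            running -= g
            back.append(i)    # -1 examples go to the back, reversed
    return front + back[::-1]
```

In the next epoch, SGD would visit the training examples in the returned order while recording fresh gradients for the epoch after that; random reshuffling, by contrast, ignores gradient information entirely.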
Related papers
- Learnable Chernoff Baselines for Inference-Time Alignment [64.81256817158851]
We introduce Learnable Chernoff Baselines (LCB) as a method for efficiently and approximately sampling from exponentially tilted kernels. We establish total-variation guarantees to the ideal aligned model, and demonstrate in both continuous and discrete diffusion settings that LCB sampling closely matches ideal rejection sampling.
arXiv Detail & Related papers (2026-02-08T00:09:40Z)
- Random Controlled Differential Equations [1.2107297090229683]
We introduce a training-efficient framework for time-series learning that combines random features with controlled differential equations (CDEs). Only a linear readout layer is trained, resulting in fast, scalable models with strong inductive bias. We evaluate both models across a range of time-series benchmarks, demonstrating competitive or state-of-the-art performance.
arXiv Detail & Related papers (2025-12-29T18:25:10Z)
- Fine-Grained Bias Exploration and Mitigation for Group-Robust Classification [11.525201208566925]
Bias Exploration via Overfitting (BEO) captures each distribution in greater detail by modeling it as a mixture of latent groups. We introduce a fine-grained variant of CCDB, termed FG-CCDB, which performs more precise distribution matching and balancing within each group. Our method performs on par with bias-supervised approaches on binary classification tasks and significantly outperforms them in highly biased multi-class scenarios.
arXiv Detail & Related papers (2025-05-11T04:01:34Z)
- Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer [9.153197757307762]
The probabilistic diffusion model (DM) is a powerful framework for visual generation.
How to efficiently align the foundation DM is a crucial task.
We propose the Recursive Likelihood Ratio (RLR), a zeroth-order informed fine-tuning paradigm for DM.
arXiv Detail & Related papers (2025-02-02T03:00:26Z)
- MindFlayer SGD: Efficient Parallel SGD in the Presence of Heterogeneous and Random Worker Compute Times [49.1574468325115]
We investigate the problem of minimizing the expectation of smooth nonconvex functions in a setting with multiple parallel workers that compute stochastic gradients. A challenge in this context is the presence of arbitrarily heterogeneous and random worker compute times. We introduce MindFlayer SGD, a novel parallel SGD method specifically designed to handle this setting.
arXiv Detail & Related papers (2024-10-05T21:11:32Z)
- Generalized Schrödinger Bridge Matching [54.171931505066]
The Generalized Schrödinger Bridge (GSB) problem setup is prevalent in many scientific areas both within and outside machine learning.
We propose Generalized Schrödinger Bridge Matching (GSBM), a new matching algorithm inspired by recent advances.
We show that such a generalization can be cast as solving conditional optimal control, for which variational approximations can be used.
arXiv Detail & Related papers (2023-10-03T17:42:11Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- Deep Momentum Multi-Marginal Schrödinger Bridge [41.27274841596343]
We present a novel framework that learns the smooth measure-valued spline for stochastic systems that satisfy position marginal constraints across time.
Our algorithm outperforms baselines significantly, as evidenced by experiments for synthetic datasets and a real-world single-cell RNA dataset sequence.
arXiv Detail & Related papers (2023-03-03T07:24:38Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods [0.0]
We quantify the effect of order on convergence speed, obtaining convergence bounds based on the chosen sequence of permutations.
We develop a greedy algorithm for choosing good orders during training, achieving superior performance (by more than 14 percent in accuracy) over RR.
arXiv Detail & Related papers (2022-02-03T20:38:42Z)
- Relieving Long-tailed Instance Segmentation via Pairwise Class Balance [85.53585498649252]
Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes.
It causes severe biases of the head classes (with majority samples) against the tailed ones.
We propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix which is updated during training to accumulate the ongoing prediction preferences.
arXiv Detail & Related papers (2022-01-08T07:48:36Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw stochastic gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- Slow and Stale Gradients Can Win the Race [39.750046808758526]
Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from runtime delays as it waits for the slowest workers (stragglers).
Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect the convergence error.
We present a novel theoretical characterization of the speedup offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime.
arXiv Detail & Related papers (2020-03-23T23:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.