Related papers: Can $5^{\rm th}$ Generation Local Training Methods Support Client Sampling? Yes!

Can $5^{\rm th}$ Generation Local Training Methods Support Client Sampling? Yes!

URL: http://arxiv.org/abs/2212.14370v1
Date: Thu, 29 Dec 2022 16:51:46 GMT
Title: Can $5^{\rm th}$ Generation Local Training Methods Support Client Sampling? Yes!
Authors: Micha{\l} Grudzie\'n, Grigory Malinovsky, Peter Richt\'arik
Abstract summary: FedAvg algorithm is based on three components: client sampling (CS), data sampling (DS) and local training (LT) We propose a new generation of LT methods based on new algorithmic and theoretical foundations. We show that LT leads to provable communication acceleration for arbitrarily heterogeneous data, thus jump-starting the $5rm th$ generation of LT methods.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The celebrated FedAvg algorithm of McMahan et al. (2017) is based on three components: client sampling (CS), data sampling (DS) and local training (LT). While the first two are reasonably well understood, the third component, whose role is to reduce the number of communication rounds needed to train the model, resisted all attempts at a satisfactory theoretical explanation. Malinovsky et al. (2022) identified four distinct generations of LT methods based on the quality of the provided theoretical communication complexity guarantees. Despite a lot of progress in this area, none of the existing works were able to show that it is theoretically better to employ multiple local gradient-type steps (i.e., to engage in LT) than to rely on a single local gradient-type step only in the important heterogeneous data regime. In a recent breakthrough embodied in their ProxSkip method and its theoretical analysis, Mishchenko et al. (2022) showed that LT indeed leads to provable communication acceleration for arbitrarily heterogeneous data, thus jump-starting the $5^{\rm th}$ generation of LT methods. However, while these latest generation LT methods are compatible with DS, none of them support CS. We resolve this open problem in the affirmative. In order to do so, we had to base our algorithmic development on new algorithmic and theoretical foundations.

Related papers

Revisiting Weighted Strategy for Non-stationary Parametric Bandits and MDPs [56.246783503873225]
This paper revisits the weighted strategy for non-stationary parametric bandits.<n>We propose a simpler weight-based algorithm that is as efficient as window/restart-based algorithms.<n>Our framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2026-01-03T04:50:21Z)
Nudging the Boundaries of LLM Reasoning [77.26972440427285]
Current online reinforcement learning algorithms cannot learn from problems that are "unsolvable" to the model.<n>We propose NuRL, a "nudging" method that aims to push the upper bound of LLM reasoning using self-generated hints.<n>NuRL achieves consistent improvements across 6 benchmarks and 3 models, while remaining complementary to test-time scaling.
arXiv Detail & Related papers (2025-09-30T02:01:40Z)
Deep Reinforcement Learning with Gradient Eligibility Traces [25.47053572017618]
We propose three gradient-based methods to achieve fast and stable off-policy learning in deep reinforcement learning.<n>We provide a forward-view formulation compatible with experience replay and a backward-view formulation compatible with streaming algorithms.<n>We evaluate the proposed algorithms and show that they outperform both PPO and StreamQ in MuJoCo and MinAtar environments.
arXiv Detail & Related papers (2025-07-12T00:12:05Z)
How to Train Your LLM Web Agent: A Statistical Diagnosis [102.04125085041473]
We present the first statistically grounded study on compute allocation for LLM web-agent post-training.<n>Our approach uses a two-stage pipeline, training a Llama 3.1 8B student to imitate a Llama 3.3 70B teacher via supervised fine-tuning (SFT) and on-policy reinforcement learning.<n>Our results show that combining SFT with on-policy RL consistently outperforms either approach alone on both WorkArena and MiniWob++.
arXiv Detail & Related papers (2025-07-05T17:12:33Z)
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning [65.2421542320293]
Reasoning abilities are crucial components of general intelligence. Recent advances by proprietary companies, such as o-series models of OpenAI, have made remarkable progress on reasoning tasks. This paper proposes a new RL framework, termed OREAL, to pursue the performance limit that can be achieved through textbfOutcome textbfREwtextbfArd-based reinforcement textbfLearning for mathematical reasoning tasks.
arXiv Detail & Related papers (2025-02-10T18:57:29Z)
Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs) Ins, the unknown dependency of future observations and rewards from the past interactions can be captured experimentally. Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z)
Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning [15.46907000938726]
We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL) We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC. We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (textiti.e., $N$-chain), a video game, and a real-world problem in energy systems.
arXiv Detail & Related papers (2024-04-16T17:01:38Z)
Efficient Diffusion Policies for Offline Reinforcement Learning [85.73757789282212]
Diffsuion-QL significantly boosts the performance of offline RL by representing a policy with a diffusion model. We propose efficient diffusion policy (EDP) to overcome these two challenges. EDP constructs actions from corrupted ones at training to avoid running the sampling chain.
arXiv Detail & Related papers (2023-05-31T17:55:21Z)
Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits. We propose a refined analysis framework, which produces a simpler weight-based algorithm. Our new framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2023-03-05T15:11:14Z)
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers [31.021091635737776]
We propose a novel random and layerwise token dropping method (random-LTD) for transformer models. random-LTD achieves considerable speedups and comparable accuracy as the standard training baseline. Our results show that random-LTD can save about 33.3% theoretical compute cost and 25.6% wall-clock training time.
arXiv Detail & Related papers (2022-11-17T23:14:58Z)
GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity [3.222802562733787]
We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication. We prove that our modified method, for which the name GradSkip, converges linearly under the same assumptions.
arXiv Detail & Related papers (2022-10-28T20:59:06Z)
Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (emphi.e., SketchedAMSGrad) utilizing sketching. Our new algorithm achieves a fast convergence rate of $O(frac1sqrtnT + frac1(k/d)2 T)$ with the communication cost of $O(k log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
Variance Reduced ProxSkip: Algorithm, Theory and Application to Federated Learning [0.0]
We study distributed optimization methods based on the em local training (LT) paradigm. We show that our methods can be substantially faster in terms of the em total training cost than the state-of-the-art method ProxSkip.
arXiv Detail & Related papers (2022-07-09T20:59:30Z)
Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation. We develop a new adversarial learning based method, which is simple and efficient to apply. We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z)
Multi-Agent Off-Policy TD Learning: Finite-Time Analysis with Near-Optimal Sample Complexity and Communication Complexity [13.100926925535578]
We develop two decentralized TD with correction (TDC) algorithms for multi-agent off-policy TD learning. Our algorithms preserve full privacy of the actions, policies and rewards of the agents, and adopt mini-batch sampling to reduce the sampling variance and communication frequency.
arXiv Detail & Related papers (2021-03-24T12:48:08Z)
Faster Secure Data Mining via Distributed Homomorphic Encryption [108.77460689459247]
Homomorphic Encryption (HE) is receiving more and more attention recently for its capability to do computations over the encrypted field. We propose a novel general distributed HE-based data mining framework towards one step of solving the scaling problem. We verify the efficiency and effectiveness of our new framework by testing over various data mining algorithms and benchmark data-sets.
arXiv Detail & Related papers (2020-06-17T18:14:30Z)
Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator) We first consider $gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $mathcalS$ and action space $mathcalA$. We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.