One-Shot Averaging for Distributed TD($λ$) Under Markov Sampling
- URL: http://arxiv.org/abs/2403.08896v2
- Date: Sat, 1 Jun 2024 02:10:42 GMT
- Title: One-Shot Averaging for Distributed TD($λ$) Under Markov Sampling
- Authors: Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky,
- Abstract summary: We show that we can achieve a linear speedup for TD($lambda$), a family of popular methods for policy evaluation, in the sense that $N$ agents can evaluate a policy $N$ times faster.
Notably, this speedup is achieved by one shot averaging,'' a procedure where the agents run TD($lambda$) with Markov sampling independently and only average their results after the final step.
- Score: 18.437456273777407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a distributed setup for reinforcement learning, where each agent has a copy of the same Markov Decision Process but transitions are sampled from the corresponding Markov chain independently by each agent. We show that in this setting, we can achieve a linear speedup for TD($\lambda$), a family of popular methods for policy evaluation, in the sense that $N$ agents can evaluate a policy $N$ times faster provided the target accuracy is small enough. Notably, this speedup is achieved by ``one shot averaging,'' a procedure where the agents run TD($\lambda$) with Markov sampling independently and only average their results after the final step. This significantly reduces the amount of communication required to achieve a linear speedup relative to previous work.
Related papers
- Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum [62.691095807959215]
We establish an optimal sample complexity of $O(-2)$ for obtaining an $$-optimal global policy using a single-timescale actor-critic (AC) algorithm.<n>These mechanisms are compatible with existing deep learning architectures and require only minor modifications, without compromising practical applicability.
arXiv Detail & Related papers (2026-02-02T00:35:42Z) - Parameter-Free Federated TD Learning with Markov Noise in Heterogeneous Environments [3.4165401459803335]
Federated learning (FL) can dramatically speed up reinforcement learning by distributing exploration and training across multiple agents.<n>Existing results on TD learning achieving this rate require the algorithm to depend on unknown problem parameters.<n>We propose a two-timescale Federated Temporal Difference (FTD) learning with Polyak-Ruppert averaging.
arXiv Detail & Related papers (2025-10-08T18:36:30Z) - Online Fair Division for Personalized $2$-Value Instances [51.278096593080456]
We study an online fair division setting, where goods arrive one at a time and there is a fixed set of $n$ agents.<n>Once a good appears, the value each agent has for it is revealed and it must be allocated immediately and irrevocably to one of the agents.<n>We show how to obtain worst case guarantees with respect to well-known fairness notions.
arXiv Detail & Related papers (2025-05-28T09:48:16Z) - Cooperative Multi-Agent Constrained Stochastic Linear Bandits [2.099922236065961]
A network of $N$ agents communicate locally to minimize their collective regret while keeping their expected cost under a specified threshold $tau$.
We propose a safe distributed upper confidence bound algorithm, so called textitMA-OPLB, and establish a high probability bound on its $T$-round regret.
We show that our regret bound is of order $ mathcalOleft(fracdtau-c_0fraclog(NT)2sqrtNsqrtTlog (1/|lambda|)
arXiv Detail & Related papers (2024-10-22T19:34:53Z) - DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speedup a common Approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose textttDASA, a Delay-Adaptive algorithm for multi-agent Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z) - Accelerating Convergence of Score-Based Diffusion Models, Provably [44.11766377798812]
Score-based diffusion models often suffer from low sampling speed due to extensive function evaluations needed during the sampling phase.
We design novel training-free algorithms to accelerate popular deterministic (i.e., DDIM) and (i.e., DDPM) samplers.
Our theory accommodates $ell$-accurate score estimates, and does not require log-concavity or smoothness on the target distribution.
arXiv Detail & Related papers (2024-03-06T17:02:39Z) - Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions.
To reduce the number of samples in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts and the other executes an online algorithm for non-oblivious multi-armed bandits.
In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of distributions.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Federated Learning Using Variance Reduced Stochastic Gradient for
Probabilistically Activated Agents [0.0]
This paper proposes an algorithm for Federated Learning (FL) with a two-layer structure that achieves both variance reduction and a faster convergence rate to an optimal solution in the setting where each agent has an arbitrary probability of selection in each iteration.
arXiv Detail & Related papers (2022-10-25T22:04:49Z) - Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
$alpha$-fraction of agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Denoising Diffusion Implicit Models [117.03720513930335]
We present denoising diffusion implicit models (DDIMs) for iterative implicit probabilistic models with the same training procedure as DDPMs.
DDIMs can produce high quality samples $10 times$ to $50 times$ faster in terms of wall-clock time compared to DDPMs.
arXiv Detail & Related papers (2020-10-06T06:15:51Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Loop Estimator for Discounted Values in Markov Reward Processes [19.011189395046014]
A policy evaluation step estimates the value of states with samples from a Markov reward process induced by following a Markov policy.
We propose a simple and efficient estimator called loop estimator that exploits the regenerative structure of Markov reward processes.
In preliminary numerical experiments, the loop estimator outperforms model-free methods, such as TD(k) and is competitive with the model-based estimator.
arXiv Detail & Related papers (2020-02-15T01:42:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.