Related papers: One-Shot Averaging for Distributed TD($λ$) Under Markov Sampling

One-Shot Averaging for Distributed TD($λ$) Under Markov Sampling

URL: http://arxiv.org/abs/2403.08896v2
Date: Sat, 1 Jun 2024 02:10:42 GMT
Title: One-Shot Averaging for Distributed TD($λ$) Under Markov Sampling
Authors: Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky,
Abstract summary: We show that we can achieve a linear speedup for TD($lambda$), a family of popular methods for policy evaluation, in the sense that $N$ agents can evaluate a policy $N$ times faster. Notably, this speedup is achieved by one shot averaging,'' a procedure where the agents run TD($lambda$) with Markov sampling independently and only average their results after the final step.
Score: 18.437456273777407
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider a distributed setup for reinforcement learning, where each agent has a copy of the same Markov Decision Process but transitions are sampled from the corresponding Markov chain independently by each agent. We show that in this setting, we can achieve a linear speedup for TD($\lambda$), a family of popular methods for policy evaluation, in the sense that $N$ agents can evaluate a policy $N$ times faster provided the target accuracy is small enough. Notably, this speedup is achieved by ``one shot averaging,'' a procedure where the agents run TD($\lambda$) with Markov sampling independently and only average their results after the final step. This significantly reduces the amount of communication required to achieve a linear speedup relative to previous work.

Related papers

Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum [62.691095807959215]
We establish an optimal sample complexity of $O(-2)$ for obtaining an $$-optimal global policy using a single-timescale actor-critic (AC) algorithm.<n>These mechanisms are compatible with existing deep learning architectures and require only minor modifications, without compromising practical applicability.
arXiv Detail & Related papers (2026-02-02T00:35:42Z)
Parameter-Free Federated TD Learning with Markov Noise in Heterogeneous Environments [3.4165401459803335]
Federated learning (FL) can dramatically speed up reinforcement learning by distributing exploration and training across multiple agents.<n>Existing results on TD learning achieving this rate require the algorithm to depend on unknown problem parameters.<n>We propose a two-timescale Federated Temporal Difference (FTD) learning with Polyak-Ruppert averaging.
arXiv Detail & Related papers (2025-10-08T18:36:30Z)
Online Fair Division for Personalized $2$-Value Instances [51.278096593080456]
We study an online fair division setting, where goods arrive one at a time and there is a fixed set of $n$ agents.<n>Once a good appears, the value each agent has for it is revealed and it must be allocated immediately and irrevocably to one of the agents.<n>We show how to obtain worst case guarantees with respect to well-known fairness notions.
arXiv Detail & Related papers (2025-05-28T09:48:16Z)
Cooperative Multi-Agent Constrained Stochastic Linear Bandits [2.099922236065961]
A network of $N$ agents communicate locally to minimize their collective regret while keeping their expected cost under a specified threshold $tau$. We propose a safe distributed upper confidence bound algorithm, so called textitMA-OPLB, and establish a high probability bound on its $T$-round regret. We show that our regret bound is of order $ mathcalOleft(fracdtau-c_0fraclog(NT)2sqrtNsqrtTlog (1/|lambda|)
arXiv Detail & Related papers (2024-10-22T19:34:53Z)
DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speedup a common Approximation problem by acting in parallel and communicating with a central server. To mitigate the effect of delays and stragglers, we propose textttDASA, a Delay-Adaptive algorithm for multi-agent Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z)
Accelerating Convergence of Score-Based Diffusion Models, Provably [44.11766377798812]
Score-based diffusion models often suffer from low sampling speed due to extensive function evaluations needed during the sampling phase. We design novel training-free algorithms to accelerate popular deterministic (i.e., DDIM) and (i.e., DDPM) samplers. Our theory accommodates $ell$-accurate score estimates, and does not require log-concavity or smoothness on the target distribution.
arXiv Detail & Related papers (2024-03-06T17:02:39Z)
Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions. To reduce the number of samples in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts and the other executes an online algorithm for non-oblivious multi-armed bandits. In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of distributions.
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
Federated Learning Using Variance Reduced Stochastic Gradient for Probabilistically Activated Agents [0.0]
This paper proposes an algorithm for Federated Learning (FL) with a two-layer structure that achieves both variance reduction and a faster convergence rate to an optimal solution in the setting where each agent has an arbitrary probability of selection in each iteration.
arXiv Detail & Related papers (2022-10-25T22:04:49Z)
Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server. $alpha$-fraction of agents are adversarial and can report arbitrary fake information. We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z)
Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes. Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
Denoising Diffusion Implicit Models [117.03720513930335]
We present denoising diffusion implicit models (DDIMs) for iterative implicit probabilistic models with the same training procedure as DDPMs. DDIMs can produce high quality samples $10 times$ to $50 times$ faster in terms of wall-clock time compared to DDPMs.
arXiv Detail & Related papers (2020-10-06T06:15:51Z)
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP) We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
Loop Estimator for Discounted Values in Markov Reward Processes [19.011189395046014]
A policy evaluation step estimates the value of states with samples from a Markov reward process induced by following a Markov policy. We propose a simple and efficient estimator called loop estimator that exploits the regenerative structure of Markov reward processes. In preliminary numerical experiments, the loop estimator outperforms model-free methods, such as TD(k) and is competitive with the model-based estimator.
arXiv Detail & Related papers (2020-02-15T01:42:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.