Federated Natural Policy Gradient Methods for Multi-task Reinforcement Learning
- URL: http://arxiv.org/abs/2311.00201v1
- Date: Wed, 1 Nov 2023 00:15:18 GMT
- Authors: Tong Yang, Shicong Cen, Yuting Wei, Yuxin Chen, Yuejie Chi
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated reinforcement learning (RL) enables collaborative decision making
of multiple distributed agents without sharing local data trajectories. In this
work, we consider a multi-task setting, in which each agent has its own private
reward function corresponding to different tasks, while sharing the same
transition kernel of the environment. Focusing on infinite-horizon tabular
Markov decision processes, the goal is to learn a globally optimal policy that
maximizes the sum of the discounted total rewards of all the agents in a
decentralized manner, where each agent only communicates with its neighbors
over some prescribed graph topology. We develop federated vanilla and
entropy-regularized natural policy gradient (NPG) methods under softmax
parameterization, where gradient tracking is applied to the global Q-function
to mitigate the impact of imperfect information sharing. We establish
non-asymptotic global convergence guarantees under exact policy evaluation,
which are nearly independent of the size of the state-action space and
illuminate the impacts of network size and connectivity. To the best of our
knowledge, this is the first time that global convergence is established for
federated multi-task RL using policy optimization. Moreover, the convergence
behavior of the proposed algorithms is robust against inexactness of policy
evaluation.
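The main algorithmic ingredients, softmax NPG updates, consensus mixing over the communication graph, and gradient tracking of the global Q-function, can be sketched on a tiny tabular MDP. This is a minimal illustration under simplifying assumptions (exact policy evaluation, a generic doubly stochastic mixing matrix W, illustrative step sizes), not the paper's exact algorithm:

```python
import numpy as np

def softmax(theta):
    # row-wise softmax over actions
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def exact_q(P, r, pi, gamma):
    # exact policy evaluation on a tabular MDP; P has shape (S, A, S)
    S = P.shape[0]
    P_pi = np.einsum('sa,sat->st', pi, P)   # state chain induced by pi
    r_pi = (pi * r).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * P @ V                # Q(s, a)

def federated_npg(P, rewards, W, gamma=0.9, eta=0.1, iters=200):
    # rewards: one private reward table (S, A) per agent
    # W: doubly stochastic mixing matrix over the communication graph
    N = len(rewards)
    theta = np.zeros((N,) + rewards[0].shape)
    Q = np.stack([exact_q(P, rewards[n], softmax(theta[n]), gamma)
                  for n in range(N)])
    T = Q.copy()  # tracking variable: local estimate of the average Q across agents
    for _ in range(iters):
        # consensus on policy parameters plus an NPG step along the tracked global Q
        theta = np.einsum('nm,msa->nsa', W, theta) + eta / (1.0 - gamma) * T
        Q_new = np.stack([exact_q(P, rewards[n], softmax(theta[n]), gamma)
                          for n in range(N)])
        # gradient tracking: mix neighbours' trackers, then add the local Q innovation
        T = np.einsum('nm,msa->nsa', W, T) + Q_new - Q
        Q = Q_new
    return softmax(theta[0])  # agents' policies reach consensus
```

Note that each agent only mixes its parameters and its tracking variable with its neighbors' (one row of W per agent), so the averaged Q-estimate, never a private reward table, is what gets shared.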
Related papers
- Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments [17.995517050546244]
We explore a Federated Reinforcement Learning (FRL) problem where $N$ agents collaboratively learn a common policy without sharing their trajectory data.
We propose two algorithms: FedSVRPG-M and FedHAPG-M, which converge to a stationary point of the average performance function.
Our algorithms enjoy linear convergence speedups with respect to the number of agents, highlighting the benefit of collaboration among agents in finding a common policy.
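FedSVRPG-M and FedHAPG-M themselves rely on variance-reduced and Hessian-aided gradient estimators; the sketch below only illustrates the simpler underlying pattern of averaging the agents' local policy gradients into a shared momentum buffer (function name and hyperparameters are hypothetical):

```python
import numpy as np

def federated_pg_momentum(local_grads, theta, d, beta=0.9, lr=0.01):
    """One round of a momentum-style federated policy gradient sketch."""
    g = np.mean(local_grads, axis=0)  # server averages the agents' local gradients
    d = beta * d + (1 - beta) * g     # momentum buffer smooths stochastic gradients
    theta = theta + lr * d            # ascent step on the average performance
    return theta, d
```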
arXiv Detail & Related papers (2024-05-29T20:24:42Z) - Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning [13.908826484332282]
Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time.
This paper presents a constrained formulation for multi-task RL where the goal is to maximize the average performance of the policy across tasks subject to bounds on the performance in each task.
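In generic notation (J_k for the expected return on task k and b_k for its performance bound; the symbols are ours, not necessarily the paper's), the constrained formulation reads:

```latex
\max_{\pi} \;\; \frac{1}{K}\sum_{k=1}^{K} J_k(\pi)
\qquad \text{subject to} \qquad J_k(\pi) \ge b_k, \quad k = 1, \dots, K.
```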
arXiv Detail & Related papers (2024-05-03T19:43:30Z)
- Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
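Each agent's local update is PPO-like; for reference, the standard clipped surrogate that vanilla PPO maximizes per sample can be written as below (this is the generic PPO objective, not the paper's exact multi-agent scheme):

```python
import numpy as np

def clipped_surrogate(ratio, adv, eps=0.2):
    # PPO's pessimistic clipped objective for one (state, action) sample:
    # ratio = pi_new(a|s) / pi_old(a|s), adv = advantage estimate.
    # Taking the minimum removes the incentive to move the ratio outside
    # [1 - eps, 1 + eps] in the direction that inflates the objective.
    return np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv)
```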
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning [25.747559058350557]
We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network.
The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards.
To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) that provably learns a near-globally-optimal policy using only local information.
arXiv Detail & Related papers (2022-11-30T15:58:00Z)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
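A single-loop, extragradient-style sketch of entropy-regularized multiplicative weights for a zero-sum matrix game x^T A y is shown below; the midpoint scheme and step sizes are illustrative, not necessarily the paper's exact OMWU method:

```python
import numpy as np

def mwu_step(p, g, eta, tau, sign):
    # one entropy-regularized multiplicative-weights step:
    # p'(a) proportional to p(a)^(1 - eta*tau) * exp(sign * eta * g(a))
    logits = (1 - eta * tau) * np.log(p) + sign * eta * g
    w = np.exp(logits - logits.max())
    return w / w.sum()

def omwu_zero_sum(A, tau=0.1, eta=0.1, iters=2000):
    # max player x vs. min player y on payoff x^T A y, regularization strength tau
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    for _ in range(iters):
        # midpoint (predictor) step using gradients at the current iterates
        xb = mwu_step(x, A @ y, eta, tau, +1)
        yb = mwu_step(y, A.T @ x, eta, tau, -1)
        # main (corrector) step along the midpoint gradients
        x = mwu_step(x, A @ yb, eta, tau, +1)
        y = mwu_step(y, A.T @ xb, eta, tau, -1)
    return x, y
```

At the fixed point, x is proportional to exp(A y / tau), i.e. the iterates reach the quantal response equilibrium of the entropy-regularized game.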
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
- Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning [22.310861786709538]
We propose a scalable algorithm for cooperative multi-agent reinforcement learning.
We show that our algorithm converges to the globally optimal policy with a dimension-free statistical and computational complexity.
arXiv Detail & Related papers (2021-09-23T23:38:15Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence [56.40794592158596]
We study generative adversarial imitation learning (GAIL) under general MDP and for nonlinear reward function classes.
This is the first systematic theoretical study of GAIL for global convergence.
arXiv Detail & Related papers (2020-06-24T06:24:37Z)
- Cooperative Multi-Agent Reinforcement Learning with Partial Observations [16.895704973433382]
We propose a distributed zeroth-order policy optimization method for Multi-Agent Reinforcement Learning (MARL).
It allows the agents to compute the local policy gradients needed to update their local policy functions using local estimates of the global accumulated rewards.
We show that the proposed distributed zeroth-order policy optimization method with constant stepsize converges to the neighborhood of a policy that is a stationary point of the global objective function.
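The core primitive of zeroth-order policy optimization is estimating a policy gradient from return evaluations alone; a two-point Gaussian-smoothing estimator (a common choice, the paper's exact estimator may differ) can be sketched as:

```python
import numpy as np

def zeroth_order_grad(J, theta, mu=0.05, samples=1000, rng=None):
    """Two-point zeroth-order gradient estimate of a scalar objective J at theta.

    Only evaluations of J are used, no analytic gradient: each sample perturbs
    theta along a random Gaussian direction u and uses the symmetric finite
    difference (J(theta + mu*u) - J(theta - mu*u)) / (2*mu) as the slope along u.
    """
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(theta)
    for _ in range(samples):
        u = rng.standard_normal(theta.shape)
        g += (J(theta + mu * u) - J(theta - mu * u)) / (2 * mu) * u
    return g / samples
```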
arXiv Detail & Related papers (2020-06-18T19:36:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.