Federated Reinforcement Learning with Constraint Heterogeneity
- URL: http://arxiv.org/abs/2405.03236v1
- Date: Mon, 6 May 2024 07:44:50 GMT
- Title: Federated Reinforcement Learning with Constraint Heterogeneity
- Authors: Hao Jin, Liangyu Zhang, Zhihua Zhang
- Abstract summary: We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity.
We show that FedNPG achieves global convergence with an $\tilde{O}(1/\sqrt{T})$ rate, and FedPPO efficiently solves complicated learning tasks with the use of deep neural networks.
- Score: 22.79217297480751
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity. In our setting, we aim to solve a reinforcement learning problem with multiple constraints while the $N$ training agents are located in $N$ different environments, each with limited access to the constraint signals, and are expected to collaboratively learn a policy satisfying all constraint signals. Such learning problems are prevalent in scenarios of Large Language Model (LLM) fine-tuning and healthcare applications. To solve the problem, we propose federated primal-dual policy optimization methods based on traditional policy gradient methods. Specifically, we introduce $N$ local Lagrange functions for agents to perform local policy updates, and these agents are then scheduled to periodically communicate on their local policies. Taking natural policy gradient (NPG) and proximal policy optimization (PPO) as policy optimization methods, we mainly focus on two instances of our algorithms, i.e., FedNPG and FedPPO. We show that FedNPG achieves global convergence with an $\tilde{O}(1/\sqrt{T})$ rate, and FedPPO efficiently solves complicated learning tasks with the use of deep neural networks.
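As a rough illustration of the federated primal-dual recipe above, the sketch below uses a one-step (bandit) simplification of a constrained MDP: each agent keeps a local Lagrange function built from the shared reward and its own constraint signal, takes a primal step on the policy and a dual step on its multiplier, and the policy parameters are averaged every few communication rounds. This is an assumption-laden toy (synthetic rewards, costs, thresholds, and step sizes), not the paper's exact FedNPG or FedPPO.

```python
import numpy as np

# Toy problem data (all made up): one shared reward signal, one constraint per agent.
rng = np.random.default_rng(0)
num_agents, num_actions = 3, 4
reward = rng.uniform(0.0, 1.0, size=num_actions)                # shared reward r(a)
costs = rng.uniform(0.0, 1.0, size=(num_agents, num_actions))   # agent-specific costs c_i(a)
budgets = np.full(num_agents, 0.5)                               # constraint thresholds b_i

theta = np.zeros((num_agents, num_actions))   # local softmax policy parameters
lam = np.zeros(num_agents)                    # local dual variables lambda_i
lr_theta, lr_lam, sync_every, T = 0.5, 0.1, 10, 500

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for t in range(1, T + 1):
    for i in range(num_agents):
        pi = softmax(theta[i])
        # Local Lagrangian payoff: r(a) - lambda_i * (c_i(a) - b_i)
        payoff = reward - lam[i] * (costs[i] - budgets[i])
        # Exact gradient of E_pi[payoff] under the softmax parameterization
        grad = pi * (payoff - pi @ payoff)
        theta[i] += lr_theta * grad                                          # primal step
        lam[i] = max(0.0, lam[i] + lr_lam * (pi @ costs[i] - budgets[i]))    # dual step
    if t % sync_every == 0:                   # periodic communication round
        theta[:] = theta.mean(axis=0)         # server averages the local policies

pi_global = softmax(theta[0])
print("policy:", np.round(pi_global, 3))
print("expected cost per agent:", np.round(costs @ pi_global, 3), "vs budget", budgets[0])
```

With the softmax parameterization the primal step here is an exact policy gradient of the local Lagrangian; FedNPG and FedPPO would replace it with natural-gradient and clipped-surrogate updates, respectively, computed from trajectories.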
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning [46.28771270378047]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment.
We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
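As a hedged illustration of that objective (my own one-step, server-style simplification rather than the cited paper's decentralized algorithm), the sketch below keeps each reward $r_i$ private to its agent, has every agent compute an exact local policy gradient, and shares only the averaged gradient, which ascends the average of the agents' rewards under a common softmax policy.

```python
import numpy as np

rng = np.random.default_rng(1)
num_agents, num_actions, lr, rounds = 4, 5, 0.5, 300
private_rewards = rng.uniform(size=(num_agents, num_actions))  # r_i, never shared
theta = np.zeros(num_actions)                                  # shared policy parameters

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(rounds):
    pi = softmax(theta)
    # Each agent computes an exact local gradient of E_pi[r_i] ...
    local_grads = [pi * (r - pi @ r) for r in private_rewards]
    # ... and only the averaged gradient is communicated, not the rewards themselves.
    theta += lr * np.mean(local_grads, axis=0)

print("learned policy:", np.round(softmax(theta), 3))
print("greedy action for the average reward:", int(private_rewards.mean(axis=0).argmax()))
```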
arXiv Detail & Related papers (2023-11-01T00:15:18Z)
- Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
arXiv Detail & Related papers (2023-06-20T17:27:31Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning [25.747559058350557]
We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network.
The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards.
To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) that provably learns a near-globally-optimal policy using only local information.
arXiv Detail & Related papers (2022-11-30T15:58:00Z)
- Offline Reinforcement Learning with Closed-Form Policy Improvement Operators [88.54210578912554]
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z)
- Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer [7.970144204429356]
We introduce an SF-based extension of the Optimistic Linear Support algorithm to learn a set of policies whose SFs form a convex coverage set.
We prove that policies in this set can be combined via generalized policy improvement to construct optimal behaviors for any new linearly-expressible tasks.
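A hedged sketch of the generalized policy improvement (GPI) step with successor features follows. The SF table is synthetic here, whereas in the paper it would come from the OLS-based construction of the convex coverage set, and rewards are assumed linear in known features, $r_w(s,a)=\phi(s,a)\cdot w$.

```python
import numpy as np

rng = np.random.default_rng(2)
num_policies, num_states, num_actions, feat_dim = 3, 6, 4, 5
# psi[i, s, a] = expected discounted feature sum when following policy i after (s, a).
# Synthetic here; learned SFs of the coverage-set policies would be used in practice.
psi = rng.uniform(size=(num_policies, num_states, num_actions, feat_dim))

def gpi_action(state, w):
    """Greedy action w.r.t. the max over the policy set of its SF-based Q-value for task w."""
    q_values = psi[:, state] @ w                 # shape: (num_policies, num_actions)
    return int(q_values.max(axis=0).argmax())    # GPI: max over policies, then over actions

new_task_w = rng.normal(size=feat_dim)           # weights of a new linearly-expressible task
print("GPI policy:", [gpi_action(s, new_task_w) for s in range(num_states)])
```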
arXiv Detail & Related papers (2022-06-22T19:00:08Z)
- Federated Reinforcement Learning with Environment Heterogeneity [30.797692838836277]
We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction.
We propose two federated RL algorithms, QAvg and PAvg.
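A hedged sketch of the Q-averaging idea (my simplification of QAvg as summarized above, with made-up random MDPs and hyperparameters): each agent runs tabular Q-learning in its own environment and only the Q-tables are periodically averaged, so no trajectories leave the agents.

```python
import numpy as np

rng = np.random.default_rng(3)
num_agents, S, A, gamma = 3, 5, 3, 0.9
# Heterogeneous environments: each agent has its own transition kernel and rewards.
P = rng.dirichlet(np.ones(S), size=(num_agents, S, A))   # P[i, s, a] -> next-state distribution
R = rng.uniform(size=(num_agents, S, A))

Q = np.zeros((num_agents, S, A))
alpha, eps, sync_every, steps = 0.1, 0.2, 50, 5000
state = np.zeros(num_agents, dtype=int)

for t in range(1, steps + 1):
    for i in range(num_agents):
        s = state[i]
        a = rng.integers(A) if rng.random() < eps else int(Q[i, s].argmax())
        s_next = rng.choice(S, p=P[i, s, a])
        # Standard Q-learning update against the agent's local environment.
        target = R[i, s, a] + gamma * Q[i, s_next].max()
        Q[i, s, a] += alpha * (target - Q[i, s, a])
        state[i] = s_next
    if t % sync_every == 0:         # periodic communication round
        Q[:] = Q.mean(axis=0)       # server averages the local Q-tables

print("greedy policy of the averaged Q-table:", Q[0].argmax(axis=1))
```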
arXiv Detail & Related papers (2022-04-06T07:21:00Z)
- CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee [61.176159046544946]
In safe reinforcement learning (SRL) problems, an agent explores the environment to maximize an expected total reward and avoids violation of certain constraints.
This is the first analysis of SRL algorithms with convergence guarantees to globally optimal policies.
arXiv Detail & Related papers (2020-11-11T16:05:14Z)
- Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning [43.61029925616256]
Offline policy evaluation (OPE) in Reinforcement Learning (RL) is a critical step towards applying RL in real-life applications.
We address this problem by simultaneously evaluating all policies in a policy class $\Pi$ -- uniform convergence in OPE.
Our results imply that the model-based planning achieves an optimal episode complexity of $\widetilde{O}(H^3/d_m\epsilon^2)$ in identifying an $\epsilon$-optimal policy.
arXiv Detail & Related papers (2020-07-07T19:44:14Z)
- Deep Reinforcement Learning with Robust and Smooth Policy [90.78795857181727]
We propose to learn a smooth policy that behaves smoothly with respect to states.
We develop a new framework, Smooth Regularized Reinforcement Learning (SR$^2$L), where the policy is trained with smoothness-inducing regularization.
Such regularization effectively constrains the search space, and enforces smoothness in the learned policy.
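A hedged sketch of a smoothness-inducing regularizer in this spirit (my own minimal version, not the paper's SR$^2$L training pipeline): penalize the divergence between the policy at a state and at a nearby adversarially perturbed state, with the perturbation found by a few projected gradient steps; the `policy` network and every hyperparameter below are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Illustrative policy network mapping 4-dim states to logits over 2 actions.
policy = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2))

def smoothness_penalty(states, eps=0.1, steps=3, step_size=0.05):
    """Approximate max_{||delta||_inf <= eps} KL(pi(.|s) || pi(.|s + delta)) via PGD."""
    delta = torch.zeros_like(states, requires_grad=True)
    base_logp = F.log_softmax(policy(states), dim=-1).detach()
    for _ in range(steps):
        pert_logp = F.log_softmax(policy(states + delta), dim=-1)
        kl = F.kl_div(pert_logp, base_logp, log_target=True, reduction="batchmean")
        grad, = torch.autograd.grad(kl, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()   # ascend on the perturbation
            delta.clamp_(-eps, eps)            # project back into the epsilon-ball
    pert_logp = F.log_softmax(policy(states + delta.detach()), dim=-1)
    return F.kl_div(pert_logp, base_logp, log_target=True, reduction="batchmean")

states = torch.randn(8, 4)
# In training, one would add lam * smoothness_penalty(states) to the policy loss.
print(float(smoothness_penalty(states)))
```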
arXiv Detail & Related papers (2020-03-21T00:10:29Z)