Federated Reinforcement Learning with Environment Heterogeneity
- URL: http://arxiv.org/abs/2204.02634v1
- Date: Wed, 6 Apr 2022 07:21:00 GMT
- Title: Federated Reinforcement Learning with Environment Heterogeneity
- Authors: Hao Jin, Yang Peng, Wenhao Yang, Shusen Wang, Zhihua Zhang
- Abstract summary: We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction.
We propose two federated RL algorithms, QAvg and PAvg.
- Score: 30.797692838836277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a Federated Reinforcement Learning (FedRL) problem in which $n$
agents collaboratively learn a single policy without sharing the trajectories
they collected during agent-environment interaction. We stress the constraint
of environment heterogeneity, which means $n$ environments corresponding to
these $n$ agents have different state transitions. To obtain a value function
or a policy function which optimizes the overall performance in all
environments, we propose two federated RL algorithms, \texttt{QAvg} and
\texttt{PAvg}. We theoretically prove that these algorithms converge to
suboptimal solutions, while such suboptimality depends on how heterogeneous
these $n$ environments are. Moreover, we propose a heuristic that achieves
personalization by embedding the $n$ environments into $n$ vectors. The
personalization heuristic not only improves the training but also allows for
better generalization to new environments.
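As a rough illustration of the QAvg idea, the sketch below alternates local tabular Q-learning with server-side averaging of Q-tables, so trajectories never leave an agent. This is a minimal sketch under simplifying assumptions (discrete Gymnasium-style environments, a fixed local episode budget, illustrative hyperparameters and helper names), not the authors' implementation; PAvg follows the same pattern with policy parameters averaged instead of Q-tables.

```python
import numpy as np

def local_q_learning(Q, env, episodes=10, alpha=0.1, gamma=0.95, eps=0.1):
    """One round of tabular Q-learning on a single agent's own environment."""
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection from the current Q-table
            if np.random.rand() < eps:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # standard Q-learning update from locally observed transitions only
            Q[state, action] += alpha * (
                reward + gamma * np.max(Q[next_state]) - Q[state, action]
            )
            state = next_state
    return Q

def qavg(envs, n_states, n_actions, rounds=50):
    """QAvg-style loop: local updates in each heterogeneous environment,
    followed by server-side averaging of Q-tables (no trajectory sharing)."""
    Q_global = np.zeros((n_states, n_actions))
    for _ in range(rounds):
        local_tables = [local_q_learning(Q_global.copy(), env) for env in envs]
        Q_global = np.mean(local_tables, axis=0)  # the averaging step of QAvg
    return Q_global
```

Because the $n$ environments have different transition dynamics, the averaged table converges to a compromise policy, which is why the guarantees above are stated in terms of a heterogeneity-dependent suboptimality gap.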
Related papers
- Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts [0.15889427269227555]
We develop an adaptive re-training algorithm, ERPO, inspired by evolutionary game theory (EGT).
ERPO shows faster policy adaptation, higher average rewards, and reduced computational costs.
arXiv Detail & Related papers (2024-10-22T09:29:53Z)
- Uncertainty-Aware Reward-Free Exploration with General Function Approximation [69.27868448449755]
In this paper, we propose a reward-free reinforcement learning algorithm called GFA-RFE.
The key idea behind our algorithm is an uncertainty-aware intrinsic reward for exploring the environment.
Experiment results show that GFA-RFE outperforms or is comparable to the performance of state-of-the-art unsupervised RL algorithms.
arXiv Detail & Related papers (2024-06-24T01:37:18Z)
- Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments [17.995517050546244]
We explore a Federated Reinforcement Learning (FRL) problem where $N$ agents collaboratively learn a common policy without sharing their trajectory data.
We propose two algorithms: FedSVRPG-M and FedHAPG-M, which converge to a stationary point of the average performance function.
Our algorithms enjoy linear convergence speedups with respect to the number of agents, highlighting the benefit of collaboration among agents in finding a common policy.
arXiv Detail & Related papers (2024-05-29T20:24:42Z)
- Federated Reinforcement Learning with Constraint Heterogeneity [22.79217297480751]
We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity.
We show that FedNPG achieves global convergence with an $\tilde{O}(1/\sqrt{T})$ rate, and FedPPO efficiently solves complicated learning tasks with the use of deep neural networks.
arXiv Detail & Related papers (2024-05-06T07:44:50Z)
- Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis [41.75366066380951]
We propose a novel asynchronous federated reinforcement learning framework termed AFedPG, which constructs a global model through collaboration among $N$ agents.
We analyze the theoretical global convergence bound of AFedPG, and characterize the advantage of the proposed algorithm in terms of both the sample complexity and time complexity.
We empirically verify the improved performance of AFedPG in four widely-used MuJoCo environments with varying numbers of agents.
arXiv Detail & Related papers (2024-04-09T04:21:13Z)
- Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance [74.31779732754697]
We propose a novel plug-in approach named Guided Offline RL (GORL).
GORL employs a guiding network, along with only a few expert demonstrations, to adaptively determine the relative importance of the policy improvement and policy constraint for every sample.
Experiments on various environments suggest that GORL can be easily installed on most offline RL algorithms with statistically significant performance improvements.
arXiv Detail & Related papers (2023-09-04T08:59:04Z)
- Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation [16.871660060209674]
We study the problem of deployment-efficient reinforcement learning (RL) with linear function approximation under the reward-free exploration setting.
We propose a new algorithm that collects at most $\widetilde{O}(\frac{d^2 H^5}{\epsilon^2})$ trajectories within $H$ deployments to identify an $\epsilon$-optimal policy for any (possibly data-dependent) choice of reward functions.
arXiv Detail & Related papers (2022-10-03T03:48:26Z)
- A Relational Intervention Approach for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning [113.75991721607174]
We introduce an interventional prediction module to estimate the probability of two estimated $\hat{z}_i, \hat{z}_j$ belonging to the same environment.
We empirically show that $\hat{Z}$ estimated by our method enjoys less redundant information than previous methods.
arXiv Detail & Related papers (2022-06-09T15:01:36Z)
- Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning [82.31436758872715]
We develop an algorithm that achieves the same PAC guarantee while using only $O(1)$ episodes of environment interactions.
We establish a connection between value functions in discounted and finite-horizon Markov decision processes.
arXiv Detail & Related papers (2021-11-01T00:21:24Z)
- Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments [55.24895403089543]
Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.
We present a new algorithm based on performing iterative feature matching that is guaranteed with high probability to yield a predictor that generalizes after seeing only $O(\log d_s)$ environments.
arXiv Detail & Related papers (2021-06-18T04:39:19Z)
- Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
arXiv Detail & Related papers (2020-12-03T17:37:01Z)
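The last entry above, PAIRED, trains an environment-designing adversary with a regret signal. The snippet below is a simplified sketch of that regret estimate as described in the PAIRED paper (maximum antagonist return minus mean protagonist return); `rollout`, `protagonist`, and `antagonist` are placeholder names introduced for illustration, not code from any of the papers listed.

```python
def paired_regret(env_params, protagonist, antagonist, rollout, episodes=4):
    """Sketch of the regret signal used to reward the environment adversary.

    `rollout(policy, env_params)` is assumed to return one episode's return in
    the environment instantiated from `env_params`.
    """
    # The antagonist's best observed return approximates what is achievable
    # in this environment instance.
    antagonist_return = max(rollout(antagonist, env_params) for _ in range(episodes))
    # The protagonist's mean return measures its current competence.
    protagonist_return = sum(
        rollout(protagonist, env_params) for _ in range(episodes)
    ) / episodes
    # The adversary (and antagonist) are rewarded with this regret estimate,
    # while the protagonist learns to reduce it, yielding a curriculum of
    # environments that are challenging but still solvable.
    return antagonist_return - protagonist_return
```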