Federated Reinforcement Learning with Environment Heterogeneity
- URL: http://arxiv.org/abs/2204.02634v1
- Date: Wed, 6 Apr 2022 07:21:00 GMT
- Title: Federated Reinforcement Learning with Environment Heterogeneity
- Authors: Hao Jin, Yang Peng, Wenhao Yang, Shusen Wang, Zhihua Zhang
- Abstract summary: We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction.
We propose two federated RL algorithms, \texttt{QAvg} and \texttt{PAvg}.
- Score: 30.797692838836277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a Federated Reinforcement Learning (FedRL) problem in which $n$
agents collaboratively learn a single policy without sharing the trajectories
they collected during agent-environment interaction. We stress the constraint
of environment heterogeneity, which means $n$ environments corresponding to
these $n$ agents have different state transitions. To obtain a value function
or a policy function which optimizes the overall performance in all
environments, we propose two federated RL algorithms, \texttt{QAvg} and
\texttt{PAvg}. We theoretically prove that these algorithms converge to
suboptimal solutions, while such suboptimality depends on how heterogeneous
these $n$ environments are. Moreover, we propose a heuristic that achieves
personalization by embedding the $n$ environments into $n$ vectors. The
personalization heuristic not only improves the training but also allows for
better generalization to new environments.
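The QAvg scheme described in the abstract (local value updates in each agent's own environment, periodically averaged by a server) can be sketched as follows. This is an illustrative sketch, not the authors' code: the tabular environments, rollout lengths, and hyperparameters (`alpha`, `gamma`, `eps`, `episodes`, `rounds`) are placeholder assumptions.

```python
# Hedged sketch of QAvg-style federated Q-learning: n agents run local
# epsilon-greedy Q-learning in heterogeneous environments, and a server
# averages their Q-tables each round. All hyperparameters are placeholders.
import numpy as np

def local_q_update(Q, env_step, episodes=10, alpha=0.1, gamma=0.9, eps=0.2):
    """Run a few episodes of epsilon-greedy Q-learning on a copy of Q."""
    Q = Q.copy()
    n_states, n_actions = Q.shape
    rng = np.random.default_rng()
    for _ in range(episodes):
        s = int(rng.integers(n_states))
        for _ in range(20):  # bounded rollout length (assumption)
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            s_next, r = env_step(s, a)  # agent's own (heterogeneous) dynamics
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q

def qavg(envs, n_states, n_actions, rounds=50):
    """Federated averaging of Q-tables across heterogeneous environments."""
    Q_global = np.zeros((n_states, n_actions))
    for _ in range(rounds):
        local_Qs = [local_q_update(Q_global, env) for env in envs]
        Q_global = np.mean(local_Qs, axis=0)  # server-side average
    return Q_global
```

Only the averaged table leaves each agent, matching the constraint that raw trajectories are never shared.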
Related papers
- Uncertainty-Aware Reward-Free Exploration with General Function Approximation [69.27868448449755]
In this paper, we propose a reward-free reinforcement learning algorithm called GFA-RFE.
The key idea behind our algorithm is an uncertainty-aware intrinsic reward for exploring the environment.
Experiment results show that GFA-RFE outperforms or is comparable to the performance of state-of-the-art unsupervised RL algorithms.
arXiv Detail & Related papers (2024-06-24T01:37:18Z) - Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments [17.995517050546244]
We explore a Federated Reinforcement Learning (FRL) problem where $N$ agents collaboratively learn a common policy without sharing their trajectory data.
We propose two algorithms: FedSVRPG-M and FedHAPG-M, which converge to a stationary point of the average performance function.
Our algorithms enjoy linear convergence speedups with respect to the number of agents, highlighting the benefit of collaboration among agents in finding a common policy.
arXiv Detail & Related papers (2024-05-29T20:24:42Z) - Federated Reinforcement Learning with Constraint Heterogeneity [22.79217297480751]
We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity.
We show that FedNPG achieves global convergence with an $\tilde{O}(1/\sqrt{T})$ rate, and FedPPO efficiently solves complicated learning tasks with the use of deep neural networks.
arXiv Detail & Related papers (2024-05-06T07:44:50Z) - Invariant-Feature Subspace Recovery: A New Class of Provable Domain
Generalization Algorithms [14.248005245508432]
Domain generalization asks for trained models over a set of training environments to generalize well in unseen test environments.
We propose Invariant-feature Subspace Recovery (ISR): a new class of algorithms with provable domain generalization guarantees.
ISR can be used as a post-processing method on top of trained neural nets. Empirically, we demonstrate the superior performance of our ISRs on synthetic benchmarks.
arXiv Detail & Related papers (2023-11-02T03:24:55Z) - Federated Natural Policy Gradient Methods for Multi-task Reinforcement
Learning [49.65958529941962]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks.
We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
arXiv Detail & Related papers (2023-11-01T00:15:18Z) - Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning
with Linear Function Approximation [16.871660060209674]
We study the problem of deployment-efficient reinforcement learning (RL) with linear function approximation under the \emph{reward-free} exploration setting.
We propose a new algorithm that collects at most $\widetilde{O}(\frac{d^2 H^5}{\epsilon^2})$ trajectories within $H$ deployments to identify an $\epsilon$-optimal policy for any (possibly data-dependent) choice of reward functions.
arXiv Detail & Related papers (2022-10-03T03:48:26Z) - DEFT: Diverse Ensembles for Fast Transfer in Reinforcement Learning [1.111018778205595]
We present Diverse Ensembles for Fast Transfer in RL (DEFT), a new ensemble-based method for reinforcement learning in highly multimodal environments.
The algorithm is broken down into two main phases: training of ensemble members, and synthesis (or fine-tuning) of the ensemble members into a policy that works in a new environment.
arXiv Detail & Related papers (2022-09-26T04:35:57Z) - A Relational Intervention Approach for Unsupervised Dynamics
Generalization in Model-Based Reinforcement Learning [113.75991721607174]
We introduce an interventional prediction module to estimate the probability of two estimated $\hat{z}_i, \hat{z}_j$ belonging to the same environment.
We empirically show that $\hat{Z}$ estimated by our method contains less redundant information than that of previous methods.
arXiv Detail & Related papers (2022-06-09T15:01:36Z) - Settling the Horizon-Dependence of Sample Complexity in Reinforcement
Learning [82.31436758872715]
We develop an algorithm that achieves the same PAC guarantee while using only $O(1)$ episodes of environment interactions.
We establish a connection between value functions in discounted and finite-horizon Markov decision processes.
arXiv Detail & Related papers (2021-11-01T00:21:24Z) - Iterative Feature Matching: Toward Provable Domain Generalization with
Logarithmic Environments [55.24895403089543]
Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.
We present a new algorithm based on performing iterative feature matching that is guaranteed with high probability to yield a predictor that generalizes after seeing only $O(\log d_s)$ environments.
arXiv Detail & Related papers (2021-06-18T04:39:19Z) - Emergent Complexity and Zero-shot Transfer via Unsupervised Environment
Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED)
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
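The PAIRED objective above can be sketched as a regret-maximization loop over proposed environments. The block below is a minimal illustration, not the authors' implementation: a best-of-N search stands in for the designer's learned RL policy, and `evaluate` is a placeholder for full agent rollouts.

```python
# Hedged sketch of PAIRED's core signal: the environment designer proposes
# parameters, the protagonist and a stronger antagonist are evaluated on the
# resulting environment, and the designer favors environments that maximize
# the antagonist's advantage (regret). Best-of-N sampling here is a stand-in
# for the designer's policy update; `evaluate` is a placeholder assumption.
def paired_round(propose_env, evaluate, candidates=8):
    """Return (regret, env_params) of the highest-regret proposed environment."""
    best = None
    for _ in range(candidates):
        theta = propose_env()                  # designer samples env parameters
        prot = evaluate("protagonist", theta)  # protagonist's estimated return
        ant = evaluate("antagonist", theta)    # antagonist's estimated return
        regret = ant - prot                    # designer's reward signal
        if best is None or regret > best[0]:
            best = (regret, theta)
    return best
```

Maximizing this regret pushes the designer toward environments that are solvable (the antagonist succeeds) yet still challenging for the protagonist, which is how PAIRED induces its curriculum.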
arXiv Detail & Related papers (2020-12-03T17:37:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.