Federated Offline Policy Optimization with Dual Regularization
- URL: http://arxiv.org/abs/2405.17474v2
- Date: Wed, 29 May 2024 01:38:59 GMT
- Title: Federated Offline Policy Optimization with Dual Regularization
- Authors: Sheng Yue, Zerui Qin, Xingyuan Hua, Yongheng Deng, Ju Ren,
- Abstract summary: Federated Reinforcement Learning (FRL) has been deemed a promising solution for intelligent decision-making in the era of the Artificial Internet of Things.
Existing FRL approaches often entail repeated interactions with the environment during local updating, which can be prohibitively expensive or even infeasible in many real-world domains.
This paper proposes a novel offline federated policy optimization algorithm, named $\texttt{DRPO}$, which enables distributed agents to collaboratively learn a decision policy only from private and static data.
- Score: 12.320355780707168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Reinforcement Learning (FRL) has been deemed a promising solution for intelligent decision-making in the era of the Artificial Internet of Things. However, existing FRL approaches often entail repeated interactions with the environment during local updating, which can be prohibitively expensive or even infeasible in many real-world domains. To overcome this challenge, this paper proposes a novel offline federated policy optimization algorithm, named $\texttt{DRPO}$, which enables distributed agents to collaboratively learn a decision policy only from private and static data without further environmental interactions. $\texttt{DRPO}$ leverages dual regularization, incorporating both the local behavioral policy and the global aggregated policy, to judiciously cope with the intrinsic two-tier distributional shifts in offline FRL. Theoretical analysis characterizes the impact of the dual regularization on performance, demonstrating that by achieving the right balance thereof, $\texttt{DRPO}$ can effectively counteract distributional shifts and ensure strict policy improvement in each federative learning round. Extensive experiments validate the significant performance gains of $\texttt{DRPO}$ over baseline methods.
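A dual regularizer of this kind can be written down concretely in a few lines. Below is a minimal sketch, assuming a discrete-action policy, KL-divergence penalties toward both reference policies, and illustrative coefficients `alpha` and `lam`; it is not the paper's exact objective.

```python
# Hedged sketch of a dual-regularized offline FRL objective (assumed form,
# not DRPO's exact formulation): maximize the expected Q-value minus a KL
# penalty toward the local behavior policy beta (combats offline
# distributional shift) and a KL penalty toward the global aggregated
# policy pi_g (combats client drift across federation rounds).
import numpy as np

def dual_regularized_objective(pi, q, beta, pi_g, alpha=1.0, lam=1.0):
    """pi, beta, pi_g: action-probability vectors for one state; q: Q-values."""
    eps = 1e-8
    exploit = float(np.dot(pi, q))                      # E_{a~pi}[Q(s,a)]
    kl_behavior = float(np.sum(pi * np.log((pi + eps) / (beta + eps))))
    kl_global = float(np.sum(pi * np.log((pi + eps) / (pi_g + eps))))
    return exploit - alpha * kl_behavior - lam * kl_global

def closed_form_update(q, beta, pi_g, alpha=1.0, lam=1.0):
    # Maximizing the objective above over the probability simplex gives
    # pi* ∝ beta^{alpha/(alpha+lam)} * pi_g^{lam/(alpha+lam)} * exp(q/(alpha+lam)):
    # a geometric interpolation of the two reference policies, tilted by Q.
    eps = 1e-8
    logits = (alpha * np.log(beta + eps) + lam * np.log(pi_g + eps) + q) / (alpha + lam)
    pi = np.exp(logits - logits.max())
    return pi / pi.sum()
```

Setting `lam = 0` recovers ordinary behavior-regularized offline RL, while `alpha = 0` leaves only the pull toward the federated average; the paper's theoretical analysis concerns the balance between the two.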
Related papers
- A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective [29.977702744504466]
We introduce a novel Advantage-Aware Policy Optimization (A2PO) method to explicitly construct advantage-aware policy constraints for offline learning.
A2PO employs a conditional variational auto-encoder to disentangle the action distributions of intertwined behavior policies.
Experiments conducted on both the single-quality and mixed-quality datasets of the D4RL benchmark demonstrate that A2PO outperforms its counterparts (a hypothetical sketch follows this entry).
arXiv Detail & Related papers (2024-03-12T02:43:41Z)
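As a companion to the summary above, here is a hypothetical sketch of an advantage-conditioned CVAE of the kind A2PO is described as using; the architecture, sizes, and loss weights are assumptions, not the paper's implementation.

```python
# Hypothetical sketch: a conditional VAE reconstructs dataset actions given
# the state and a scalar advantage condition, so that action modes from
# different intertwined behavior policies separate along the advantage axis.
import torch
import torch.nn as nn

class AdvantageCVAE(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=8, hidden=128):
        super().__init__()
        cond_dim = state_dim + 1                      # state plus advantage scalar
        self.encoder = nn.Sequential(
            nn.Linear(cond_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim))        # outputs mean and log-variance
        self.decoder = nn.Sequential(
            nn.Linear(cond_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh())

    def forward(self, state, adv, action):
        cond = torch.cat([state, adv], dim=-1)
        mu, logvar = self.encoder(torch.cat([cond, action], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        recon = self.decoder(torch.cat([cond, z], dim=-1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return recon, kl

def cvae_loss(model, state, adv, action):
    """Reconstruction plus KL; adv has shape [batch, 1]."""
    recon, kl = model(state, adv, action)
    return ((recon - action) ** 2).sum(-1).mean() + 0.5 * kl
```

At policy-optimization time, decoding with a high advantage condition would then supply the constraint target for the learned policy.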
- Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance [74.31779732754697]
We propose a novel plug-in approach named Guided Offline RL (GORL).
GORL employs a guiding network, along with only a few expert demonstrations, to adaptively determine the relative importance of the policy improvement and policy constraint for every sample.
Experiments on various environments suggest that GORL can be easily installed on most offline RL algorithms with statistically significant performance improvements (a minimal sketch of the weighting idea follows this entry).
arXiv Detail & Related papers (2023-09-04T08:59:04Z)
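A minimal sketch of the per-sample weighting idea, with an assumed architecture and loss shape; the paper's guiding network is supervised by a few expert demonstrations, which is omitted here.

```python
# Hedged sketch of GORL-style per-sample weighting (assumed design): a small
# guiding network maps each transition to a weight in (0, 1) that trades off
# the policy-improvement term against the behavior-cloning constraint.
import torch
import torch.nn as nn

class GuidingNet(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())       # per-sample weight in (0, 1)

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def weighted_actor_loss(guide, policy, q_fn, state, action):
    """Per-sample blend of improvement (maximize Q) and constraint (clone data).
    Assumes q_fn returns a [batch, 1] tensor and policy is a deterministic actor."""
    w = guide(state, action)                           # shape: [batch, 1]
    pred = policy(state)
    improvement = -q_fn(state, pred)                   # minimize -Q, i.e. maximize Q
    constraint = ((pred - action) ** 2).sum(-1, keepdim=True)
    return (w * improvement + (1.0 - w) * constraint).mean()
```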
- Offline Policy Optimization in RL with Variance Regularization [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid the double-sampling issue in computing the gradient of the variance regularizer (the identity is sketched after this entry).
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithm.
arXiv Detail & Related papers (2022-12-29T18:25:01Z)
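The double-sampling issue and the Fenchel-dual fix mentioned above can be stated in one identity; this is the standard reconstruction, not a quotation from the paper.

```latex
% The variance of the (corrected) return R is Var(R) = E[R^2] - (E[R])^2.
% The squared expectation cannot be estimated unbiasedly from single samples
% (the "double sampling" issue). Fenchel duality of x -> x^2 gives
\[
  (\mathbb{E}[R])^2 \;=\; \max_{\nu \in \mathbb{R}} \left( 2\nu\,\mathbb{E}[R] - \nu^2 \right),
  \qquad \nu^* = \mathbb{E}[R],
\]
% so the regularizer becomes a saddle-point objective,
\[
  \mathrm{Var}(R) \;=\; \mathbb{E}[R^2] \;-\; \max_{\nu} \left( 2\nu\,\mathbb{E}[R] - \nu^2 \right),
\]
% whose gradient estimates need only one sample of R at a time.
```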
- Offline Reinforcement Learning with Adaptive Behavior Regularization [1.491109220586182]
Offline reinforcement learning (RL) defines a sample-efficient learning paradigm, where a policy is learned from static and previously collected datasets.
We propose a novel approach, which we refer to as adaptive behavior regularization (ABR).
ABR enables the policy to adaptively adjust its optimization objective between cloning and improving over the policy used to generate the dataset (a minimal sketch follows this entry).
arXiv Detail & Related papers (2022-11-15T15:59:11Z)
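A minimal sketch of one way such adaptive regularization can look, assuming an advantage-based weighting rule; the paper's actual adaptation scheme may differ.

```python
# Hedged sketch of adaptive behavior regularization (assumed rule): the
# behavior-cloning weight for each sample grows with the estimated advantage
# of the dataset action, so the policy clones good actions and is freer to
# improve over poor ones.
import torch

def abr_actor_loss(policy, q_fn, v_fn, state, action, temperature=1.0):
    """Assumes q_fn and v_fn return [batch, 1] tensors and policy is a
    deterministic actor."""
    pred = policy(state)
    adv = q_fn(state, action) - v_fn(state)        # advantage of the dataset action
    w = torch.sigmoid(adv / temperature).detach()  # adaptive per-sample weight
    clone = ((pred - action) ** 2).sum(-1, keepdim=True)
    improve = -q_fn(state, pred)
    return (w * clone + (1.0 - w) * improve).mean()
```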
- ERL-Re$^2$: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation [31.9768280877473]
We propose Evolutionary Reinforcement Learning with Two-scale State Representation and Policy Representation (ERL-Re$^2$).
All EA and RL policies share the same nonlinear state representation while maintaining individual linear policy representations (an architecture sketch follows this entry).
Experiments on a range of continuous control tasks show that ERL-Re$^2$ consistently outperforms advanced baselines and achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2022-10-26T10:34:48Z)
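The shared-plus-individual decomposition is easy to picture in code. A minimal architecture sketch, with illustrative sizes and encoder design (assumptions, not the paper's implementation):

```python
# Hedged sketch of the two-scale representation described for ERL-Re$^2$:
# every policy in the evolutionary population and the RL agent shares one
# nonlinear state encoder, while each individual keeps only a small,
# cheap-to-mutate linear head.
import torch
import torch.nn as nn

class SharedStateEncoder(nn.Module):
    def __init__(self, state_dim, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim), nn.ReLU())

    def forward(self, state):
        return self.net(state)

class LinearPolicyHead(nn.Module):
    """One individual in the population: a linear map over shared features."""
    def __init__(self, feat_dim, action_dim):
        super().__init__()
        self.head = nn.Linear(feat_dim, action_dim)

    def forward(self, features):
        return torch.tanh(self.head(features))

# Population setup: one encoder, many lightweight linear heads.
encoder = SharedStateEncoder(state_dim=17)
population = [LinearPolicyHead(feat_dim=64, action_dim=6) for _ in range(10)]
action = population[0](encoder(torch.randn(1, 17)))
```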
- Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes [99.26864533035454]
We study offline reinforcement learning (RL) in partially observable Markov decision processes (POMDPs).
We propose the Proxy variable Pessimistic Policy Optimization ($\texttt{P3O}$) algorithm.
$\texttt{P3O}$ is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.
arXiv Detail & Related papers (2022-05-26T19:13:55Z)
- COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation [73.17078343706909]
We study the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
We present an offline constrained RL algorithm that optimizes the policy in the space of stationary distributions.
Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction (the underlying program is sketched after this entry).
arXiv Detail & Related papers (2022-04-19T15:55:47Z)
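The kind of constrained program solved in the space of stationary distributions can be written as follows; this is a standard DICE-style reconstruction, not quoted from the paper.

```latex
% Constrained optimization over stationary distributions d(s,a):
\[
\begin{aligned}
  \max_{d \ge 0} \quad & \sum_{s,a} d(s,a)\, r(s,a)
    \;-\; \alpha\, D_f\!\big(d \,\|\, d_{\mathcal{D}}\big) \\
  \text{s.t.} \quad & \sum_{s,a} d(s,a)\, c(s,a) \;\le\; \hat{c}, \\
  & \sum_{a} d(s,a) \;=\; (1-\gamma)\, p_0(s)
    + \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a') \quad \forall s,
\end{aligned}
\]
% where d_D is the dataset distribution, D_f an f-divergence keeping d close
% to the data, c the cost with budget \hat{c}, and the last line the Bellman
% flow constraint; the ratio d(s,a)/d_D(s,a) is the estimated stationary
% distribution correction.
```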
- A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and an ablation study on the D4RL benchmark validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)
- CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee [61.176159046544946]
In safe reinforcement learning (SRL) problems, an agent explores the environment to maximize an expected total reward while avoiding violations of certain constraints.
This provides the first finite-time analysis of SRL algorithms with convergence to a globally optimal policy (the alternating update is sketched after this entry).
arXiv Detail & Related papers (2020-11-11T16:05:14Z)
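A minimal sketch of the alternating primal update commonly associated with CRPO; the function signatures and the tolerance rule are assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of a CRPO-style primal update: at each iteration, if every
# estimated cost is within its limit plus a tolerance, take a policy-gradient
# step on the reward; otherwise take a step that decreases one violated cost.
def crpo_step(policy, grad_ascent, reward_objective, cost_objectives,
              cost_limits, tolerance=0.05):
    """cost_objectives and cost_limits are parallel lists; grad_ascent(policy, f)
    returns the policy after one ascent step on objective f."""
    for cost, limit in zip(cost_objectives, cost_limits):
        if cost(policy) > limit + tolerance:
            # Constraint violated: descend this cost (ascend its negation).
            return grad_ascent(policy, lambda p, c=cost: -c(p))
    # All constraints satisfied: improve the reward.
    return grad_ascent(policy, reward_objective)
```

Because there are no dual variables, each step is an ordinary policy-gradient update, which is what makes the convergence analysis tractable.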
- Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning [43.61029925616256]
Offline policy evaluation (OPE) in reinforcement learning (RL) is a critical step towards applying RL in real-life applications.
We address this problem by simultaneously evaluating all policies in a policy class $\Pi$ -- uniform convergence in OPE.
Our results imply that model-based planning achieves an optimal episode complexity of $\widetilde{O}(H^3/d_m\epsilon^2)$ in identifying an $\epsilon$-optimal policy.
arXiv Detail & Related papers (2020-07-07T19:44:14Z)