Invariant Policy Optimization: Towards Stronger Generalization in
Reinforcement Learning
- URL: http://arxiv.org/abs/2006.01096v3
- Date: Mon, 9 Nov 2020 09:54:50 GMT
- Title: Invariant Policy Optimization: Towards Stronger Generalization in
Reinforcement Learning
- Authors: Anoopkumar Sonar, Vincent Pacelli, and Anirudha Majumdar
- Abstract summary: A fundamental challenge in reinforcement learning is to learn policies that generalize beyond the operating domains experienced during training.
We propose a novel learning algorithm, Invariant Policy Optimization (IPO), that implements this principle and learns an invariant policy during training.
- Score: 5.476958867922322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental challenge in reinforcement learning is to learn policies that
generalize beyond the operating domains experienced during training. In this
paper, we approach this challenge through the following invariance principle:
an agent must find a representation such that there exists an action-predictor
built on top of this representation that is simultaneously optimal across all
training domains. Intuitively, the resulting invariant policy enhances
generalization by finding causes of successful actions. We propose a novel
learning algorithm, Invariant Policy Optimization (IPO), that implements this
principle and learns an invariant policy during training. We compare our
approach with standard policy gradient methods and demonstrate significant
improvements in generalization performance on unseen domains for linear
quadratic regulator and grid-world problems, and an example where a robot must
learn to open doors with varying physical properties.
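For intuition only, here is a minimal, hypothetical sketch of how such an invariance principle can be imposed during policy-gradient training, using an IRMv1-style gradient penalty on per-domain losses. The concrete IPO objective and optimization schedule are given in the paper; the function names and the REINFORCE loss below are illustrative assumptions.

```python
import torch

def policy_gradient_loss(policy, trajectories):
    """Negative REINFORCE objective for one training domain.
    trajectories: list of (states, actions, returns) tensors."""
    loss = 0.0
    for states, actions, returns in trajectories:
        logits = policy(states)                        # (T, num_actions)
        log_probs = torch.log_softmax(logits, dim=-1)
        chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = loss - (chosen * returns).mean()
    return loss

def ipo_style_loss(policy, domains, penalty_weight=1.0):
    """Sum of per-domain losses plus a penalty that encourages the policy to
    be simultaneously (locally) optimal in every training domain."""
    total, penalty = 0.0, 0.0
    for trajectories in domains:
        # IRMv1-style dummy multiplier: the squared gradient of the loss with
        # respect to a fixed scale of 1.0 measures how far this domain's own
        # optimum is from the shared solution.
        scale = torch.tensor(1.0, requires_grad=True)
        loss = policy_gradient_loss(lambda s: scale * policy(s), trajectories)
        grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
        total = total + loss
        penalty = penalty + grad.pow(2)
    return total + penalty_weight * penalty
```

In practice the penalty weight would typically be annealed so that early training is dominated by the per-domain policy-gradient terms.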
Related papers
- Randomized Adversarial Style Perturbations for Domain Generalization [49.888364462991234]
We propose a novel domain generalization technique, referred to as Randomized Adversarial Style Perturbation (RASP).
The proposed algorithm perturbs the style of a feature in an adversarial direction towards a randomly selected class, and trains the model to resist being misled by the unexpected styles observed in unseen target domains.
We evaluate the proposed algorithm via extensive experiments on various benchmarks and show that our approach improves domain generalization performance, especially in large-scale benchmarks.
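Reading only this summary, the mechanism might look roughly like the PyTorch-style sketch below, where the per-channel mean and standard deviation of a feature map (its "style") are nudged in the gradient direction that favours a randomly drawn class. The layer choice, step size, and classifier_head are placeholders, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def perturb_style(feats, classifier_head, num_classes, step=0.1):
    """feats: (B, C, H, W) intermediate features; classifier_head maps a
    feature map of this shape to class logits. All names are illustrative."""
    mu = feats.mean(dim=(2, 3), keepdim=True)
    sigma = feats.std(dim=(2, 3), keepdim=True) + 1e-6
    normalized = (feats - mu) / sigma

    # Treat the style statistics as the variables being attacked.
    mu_adv = mu.clone().detach().requires_grad_(True)
    sigma_adv = sigma.clone().detach().requires_grad_(True)

    # A randomly selected target class per example.
    targets = torch.randint(0, num_classes, (feats.shape[0],))

    logits = classifier_head(normalized * sigma_adv + mu_adv)
    loss = F.cross_entropy(logits, targets)
    g_mu, g_sigma = torch.autograd.grad(loss, [mu_adv, sigma_adv])

    # Descend the loss on the random class: the style drifts toward styles
    # the classifier associates with that class.
    with torch.no_grad():
        mu_new = mu_adv - step * g_mu.sign()
        sigma_new = (sigma_adv - step * g_sigma.sign()).clamp_min(1e-6)
    return normalized * sigma_new + mu_new
```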
arXiv Detail & Related papers (2023-04-04T17:07:06Z) - Offline Reinforcement Learning with Closed-Form Policy Improvement
Operators [88.54210578912554]
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z) - Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks [0.40964539027092917]
This effort is focused on examining the behavior of reinforcement learning systems in personalization environments.
We provide a wide range of numerical experiments as well as theoretical justification to show that the observed differences in policy entropy are due to the type of learning being employed.
arXiv Detail & Related papers (2022-11-21T21:42:50Z) - Reinforcement learning based adaptive metaheuristics [5.254093731341154]
We introduce a general-purpose framework for performing parameter adaptation in continuous-domain metaheuristics based on state-of-the-art reinforcement learning algorithms.
We demonstrate the applicability of this framework on two algorithms, namely Covariance Matrix Adaptation Evolution Strategies (CMA-ES) and Differential Evolution (DE).
arXiv Detail & Related papers (2022-06-24T12:01:49Z) - Fast Model-based Policy Search for Universal Policy Networks [45.44896435487879]
Adapting an agent's behaviour to new environments has been one of the primary focus areas of physics-based reinforcement learning.
We propose a Gaussian Process-based prior, learned in simulation, that captures the likely performance of a policy when transferred to a previously unseen environment.
We integrate this prior with a Bayesian optimisation-based policy search process to improve the efficiency of identifying the most appropriate policy from the universal policy network.
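As a rough illustration of that kind of pipeline (not the paper's exact formulation), one can treat the index into the universal policy network as the search variable, use simulated returns as a prior, and let a Gaussian process model the sim-to-target residual inside a small Bayesian-optimisation loop. The function names and the UCB acquisition below are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def select_policy(candidate_params, evaluate_on_target, sim_returns, budget=10):
    """candidate_params: (N, d) array indexing the universal policy network.
    sim_returns: (N,) array of returns observed in simulation (the prior).
    evaluate_on_target: callable idx -> observed return on the real system."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    tried, residuals = [], []
    for _ in range(budget):
        if tried:
            # The GP models how much the target deviates from simulation,
            # rather than modelling the return from scratch.
            gp.fit(candidate_params[tried], np.asarray(residuals))
            mean, std = gp.predict(candidate_params, return_std=True)
        else:
            mean, std = np.zeros(len(candidate_params)), np.ones(len(candidate_params))
        ucb = sim_returns + mean + 2.0 * std     # prior + correction + bonus
        ucb[tried] = -np.inf                     # do not re-evaluate
        idx = int(np.argmax(ucb))
        tried.append(idx)
        residuals.append(evaluate_on_target(idx) - sim_returns[idx])
    observed = [sim_returns[i] + r for i, r in zip(tried, residuals)]
    return tried[int(np.argmax(observed))]
```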
arXiv Detail & Related papers (2022-02-11T18:08:02Z) - Towards an Understanding of Default Policies in Multitask Policy
Optimization [29.806071693039655]
Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms.
We take a first step towards filling this gap by formally linking the quality of the default policy to its effect on optimization.
We then derive a principled RPO algorithm for multitask learning with strong performance guarantees.
arXiv Detail & Related papers (2021-11-04T16:45:15Z) - Policy Mirror Descent for Regularized Reinforcement Learning: A
Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
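For the special case of entropy regularisation with a KL mirror map, the policy mirror descent update has a well-known closed form, which the toy sketch below implements for a tabular policy; GPMD itself covers general convex regularisers, so this is only an illustrative special case.

```python
import numpy as np

def pmd_entropy_step(pi, Q, eta, tau):
    """pi: (S, A) current policy, Q: (S, A) regularised Q-values,
    eta: step size, tau: entropy-regularisation weight."""
    # Closed-form update: pi_new(a|s) ∝ pi(a|s)^(1/(1+eta*tau))
    #                                   * exp(eta * Q(s,a) / (1+eta*tau))
    log_pi_new = (np.log(pi + 1e-12) + eta * Q) / (1.0 + eta * tau)
    log_pi_new -= log_pi_new.max(axis=1, keepdims=True)   # numerical stability
    pi_new = np.exp(log_pi_new)
    return pi_new / pi_new.sum(axis=1, keepdims=True)
```

Setting tau to zero recovers the standard natural-policy-gradient-style softmax update.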
arXiv Detail & Related papers (2021-05-24T02:21:34Z) - State Augmented Constrained Reinforcement Learning: Overcoming the
Limitations of Learning with Rewards [88.30521204048551]
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds.
We show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards.
This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods.
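Schematically, the idea reads something like the sketch below: append the multipliers to the observation, reward the learner with the Lagrangian, and update the multipliers by projected dual ascent on the unmet constraints. The environment interface, thresholds, and step size are placeholders rather than the paper's construction.

```python
import numpy as np

def run_episode(env, policy, lambdas, thresholds, dual_lr=0.01):
    """lambdas: (K,) float multipliers, one per constraint reward.
    thresholds: (K,) required accumulation for each constraint reward.
    env.step is assumed to return (state, rewards, done, info) with
    rewards[0] the task reward and rewards[1:] the constraint rewards."""
    state, done = env.reset(), False
    accumulated = np.zeros_like(lambdas)
    lagrangian_rewards = []
    while not done:
        aug_state = np.concatenate([state, lambdas])    # state augmentation
        action = policy(aug_state)
        state, rewards, done, _ = env.step(action)
        constraint_rewards = np.asarray(rewards[1:], dtype=float)
        accumulated += constraint_rewards
        # Reward actually optimised by the policy (policy update not shown).
        lagrangian_rewards.append(rewards[0] + lambdas @ constraint_rewards)
        # Dual ascent: a multiplier grows while its constraint is unmet.
        lambdas = np.maximum(lambdas + dual_lr * (thresholds - accumulated), 0.0)
    return lambdas, lagrangian_rewards
```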
arXiv Detail & Related papers (2021-02-23T21:07:35Z) - Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
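One generic way to realise such a regulariser (not necessarily the authors' model-based estimator) is to train an auxiliary network to predict the sensitive variable from the policy's action distribution and penalise the policy when that prediction is easy, as in the hedged sketch below; policy, adversary, and beta are illustrative names.

```python
import torch
import torch.nn.functional as F

def mi_penalized_policy_loss(policy, adversary, states, sensitive, returns,
                             beta=0.1):
    """policy: states -> action logits; adversary: action probabilities ->
    logits over the sensitive variable's values; sensitive: (T,) long tensor."""
    action_logits = policy(states)
    action_probs = torch.softmax(action_logits, dim=-1)
    dist = torch.distributions.Categorical(probs=action_probs)
    actions = dist.sample()

    # REINFORCE term for the task reward.
    pg_loss = -(dist.log_prob(actions) * returns).mean()

    # Adversary's log-likelihood of the sensitive variable given the action
    # distribution (kept differentiable by feeding probabilities rather than
    # sampled actions); high likelihood means the actions leak information,
    # so the policy is penalised for it as a mutual-information surrogate.
    leak_logits = adversary(action_probs)
    leakage = -F.cross_entropy(leak_logits, sensitive)
    return pg_loss + beta * leakage
```

In a full implementation the adversary would be trained in alternation to maximise its own likelihood, yielding a min-max game between policy and adversary.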
arXiv Detail & Related papers (2020-12-30T03:22:35Z) - Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through interactions among agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy, such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
arXiv Detail & Related papers (2020-04-19T15:42:55Z)