Invariant Policy Optimization: Towards Stronger Generalization in
  Reinforcement Learning
        - URL: http://arxiv.org/abs/2006.01096v3
 - Date: Mon, 9 Nov 2020 09:54:50 GMT
 - Title: Invariant Policy Optimization: Towards Stronger Generalization in
  Reinforcement Learning
 - Authors: Anoopkumar Sonar, Vincent Pacelli, and Anirudha Majumdar
 - Abstract summary: A fundamental challenge in reinforcement learning is to learn policies that generalize beyond the operating domains experienced during training.
We propose a novel learning algorithm, Invariant Policy Optimization (IPO), that implements this principle and learns an invariant policy during training.
 - Score: 5.476958867922322
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   A fundamental challenge in reinforcement learning is to learn policies that
generalize beyond the operating domains experienced during training. In this
paper, we approach this challenge through the following invariance principle:
an agent must find a representation such that there exists an action-predictor
built on top of this representation that is simultaneously optimal across all
training domains. Intuitively, the resulting invariant policy enhances
generalization by finding causes of successful actions. We propose a novel
learning algorithm, Invariant Policy Optimization (IPO), that implements this
principle and learns an invariant policy during training. We compare our
approach with standard policy gradient methods and demonstrate significant
improvements in generalization performance on unseen domains for linear
quadratic regulator and grid-world problems, and an example where a robot must
learn to open doors with varying physical properties.
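
The invariance principle stated above can be made concrete with an IRM-style penalty attached to per-domain policy-gradient losses. The following is a minimal sketch in that spirit, assuming a PyTorch policy that maps observations to an action distribution built on a shared representation; the batch fields (`obs`, `actions`, `advantages`), the REINFORCE-style surrogate, and the IRMv1-style dummy-scale penalty are illustrative assumptions, not the authors' implementation.

```python
import torch

def ipo_style_objective(domain_batches, policy, penalty_weight=1.0):
    """Sum of per-domain policy-gradient losses plus an invariance penalty.

    Sketch only: each training domain contributes a REINFORCE-style surrogate
    loss plus the squared gradient of that loss with respect to a dummy scale
    fixed at 1.0 (an IRMv1-style penalty). Names are illustrative placeholders.
    """
    dummy = torch.tensor(1.0, requires_grad=True)
    total_loss = torch.tensor(0.0)
    total_penalty = torch.tensor(0.0)
    for batch in domain_batches:
        # `policy(obs)` is assumed to return a torch.distributions object
        # built on top of a shared representation network.
        logp = policy(batch["obs"]).log_prob(batch["actions"])

        def surrogate(scale, logp=logp, adv=batch["advantages"]):
            # Policy-gradient surrogate for this domain, scaled by `scale`.
            return -(scale * logp * adv).mean()

        total_loss = total_loss + surrogate(1.0)
        grad = torch.autograd.grad(surrogate(dummy), [dummy], create_graph=True)[0]
        total_penalty = total_penalty + grad.pow(2).sum()
    return total_loss + penalty_weight * total_penalty
```

For a fixed representation, driving the penalty to zero pushes each domain's surrogate loss toward stationarity in the scale of the action head, which is one way to encode "simultaneously optimal across all training domains".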
 
       
      
        Related papers
- Improving Controller Generalization with Dimensionless Markov Decision Processes [6.047438841182958]
We propose a model-based approach to improve generalization in which both the world model and the policy are trained in a dimensionless state-action space.
We demonstrate the applicability of our method on simulated actuated pendulum and cartpole systems, where policies trained on a single environment are robust to shifts in the distribution of the context.
arXiv  Detail & Related papers  (2025-04-14T09:08:53Z)
- Rule-Guided Reinforcement Learning Policy Evaluation and Improvement [9.077163856137505]
LEGIBLE is a novel approach to improving deep reinforcement learning policies.
It starts by mining rules from a deep RL policy, constituting a partially symbolic representation.
In the second step, we generalize the mined rules using domain knowledge expressed as metamorphic relations.
The third step is evaluating generalized rules to determine which generalizations improve performance when enforced.
arXiv  Detail & Related papers  (2025-03-12T11:13:08Z)
- Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation [19.37193250533054]
We propose to utilize imitation learning to transfer the policy learned from the reward modification to the target domain.
Our approach, Domain Adaptation and Reward Augmented Imitation Learning (DARAIL), utilizes the reward modification for domain adaptation.
arXiv  Detail & Related papers  (2024-11-15T02:35:20Z)
- Randomized Adversarial Style Perturbations for Domain Generalization [49.888364462991234]
We propose a novel domain generalization technique, referred to as Randomized Adversarial Style Perturbation (RASP).
The proposed algorithm perturbs the style of a feature in an adversarial direction towards a randomly selected class, and makes the model learn against being misled by the unexpected styles observed in unseen target domains.
We evaluate the proposed algorithm via extensive experiments on various benchmarks and show that our approach improves domain generalization performance, especially on large-scale benchmarks.
arXiv  Detail & Related papers  (2023-04-04T17:07:06Z)
- Offline Reinforcement Learning with Closed-Form Policy Improvement Operators [88.54210578912554]
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv  Detail & Related papers  (2022-11-29T06:29:26Z)
- Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks [0.40964539027092917]
This effort is focused on examining the behavior of reinforcement learning systems in personalization environments.
We provide a wide range of numerical experiments as well as theoretical justification to show that these differences in entropy are due to the type of learning being employed.
arXiv  Detail & Related papers  (2022-11-21T21:42:50Z)
- Reinforcement learning based adaptive metaheuristics [5.254093731341154]
We introduce a general-purpose framework for performing parameter adaptation in continuous-domain metaheuristics based on state-of-the-art reinforcement learning algorithms.
We demonstrate the applicability of this framework on two algorithms, namely Covariance Matrix Adaptation Evolution Strategies (CMA-ES) and Differential Evolution (DE).
arXiv  Detail & Related papers  (2022-06-24T12:01:49Z)
- Fast Model-based Policy Search for Universal Policy Networks [45.44896435487879]
Adapting an agent's behaviour to new environments has been one of the primary focus areas of physics-based reinforcement learning.
We propose a Gaussian Process-based prior, learned in simulation, that captures the likely performance of a policy when transferred to a previously unseen environment.
We integrate this prior with a Bayesian optimisation-based policy search process to improve the efficiency of identifying the most appropriate policy from the universal policy network.
arXiv  Detail & Related papers  (2022-02-11T18:08:02Z)
- Towards an Understanding of Default Policies in Multitask Policy Optimization [29.806071693039655]
Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms.
We take a first step towards filling this gap by formally linking the quality of the default policy to its effect on optimization.
We then derive a principled RPO algorithm for multitask learning with strong performance guarantees.
arXiv  Detail & Related papers  (2021-11-04T16:45:15Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
arXiv  Detail & Related papers  (2021-05-24T02:21:34Z)
- State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards [88.30521204048551]
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds.
We show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards.
This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods; a minimal sketch of this state-augmentation idea appears after this list.
arXiv  Detail & Related papers  (2021-02-23T21:07:35Z)
- Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
arXiv  Detail & Related papers  (2020-12-30T03:22:35Z)
- Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through interactions among agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy, so that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
arXiv  Detail & Related papers  (2020-04-19T15:42:55Z) 
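
As referenced in the state-augmented constrained reinforcement learning entry above, the idea of folding Lagrange multipliers into the state admits a short illustration. The sketch below is a minimal, assumed setup (the class name, thresholds, and dual-ascent schedule are all illustrative, not the method as published): the policy observes the multipliers alongside the state, its scalar training reward is the Lagrangian, and the multipliers are updated by projected dual ascent on the constraint violations.

```python
import numpy as np

class LagrangeAugmentedWrapper:
    """Sketch: augment observations with Lagrange multipliers (assumed setup).

    Each constraint asks the i-th auxiliary reward stream to accumulate to at
    least thresholds[i]. The policy sees (state, lambdas); its scalar training
    reward is the Lagrangian r_0 + sum_i lambda_i * r_i, and the multipliers
    rise by projected dual ascent whenever a constraint is violated.
    """

    def __init__(self, thresholds, dual_step=0.01):
        self.thresholds = np.asarray(thresholds, dtype=float)
        self.lambdas = np.zeros_like(self.thresholds)
        self.dual_step = dual_step

    def augment(self, state):
        # Policy input: original state concatenated with current multipliers.
        return np.concatenate([np.asarray(state, dtype=float), self.lambdas])

    def training_reward(self, main_reward, constraint_rewards):
        # Scalarized per-step reward for fixed multipliers; the constant
        # -lambda . thresholds term is dropped since it does not change the
        # optimal policy for a fixed lambda.
        return float(main_reward + self.lambdas @ np.asarray(constraint_rewards, dtype=float))

    def dual_update(self, avg_constraint_returns):
        # Projected dual ascent: raise lambda_i while the i-th accumulated
        # reward falls short of its threshold; keep lambda_i non-negative.
        gap = self.thresholds - np.asarray(avg_constraint_returns, dtype=float)
        self.lambdas = np.maximum(0.0, self.lambdas + self.dual_step * gap)
```

Because the multipliers are part of the observation, a single policy can in principle adapt its behaviour as they change during execution, rather than being retrained for each fixed trade-off.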
        This list is automatically generated from the titles and abstracts of the papers on this site.
       
     