Reinforcement Learning using Guided Observability
- URL: http://arxiv.org/abs/2104.10986v1
- Date: Thu, 22 Apr 2021 10:47:35 GMT
- Title: Reinforcement Learning using Guided Observability
- Authors: Stephan Weigand, Pascal Klink, Jan Peters, Joni Pajarinen
- Abstract summary: We propose a simple but efficient approach to make reinforcement learning cope with partial observability.
Our main insight is that smoothly transitioning from full observability to partial observability during the training process yields a high performance policy.
A comprehensive evaluation in discrete partially observable Markov decision process (POMDP) benchmark problems and continuous partially observable MuJoCo and OpenAI gym tasks shows that PO-GRL improves performance.
- Score: 26.307025803058714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to recent breakthroughs, reinforcement learning (RL) has demonstrated
impressive performance in challenging sequential decision-making problems.
However, an open question is how to make RL cope with partial observability
which is prevalent in many real-world problems. Contrary to contemporary RL
approaches, which focus mostly on improved memory representations or strong
assumptions about the type of partial observability, we propose a simple but
efficient approach that can be applied together with a wide variety of RL
methods. Our main insight is that smoothly transitioning from full
observability to partial observability during the training process yields a
high performance policy. The approach, called partially observable guided
reinforcement learning (PO-GRL), makes it possible to use full state information
during policy optimization without compromising the optimality of the final
policy. A comprehensive evaluation in discrete partially observable Markov
decision process (POMDP) benchmark problems and continuous partially observable
MuJoCo and OpenAI gym tasks shows that PO-GRL improves performance. Finally, we
demonstrate PO-GRL in the ball-in-the-cup task on a real Barrett WAM robot
under partial observability.
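
The abstract's central idea, moving smoothly from full to partial observability over the course of training, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the mechanism, so the linear schedule, the zero-masking of hidden components, and the names `guidance_weight`, `guided_observation`, and `observed_mask` are all assumptions made for illustration.

```python
import random


def guidance_weight(step, total_steps):
    """Hypothetical linear schedule: 1.0 (full observability) at step 0,
    decaying to 0.0 (partial observability only) at total_steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - frac


def guided_observation(full_state, observed_mask, weight, rng):
    """Build the policy input for one time step (assumed scheme).

    With probability `weight` the full state is revealed; otherwise every
    component whose mask entry is False is replaced by 0.0, which stands in
    for the partial observation. The input size stays constant either way.
    """
    reveal = rng.random() < weight
    return [
        x if (visible or reveal) else 0.0
        for x, visible in zip(full_state, observed_mask)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    state = [0.3, -1.2, 0.8]          # e.g. position, velocity, angle
    mask = [True, False, True]        # the velocity is hidden from the agent
    for step in (0, 500, 1000):
        w = guidance_weight(step, total_steps=1000)
        print(step, w, guided_observation(state, mask, w, rng))
```

Under this scheme the policy sees the full state early in training and only the masked observation by the end, so the final policy depends only on what is available at deployment. How PO-GRL itself mixes the two information sources, and how it schedules the transition, should be taken from the paper.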
Related papers
- Exploiting Estimation Bias in Clipped Double Q-Learning for Continous Control Reinforcement Learning Tasks [5.968716050740402]
This paper focuses on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks.
We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent.
Most state-of-the-art deep RL algorithms can be equipped with the BE mechanism without hindering performance or increasing computational complexity.
arXiv Detail & Related papers (2024-02-14T10:44:03Z)
- Blending Imitation and Reinforcement Learning for Robust Policy Improvement [16.588397203235296]
Imitation learning (IL) utilizes oracles to improve sample efficiency.
RPI draws on the strengths of IL, using oracle queries to facilitate exploration.
RPI is capable of learning from and improving upon a diverse set of black-box oracles.
arXiv Detail & Related papers (2023-10-03T01:55:54Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Hypernetworks for Zero-shot Transfer in Reinforcement Learning [21.994654567458017]
Hypernetworks are trained to generate behaviors across a range of unseen task conditions.
This work relates to meta RL, contextual RL, and transfer learning.
Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.
arXiv Detail & Related papers (2022-11-28T15:48:35Z)
- Light-weight probing of unsupervised representations for Reinforcement Learning [20.638410483549706]
We study whether linear probing can be a proxy evaluation task for the quality of unsupervised RL representation.
We show that the probing tasks are strongly rank correlated with the downstream RL performance on the Atari100k Benchmark.
This provides a more efficient method for exploring the space of pretraining algorithms and identifying promising pretraining recipes.
arXiv Detail & Related papers (2022-08-25T21:08:01Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.
One main barrier is the over-fitting issue that leads to poor generalizability of the policy learned by DRL.
We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee.
arXiv Detail & Related papers (2021-11-29T06:29:49Z)
- Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
- Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability [92.95794652625496]
Generalization is a central challenge for the deployment of reinforcement learning systems.
We show that generalization to unseen test conditions from a limited number of training conditions induces implicit partial observability.
We recast the problem of generalization in RL as solving the induced partially observed Markov decision process.
arXiv Detail & Related papers (2021-07-13T17:59:25Z)
- FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, and two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.