Leveraging Fully Observable Policies for Learning under Partial
Observability
- URL: http://arxiv.org/abs/2211.01991v1
- Date: Thu, 3 Nov 2022 16:57:45 GMT
- Title: Leveraging Fully Observable Policies for Learning under Partial
Observability
- Authors: Hai Nguyen, Andrea Baisero, Dian Wang, Christopher Amato, Robert Platt
- Abstract summary: We propose a method for partially observable reinforcement learning that uses a fully observable policy during offline training to improve online performance.
Our approach can leverage the fully-observable policy for exploration and parts of the domain that are fully observable while still being able to learn under partial observability.
A successful policy transfer to a physical robot in a manipulation task from pixels shows our approach's practicality in learning interesting policies under partial observability.
- Score: 14.918197552051929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning in partially observable domains is challenging due to
the lack of observable state information. Thankfully, learning offline in a
simulator with such state information is often possible. In particular, we
propose a method for partially observable reinforcement learning that uses a
fully observable policy (which we call a state expert) during offline training
to improve online performance. Based on Soft Actor-Critic (SAC), our agent
balances performing actions similar to the state expert and getting high
returns under partial observability. Our approach can leverage the
fully-observable policy for exploration and parts of the domain that are fully
observable while still being able to learn under partial observability. On six
robotics domains, our method outperforms pure imitation, pure reinforcement
learning, the sequential or parallel combination of both types, and a recent
state-of-the-art method in the same setting. A successful policy transfer to a
physical robot in a manipulation task from pixels shows our approach's
practicality in learning interesting policies under partial observability.
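The core trade-off described above, staying close to the state expert while maximizing return under partial observability, can be pictured as a SAC-style actor update with an added behavioral-cloning term. The sketch below is only an illustration of that idea, not the authors' implementation; the history encoding, the `state_expert` callable, and the `bc_weight` coefficient are assumptions.

```python
import torch
import torch.nn.functional as F

def actor_loss(policy, critic, state_expert, history, state, alpha=0.2, bc_weight=1.0):
    """SAC-style actor loss plus a behavioral-cloning term toward a state expert.

    policy:       maps a history encoding to an action distribution (partial observability).
    critic:       returns a per-sample Q(history, action) estimate.
    state_expert: fully observable policy queried with the simulator state (offline only).
    """
    dist = policy(history)                       # pi(a | history)
    action = dist.rsample()                      # reparameterized sample
    log_prob = dist.log_prob(action).sum(-1)

    # Standard SAC term: entropy-regularized value of the sampled action.
    sac_term = (alpha * log_prob - critic(history, action)).mean()

    # Imitation term: stay close to what the state expert would do in this state.
    with torch.no_grad():
        expert_action = state_expert(state)
    bc_term = F.mse_loss(dist.mean, expert_action)

    return sac_term + bc_weight * bc_term
```

During offline training the simulator state is available, so the expert action can be queried for every transition; at deployment only the history-conditioned policy is used.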
Related papers
- Learning Interpretable Policies in Hindsight-Observable POMDPs through
Partially Supervised Reinforcement Learning [57.67629402360924]
We introduce the Partially Supervised Reinforcement Learning (PSRL) framework.
At the heart of PSRL is the fusion of both supervised and unsupervised learning.
We show that PSRL offers a potent balance, enhancing model interpretability while preserving, and often significantly outperforming, the performance benchmarks set by traditional methods.
arXiv Detail & Related papers (2024-02-14T16:23:23Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
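One concrete reading of "intervention signals themselves as rewards" is to relabel every logged transition with a negative reward whenever the expert intervened and zero otherwise, and then run ordinary off-policy RL on the relabeled data. The helper below is a hedged sketch of that relabeling step, not the RLIF code.

```python
from typing import Dict, List

def relabel_with_interventions(transitions: List[Dict]) -> List[Dict]:
    """Replace task rewards with intervention-based rewards.

    Each transition is a dict with at least an 'intervened' flag indicating
    whether the human expert took over at that step. The relabeled reward is
    -1 when an intervention occurs and 0 otherwise, so the agent learns to
    avoid states that trigger interventions.
    """
    relabeled = []
    for t in transitions:
        t = dict(t)                       # copy so the original buffer is untouched
        t["reward"] = -1.0 if t["intervened"] else 0.0
        relabeled.append(t)
    return relabeled
```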
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Implicit Offline Reinforcement Learning via Supervised Learning [83.8241505499762]
Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels.
We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets.
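A minimal sketch of an implicit, return-aware policy is an energy-based model over actions trained with a contrastive loss, conditioned on both the state and the observed return-to-go. The network `energy_net`, the uniform negative sampling, and the conditioning scheme below are assumptions for illustration, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def implicit_bc_loss(energy_net, state, action, return_to_go, num_negatives=64):
    """Contrastive loss for an implicit (energy-based) return-conditioned policy.

    energy_net(state, action, return_to_go) -> scalar energy per candidate;
    the dataset action should receive lower energy than random negatives.
    """
    batch, act_dim = action.shape
    # Uniform negatives in [-1, 1]; a real implementation would match the action bounds.
    negatives = torch.rand(batch, num_negatives, act_dim) * 2 - 1
    candidates = torch.cat([action.unsqueeze(1), negatives], dim=1)   # positive at index 0

    s = state.unsqueeze(1).expand(-1, num_negatives + 1, -1)
    g = return_to_go.unsqueeze(1).expand(-1, num_negatives + 1, -1)
    energies = energy_net(s, candidates, g).squeeze(-1)               # (batch, 1 + negatives)

    # Softmax over candidates; the dataset action (index 0) should win.
    labels = torch.zeros(batch, dtype=torch.long)
    return F.cross_entropy(-energies, labels)
```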
arXiv Detail & Related papers (2022-10-21T21:59:42Z) - Provably Efficient Reinforcement Learning in Partially Observable
Dynamical Systems [97.12538243736705]
We study reinforcement learning for partially observable dynamical systems using function approximation.
We propose a new Partially Observable Bilinear Actor-Critic framework that is general enough to include models such as POMDPs, observable Linear-Quadratic-Gaussian (LQG) systems, Predictive State Representations (PSRs), a newly introduced model, Hilbert Space Embeddings of POMDPs, and observable POMDPs with latent low-rank transitions.
arXiv Detail & Related papers (2022-06-24T00:27:42Z) - Exploiting Action Impact Regularity and Exogenous State Variables for
Offline Reinforcement Learning [30.337391523928396]
We explore a restricted class of MDPs to obtain guarantees for offline reinforcement learning.
We discuss algorithms that exploit the Action Impact Regularity (AIR) property, and provide a theoretical analysis for an algorithm based on Fitted-Q Iteration.
We demonstrate that the algorithm outperforms existing offline reinforcement learning algorithms across different data collection policies in simulated and real world environments.
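Fitted-Q Iteration itself is a standard batch algorithm: repeatedly regress a Q-function onto one-step bootstrapped targets computed from the fixed dataset. The sketch below shows the generic loop for discrete actions; the additional structure that the AIR property provides is not modeled here.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(dataset, n_actions, gamma=0.99, n_iterations=50):
    """Generic Fitted-Q Iteration on a fixed dataset of (s, a, r, s', done) tuples.

    States are feature vectors and actions are discrete indices; Q(s, a) is fit as a
    regression on the concatenated (state, one-hot action) input.
    """
    states, actions, rewards, next_states, dones = (np.asarray(x) for x in zip(*dataset))

    def featurize(s, a):
        onehot = np.eye(n_actions)[a]
        return np.concatenate([s, onehot], axis=1)

    q_model = None
    for _ in range(n_iterations):
        if q_model is None:
            targets = rewards                 # first iteration: Q ~ immediate reward
        else:
            # Bootstrapped target: r + gamma * max_a' Q(s', a')
            next_q = np.stack(
                [q_model.predict(featurize(next_states, np.full(len(dataset), a)))
                 for a in range(n_actions)], axis=1)
            targets = rewards + gamma * (1 - dones) * next_q.max(axis=1)

        q_model = ExtraTreesRegressor(n_estimators=50)
        q_model.fit(featurize(states, actions), targets)
    return q_model
```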
arXiv Detail & Related papers (2021-11-15T20:14:18Z) - Guided Uncertainty-Aware Policy Optimization: Combining Learning and
Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
In contrast, reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal describing the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
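One generic way to combine the two is to gate between a model-based controller and a learned policy based on how confident the perception system is. The sketch below only illustrates that gating idea; the threshold, the uncertainty source, and the hand-off direction are assumptions rather than the paper's specific design.

```python
def select_action(model_based_controller, learned_policy, observation,
                  pose_estimate, pose_uncertainty, uncertainty_threshold=0.05):
    """Gate between a model-based controller and a learned policy.

    While the pose estimate is confident, the model-based controller acts on it
    directly; once the uncertainty about the estimate grows too large for the
    model to be trusted, the learned policy takes over using raw observations.
    """
    if pose_uncertainty < uncertainty_threshold:
        return model_based_controller(pose_estimate)
    return learned_policy(observation)
```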
arXiv Detail & Related papers (2020-05-21T19:47:05Z) - State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
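The described pipeline, fit an inverse dynamics model on the agent's own transitions and then use it to fill in the missing actions of state-only demonstrations, can be sketched as follows; the regressor choice and data shapes are placeholder assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_inverse_dynamics(states, actions, next_states):
    """Fit f(s_t, s_{t+1}) -> a_t on the agent's own interaction data."""
    inputs = np.concatenate([states, next_states], axis=1)
    model = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=500)
    model.fit(inputs, actions)
    return model

def label_state_only_demo(inverse_model, demo_states):
    """Predict the missing actions for consecutive state pairs in a demonstration."""
    pairs = np.concatenate([demo_states[:-1], demo_states[1:]], axis=1)
    return inverse_model.predict(pairs)
```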
arXiv Detail & Related papers (2020-04-07T17:57:20Z)