Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit
Partial Observability
- URL: http://arxiv.org/abs/2107.06277v1
- Date: Tue, 13 Jul 2021 17:59:25 GMT
- Title: Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit
Partial Observability
- Authors: Dibya Ghosh, Jad Rahme, Aviral Kumar, Amy Zhang, Ryan P. Adams, Sergey
Levine
- Abstract summary: Generalization is a central challenge for the deployment of reinforcement learning systems.
We show that generalization to unseen test conditions from a limited number of training conditions induces implicit partial observability.
We recast the problem of generalization in RL as solving the induced partially observed Markov decision process.
- Score: 92.95794652625496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization is a central challenge for the deployment of reinforcement
learning (RL) systems in the real world. In this paper, we show that the
sequential structure of the RL problem necessitates new approaches to
generalization beyond the well-studied techniques used in supervised learning.
While supervised learning methods can generalize effectively without explicitly
accounting for epistemic uncertainty, we show that, perhaps surprisingly, this
is not the case in RL. We show that generalization to unseen test conditions
from a limited number of training conditions induces implicit partial
observability, effectively turning even fully-observed MDPs into POMDPs.
Informed by this observation, we recast the problem of generalization in RL as
solving the induced partially observed Markov decision process, which we call
the epistemic POMDP. We demonstrate the failure modes of algorithms that do not
appropriately handle this partial observability, and suggest a simple
ensemble-based technique for approximately solving the partially observed
problem. Empirically, we demonstrate that our simple algorithm derived from the
epistemic POMDP achieves significant gains in generalization over current
methods on the Procgen benchmark suite.
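The ensemble-based idea in the abstract can be illustrated with a minimal sketch, under stated assumptions: the `EnsemblePolicy` class, the uniform-mixture combination rule, and the random stand-in weights below are illustrative, not the paper's exact algorithm. The gist: each ensemble member is trained on different training contexts, and acting with the averaged action distribution hedges over epistemic uncertainty about which environment the agent is actually in.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 4
N_MEMBERS = 3
OBS_DIM = 5

def softmax(logits):
    # Numerically stable softmax over action logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

class EnsemblePolicy:
    """Illustrative ensemble: each member is a linear policy, imagined as
    trained on a different subset of training contexts. At test time we
    act according to the uniform mixture of the members' action
    distributions, which hedges over uncertainty about the true MDP."""

    def __init__(self, members):
        self.members = members  # list of (N_ACTIONS, OBS_DIM) weight matrices

    def action_probs(self, obs):
        probs = [softmax(W @ obs) for W in self.members]
        return np.mean(probs, axis=0)

# Random weights stand in for trained per-context policies (an assumption).
members = [rng.normal(size=(N_ACTIONS, OBS_DIM)) for _ in range(N_MEMBERS)]
policy = EnsemblePolicy(members)

obs = rng.normal(size=OBS_DIM)
p = policy.action_probs(obs)  # a valid distribution over actions
```

Because each member's output is a distribution, the uniform mixture is again a distribution, so the combined policy needs no renormalization.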
Related papers
- IMEX-Reg: Implicit-Explicit Regularization in the Function Space for Continual Learning [17.236861687708096]
Continual learning (CL) remains one of the long-standing challenges for deep neural networks due to catastrophic forgetting of previously acquired knowledge.
Inspired by how humans learn using strong inductive biases, we propose IMEX-Reg to improve the generalization performance of experience rehearsal in CL under low buffer regimes.
arXiv Detail & Related papers (2024-04-28T12:25:09Z)
- Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning [74.67655210734338]

In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption.
We develop a representation-based perspective that leads to a coherent framework and tractable algorithmic approach for practical reinforcement learning from partial observations.
We empirically demonstrate the proposed algorithm can surpass state-of-the-art performance with partial observations across various benchmarks.
arXiv Detail & Related papers (2023-11-20T23:56:58Z)
- A Unified Approach to Controlling Implicit Regularization via Mirror Descent [18.536453909759544]
Mirror descent (MD) is a notable generalization of gradient descent (GD).
We show that MD can be implemented efficiently and enjoys fast convergence under suitable conditions.
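A standard instance of the mirror descent mentioned above (not the paper's specific construction) uses the negative-entropy mirror map on the probability simplex, which turns the update into a multiplicative, exponentiated-gradient step that keeps iterates on the simplex. A minimal sketch:

```python
import numpy as np

def mirror_descent_simplex(grad_fn, x0, lr=0.1, steps=100):
    """Mirror descent with the negative-entropy mirror map on the
    probability simplex. The resulting update is multiplicative
    (exponentiated gradient) followed by renormalization, so every
    iterate remains a valid probability vector."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x)
        x = x * np.exp(-lr * g)  # multiplicative step
        x /= x.sum()             # project back onto the simplex
    return x

# Example: minimize the linear loss <c, x> over the simplex; the
# minimizer puts all its mass on the smallest coordinate of c.
c = np.array([3.0, 1.0, 2.0])
x = mirror_descent_simplex(lambda x: c, np.ones(3) / 3, lr=0.5, steps=200)
# x concentrates on index 1, the argmin of c
```

Compared with projected gradient descent, the entropy mirror map adapts the geometry to the simplex, which is what gives MD its favorable convergence in this setting.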
arXiv Detail & Related papers (2023-06-24T03:57:26Z)
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
- Optimistic MLE -- A Generic Model-based Algorithm for Partially Observable Sequential Decision Making [48.87943416098096]
This paper introduces a simple and efficient learning algorithm, Optimistic MLE (OMLE), for general sequential decision making.
We prove that OMLE learns near-optimal policies of an enormously rich class of sequential decision making problems.
arXiv Detail & Related papers (2022-09-29T17:56:25Z)
- Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z)
- Reinforcement Learning using Guided Observability [26.307025803058714]
We propose a simple but efficient approach to make reinforcement learning cope with partial observability.
Our main insight is that smoothly transitioning from full observability to partial observability during the training process yields a high performance policy.
A comprehensive evaluation in discrete partially observable Markov decision process (POMDP) benchmark problems and continuous partially observable MuJoCo and OpenAI gym tasks shows that PO-GRL improves performance.
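The smooth transition from full to partial observability described above can be sketched as an observability curriculum. The masking scheme below (zeroing designated "hidden" observation components with probability equal to the training progress) is an illustrative assumption, not PO-GRL's exact mechanism:

```python
import numpy as np

def curriculum_observation(obs, hidden_idx, progress, rng):
    """Blend from full observability (progress=0.0) to partial
    observability (progress=1.0): each component listed in hidden_idx
    is masked (zeroed) independently with probability `progress`.
    This particular masking scheme is a hypothetical stand-in for
    whatever transition schedule the training algorithm uses."""
    obs = np.array(obs, dtype=float)
    mask = rng.random(len(hidden_idx)) < progress
    obs[np.asarray(hidden_idx)[mask]] = 0.0
    return obs

rng = np.random.default_rng(0)
# Early in training: nothing is hidden.
full = curriculum_observation([1., 2., 3., 4.], [1, 3], progress=0.0, rng=rng)
# End of training: the designated components are always hidden.
part = curriculum_observation([1., 2., 3., 4.], [1, 3], progress=1.0, rng=rng)
```

The policy thus starts out with the easy fully observed problem and is gradually forced to cope with the observations it will actually face at test time.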
arXiv Detail & Related papers (2021-04-22T10:47:35Z)
- Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.