Sample-Efficient Learning of POMDPs with Multiple Observations In
Hindsight
- URL: http://arxiv.org/abs/2307.02884v1
- Date: Thu, 6 Jul 2023 09:39:01 GMT
- Title: Sample-Efficient Learning of POMDPs with Multiple Observations In
Hindsight
- Authors: Jiacheng Guo, Minshuo Chen, Huan Wang, Caiming Xiong, Mengdi Wang, Yu
Bai
- Abstract summary: This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs)
Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called "multiple observations in hindsight".
We show that sample-efficient learning is possible for two new subclasses of POMDPs: multi-observation revealing POMDPs and distinguishable POMDPs.
- Score: 105.6882315781987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the sample-efficiency of learning in Partially Observable
Markov Decision Processes (POMDPs), a challenging problem in reinforcement
learning that is known to be exponentially hard in the worst-case. Motivated by
real-world settings such as loading in game playing, we propose an enhanced
feedback model called ``multiple observations in hindsight'', where after each
episode of interaction with the POMDP, the learner may collect multiple
additional observations emitted from the encountered latent states, but may not
observe the latent states themselves. We show that sample-efficient learning
under this feedback model is possible for two new subclasses of POMDPs:
\emph{multi-observation revealing POMDPs} and \emph{distinguishable POMDPs}.
Both subclasses generalize and substantially relax \emph{revealing POMDPs} -- a
widely studied subclass for which sample-efficient learning is possible under
standard trajectory feedback. Notably, distinguishable POMDPs only require the
emission distributions from different latent states to be \emph{different}
instead of \emph{linearly independent} as required in revealing POMDPs.
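The gap between the two conditions can be illustrated with a toy emission matrix (values made up for illustration): revealing POMDPs need the per-state observation distributions, i.e. the columns of the emission matrix, to be linearly independent, while distinguishable POMDPs only need them to be pairwise different. A minimal sketch, assuming NumPy:

```python
import numpy as np
from itertools import combinations

# Toy emission matrix O with O[o, s] = P(observation o | latent state s).
# Column 2 is the exact average of columns 0 and 1, so the columns are
# linearly DEPENDENT yet still pairwise distinct.
O = np.array([
    [0.500, 0.250, 0.375],
    [0.250, 0.500, 0.375],
    [0.250, 0.250, 0.250],
])
n_states = O.shape[1]

# Revealing-style condition: emission distributions linearly independent,
# i.e. the emission matrix has full column rank.
revealing = np.linalg.matrix_rank(O) == n_states

def tv(p, q):
    """Total-variation distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()

# Distinguishable condition: every pair of latent states has emission
# distributions at total-variation distance at least some alpha > 0.
alpha = min(tv(O[:, i], O[:, j])
            for i, j in combinations(range(n_states), 2))
distinguishable = alpha > 0

print(revealing, distinguishable, alpha)  # → False True 0.125
```

Here the matrix fails the revealing condition (rank 2 with 3 latent states) but is distinguishable with separation alpha = 0.125, illustrating the sense in which distinguishability is the strictly weaker requirement.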
Related papers
- ProPML: Probability Partial Multi-label Learning [12.814910734614351]
Partial Multi-label Learning (PML) is a type of weakly supervised learning where each training instance corresponds to a set of candidate labels, among which only some are true.
In this paper, we introduce ProPML, a novel probabilistic approach to this problem that extends the binary cross-entropy to the PML setup.
arXiv Detail & Related papers (2024-03-12T12:40:23Z) - Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles [95.49699178874683]
We propose DiffDiv, an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs)
We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features.
We show that DPM-guided diversification is sufficient to remove dependence on shortcut cues, without a need for additional supervised signals.
arXiv Detail & Related papers (2023-11-23T15:47:33Z) - Posterior Sampling-based Online Learning for Episodic POMDPs [5.797837329787459]
We consider the online learning problem for episodic POMDPs with unknown transition and observation models.
We propose a Posterior Sampling-based reinforcement learning algorithm for POMDPs.
arXiv Detail & Related papers (2023-10-16T06:41:13Z) - Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts
in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs)
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z) - Learning in POMDPs is Sample-Efficient with Hindsight Observability [36.66596305441365]
POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability.
In many realistic problems, more information is either revealed or can be computed during some point of the learning process.
We formulate a setting (setshort) as a POMDP where the latent states are revealed to the learner in hindsight and only during training.
arXiv Detail & Related papers (2023-01-31T18:54:36Z) - Optimistic MLE -- A Generic Model-based Algorithm for Partially
Observable Sequential Decision Making [48.87943416098096]
This paper introduces Optimistic MLE (OMLE), a simple and efficient model-based learning algorithm for general sequential decision making.
We prove that OMLE learns near-optimal policies of an enormously rich class of sequential decision making problems.
arXiv Detail & Related papers (2022-09-29T17:56:25Z) - When Is Partially Observable Reinforcement Learning Not Scary? [30.754810416907123]
Although learning partially observable Markov decision processes (POMDPs) requires an exponential number of samples in the worst case, we identify a rich subclass for which learning is sample-efficient.
This is the first provably efficient result for learning from interactions in overcomplete POMDPs.
arXiv Detail & Related papers (2022-04-19T16:08:28Z) - Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z) - Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Decision Processes (POMDPs)
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.