Flow-based Recurrent Belief State Learning for POMDPs
- URL: http://arxiv.org/abs/2205.11051v1
- Date: Mon, 23 May 2022 05:29:55 GMT
- Title: Flow-based Recurrent Belief State Learning for POMDPs
- Authors: Xiaoyu Chen, Yao Mu, Ping Luo, Shengbo Li, Jianyu Chen
- Abstract summary: Partially Observable Markov Decision Process (POMDP) provides a principled and generic framework to model real-world sequential decision-making processes.
The main challenge lies in how to accurately obtain the belief state, which is the probability distribution over the unobservable environment states.
Recent advances in deep learning techniques show great potential to learn good belief states.
- Score: 20.860726518161204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Partially Observable Markov Decision Process (POMDP) provides a principled
and generic framework for modeling real-world sequential decision-making
processes, but remains largely unsolved, especially for high-dimensional
continuous spaces and unknown models. The main challenge lies in how to
accurately obtain the belief state, which is the probability distribution over
the unobservable environment states given historical information. Accurately
calculating this belief state is a precondition for obtaining an optimal policy
for POMDPs. Recent advances in deep learning techniques show great potential to
learn good belief states. However, existing methods can only learn approximate
distributions with limited flexibility. In this paper, we introduce the
FlOw-based Recurrent BElief State model (FORBES), which incorporates
normalizing flows into variational inference to learn general continuous
belief states for POMDPs. Furthermore, we show that the learned belief states
can be plugged into downstream RL algorithms to improve performance. In
experiments, we show that our method successfully captures complex belief
states that enable multi-modal predictions as well as high-quality
reconstructions, and results on challenging visual-motor control tasks show
that our method achieves superior performance and sample efficiency.
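
The mechanism the abstract describes (a recurrent history encoder plus normalizing flows inside variational inference) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the module names (PlanarFlow, FlowBeliefModel), the use of simple planar flows, and all dimensions are assumptions, and the paper's actual flow architecture and training objective are not reproduced here.

```python
# Minimal illustrative sketch (not the authors' code): a GRU summarizes the
# action-observation history, a conditional Gaussian base distribution is
# predicted from that summary, and a stack of simple planar flows reshapes it
# into a flexible (possibly multi-modal) belief over latent states.
# Module names, dimensions, and the choice of planar flows are assumptions.
import math
import torch
import torch.nn as nn


class PlanarFlow(nn.Module):
    """One planar flow step: z' = z + u * tanh(w^T z + b), with log|det J|.
    (The invertibility constraint on u and w is omitted for brevity.)"""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Parameter(torch.randn(dim) * 0.01)
        self.u = nn.Parameter(torch.randn(dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):
        lin = z @ self.w + self.b                              # (batch,)
        z_new = z + self.u * torch.tanh(lin).unsqueeze(-1)     # (batch, dim)
        psi = (1.0 - torch.tanh(lin) ** 2).unsqueeze(-1) * self.w
        log_det = torch.log(torch.abs(1.0 + psi @ self.u) + 1e-8)
        return z_new, log_det


class FlowBeliefModel(nn.Module):
    """Recurrent history encoder + conditional base Gaussian + flow stack."""
    def __init__(self, obs_dim, act_dim, latent_dim=32, hidden_dim=128, n_flows=4):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hidden_dim, batch_first=True)
        self.base = nn.Linear(hidden_dim, 2 * latent_dim)       # mean and log-std
        self.flows = nn.ModuleList([PlanarFlow(latent_dim) for _ in range(n_flows)])

    def belief_sample(self, obs_seq, act_seq):
        """Return a belief-state sample and its log-density q(z | history)."""
        h, _ = self.rnn(torch.cat([obs_seq, act_seq], dim=-1))
        mean, log_std = self.base(h[:, -1]).chunk(2, dim=-1)
        eps = torch.randn_like(mean)
        z = mean + log_std.exp() * eps                           # base sample
        log_q = (-0.5 * eps ** 2 - log_std - 0.5 * math.log(2 * math.pi)).sum(-1)
        for flow in self.flows:                                  # push through flows
            z, log_det = flow(z)
            log_q = log_q - log_det                              # change of variables
        return z, log_q
```

In the full method, such samples and their densities would enter a variational objective together with latent transition and observation models, and the resulting belief samples would condition the downstream policy; those parts are omitted here.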
Related papers
- FNP: Fourier Neural Processes for Arbitrary-Resolution Data Assimilation [58.149902193341816]
We propose Fourier Neural Processes (FNP) for arbitrary-resolution data assimilation in this paper.
Our FNP trained on a fixed resolution can directly handle the assimilation of observations with out-of-distribution resolutions and the observational information reconstruction task without additional fine-tuning.
arXiv Detail & Related papers (2024-06-03T12:24:24Z) - Streamflow Prediction with Uncertainty Quantification for Water Management: A Constrained Reasoning and Learning Approach [27.984958596544278]
This paper studies a constrained reasoning and learning (CRL) approach where physical laws represented as logical constraints are integrated as a layer in the deep neural network.
To address the small data setting, we develop a theoretically grounded training approach to improve the generalization accuracy of deep models.
arXiv Detail & Related papers (2024-05-31T18:53:53Z) - Probabilistic Inference in Reinforcement Learning Done Right [37.31057328219418]
A popular perspective in reinforcement learning casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP).
Previous approaches to approximate this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statistical inference.
We first reveal that this quantity can indeed be used to generate a policy that explores efficiently, as measured by regret.
arXiv Detail & Related papers (2023-11-22T10:23:14Z) - Learning non-Markovian Decision-Making from State-only Sequences [57.20193609153983]
We develop a model-based imitation of state-only sequences with a non-Markov Decision Process (nMDP).
We demonstrate the efficacy of the proposed method in a path planning task with non-Markovian constraints.
arXiv Detail & Related papers (2023-06-27T02:26:01Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - Knowing the Past to Predict the Future: Reinforcement Virtual Learning [29.47688292868217]
Reinforcement Learning (RL)-based control systems have received considerable attention in recent decades.
In this paper, we present a cost-efficient framework in which the RL model can evolve by itself in a Virtual Space.
The proposed framework enables a step-by-step RL model to predict the future state and select optimal actions for long-sighted decisions.
arXiv Detail & Related papers (2022-11-02T16:48:14Z) - Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z) - Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values.
It has been demonstrated that our methods achieve new state-of-the-art performance for search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z) - Data Augmentation through Expert-guided Symmetry Detection to Improve Performance in Offline Reinforcement Learning [0.0]
Offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task.
Recent works showed that an expert-guided pipeline relying on Density Estimation methods effectively detects this structure in deterministic environments.
We show that the former results lead to a performance improvement when solving the learned MDP and then applying the optimized policy in the real environment.
arXiv Detail & Related papers (2021-12-18T14:32:32Z) - Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z) - Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments.
It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics.
We show that (i) this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses (a generic illustrative sketch of online change-point detection follows after this list).
arXiv Detail & Related papers (2021-05-20T01:57:52Z)
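
The last entry above computes real-time change-point statistics over a possibly infinite data stream; as a generic point of reference (not that paper's method), an online CUSUM detector shows what such a streaming statistic looks like in code. The class name and the `drift`/`threshold` parameters below are assumptions for illustration.

```python
# Generic online CUSUM change-point detector over a scalar stream (e.g.,
# per-step prediction errors or rewards). This illustrates online
# change-point detection in general, not the specific high-confidence
# statistic of the cited paper; `drift` and `threshold` are assumed
# tuning parameters.
class CusumDetector:
    def __init__(self, target_mean=0.0, drift=0.5, threshold=5.0):
        self.target_mean = target_mean   # expected value under the current context
        self.drift = drift               # slack that absorbs ordinary noise
        self.threshold = threshold       # alarm level: higher means fewer false alarms
        self.g_pos = 0.0                 # accumulated upward deviation
        self.g_neg = 0.0                 # accumulated downward deviation

    def update(self, x):
        """Feed one observation; return True if a change is flagged."""
        self.g_pos = max(0.0, self.g_pos + x - self.target_mean - self.drift)
        self.g_neg = max(0.0, self.g_neg - x + self.target_mean - self.drift)
        if self.g_pos > self.threshold or self.g_neg > self.threshold:
            self.g_pos = self.g_neg = 0.0   # reset after raising an alarm
            return True
        return False


# Example: the detector fires shortly after the stream's mean shifts from 0 to 2.
detector = CusumDetector()
stream = [0.1, -0.2, 0.0, 0.3, 2.1, 1.9, 2.2, 2.0, 2.1]
alarms = [t for t, x in enumerate(stream) if detector.update(x)]
```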