The Wasserstein Believer: Learning Belief Updates for Partially
Observable Environments through Reliable Latent Space Models
- URL: http://arxiv.org/abs/2303.03284v3
- Date: Thu, 26 Oct 2023 15:25:57 GMT
- Title: The Wasserstein Believer: Learning Belief Updates for Partially
Observable Environments through Reliable Latent Space Models
- Authors: Raphael Avalos, Florent Delgrange, Ann Nowé, Guillermo A. Pérez,
Diederik M. Roijers
- Abstract summary: We propose an RL algorithm that learns a latent model of the POMDP and an approximation of the belief update.
Our approach comes with theoretical guarantees on the quality of our approximation, ensuring that our outputted beliefs allow for learning the optimal value function.
- Score: 3.462371782084948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Partially Observable Markov Decision Processes (POMDPs) are used to model
environments where the full state cannot be perceived by an agent. As such, the
agent needs to reason by taking past observations and actions into account.
However, simply remembering the full history is generally intractable due to
the exponential growth of the history space. A probability distribution that
models the belief over the true state can serve as a sufficient statistic of
the history, but computing it requires access to the model of the environment
and is often intractable. While SOTA algorithms
use Recurrent Neural Networks to compress the observation-action history aiming
to learn a sufficient statistic, they lack guarantees of success and can lead
to sub-optimal policies. To overcome this, we propose the Wasserstein Belief
Updater, an RL algorithm that learns a latent model of the POMDP and an
approximation of the belief update. Our approach comes with theoretical
guarantees on the quality of our approximation, ensuring that our outputted
beliefs allow for learning the optimal value function.
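For context, the belief update mentioned in the abstract is the standard Bayesian filter over the hidden state; it is exactly this recursion that requires the transition and observation models of the environment and that the Wasserstein Belief Updater approximates through its latent model. Below is a minimal NumPy sketch of the exact update; the array layout and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Exact Bayesian belief filter for a finite POMDP.

    b: current belief over states, shape (S,)
    a: action index
    o: received observation index
    T: transition model, shape (A, S, S), T[a, s, s'] = P(s' | s, a)
    O: observation model, shape (A, S, O), O[a, s', o] = P(o | s', a)
    """
    predicted = b @ T[a]                      # predict: push belief through T
    unnormalized = predicted * O[a, :, o]     # correct: weight by obs. likelihood
    return unnormalized / unnormalized.sum()  # denominator is P(o | b, a)

# Toy 2-state, 1-action, 2-observation example with made-up numbers.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.7, 0.3], [0.1, 0.9]]])
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=1, T=T, O=O))
```

Maintaining this recursion without access to the model is precisely the gap the proposed algorithm addresses.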
Related papers
- A Probabilistic Perspective on Model Collapse [9.087950471621653]
This paper aims to characterize the conditions under which model collapse occurs and, crucially, how it can be mitigated. Under mild conditions, we rigorously show that progressively increasing the sample size at each training step is necessary to prevent model collapse. We also investigate the probability that training on synthetic data yields models that outperform those trained solely on real data.
arXiv Detail & Related papers (2025-05-20T05:25:29Z)
- Continuous Visual Autoregressive Generation via Score Maximization [69.67438563485887]
We introduce a Continuous VAR framework that enables direct visual autoregressive generation without vector quantization. Within this framework, all we need is to select a strictly proper score and set it as the training objective to optimize.
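Strictly proper scoring rules are the key ingredient named above: a score whose expectation is uniquely minimized by the true data distribution can serve directly as a training objective. The energy score below is one standard sample-based example of such a rule; it is only an illustration, not necessarily the score chosen in the paper.

```python
import numpy as np

def energy_score(samples, y):
    """Sample-based energy score, a strictly proper scoring rule (lower is
    better). samples: (M, D) draws from the model; y: (D,) the observation."""
    term1 = np.mean(np.linalg.norm(samples - y, axis=1))
    pairwise = samples[:, None, :] - samples[None, :, :]
    term2 = 0.5 * np.mean(np.linalg.norm(pairwise, axis=-1))
    return term1 - term2
```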
arXiv Detail & Related papers (2025-05-12T17:58:14Z)
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs).
We derive novel metrics with high-probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
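As a generic illustration of what a high-probability guarantee on an output-distribution quantity can look like (textbook Hoeffding-style reasoning, not the paper's metrics): estimate the probability of some output property from samples and report an interval that holds with probability at least 1 - delta.

```python
import numpy as np

def estimate_with_bound(indicators, delta=0.05):
    """Monte Carlo estimate of p = P(output has a given property) together
    with a two-sided Hoeffding interval that holds w.p. at least 1 - delta."""
    x = np.asarray(indicators, dtype=float)   # 0/1 per sampled output
    n = len(x)
    p_hat = x.mean()
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))
    return p_hat, (max(0.0, p_hat - eps), min(1.0, p_hat + eps))
```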
arXiv Detail & Related papers (2024-10-04T15:44:23Z)
- Periodic agent-state based Q-learning for POMDPs [23.296159073116264]
A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history.
We propose PA (periodic agent-state based Q-learning), a variant of agent-state-based Q-learning that learns periodic policies.
By combining ideas from periodic Markov chains and approximation, we rigorously establish that PA converges to a cyclic limit and characterize the approximation error of the periodic policy.
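A rough sketch of the general idea of a periodic, agent-state-based Q-function: the table carries an explicit phase index that cycles with period L, and the backup bootstraps from the next phase. The sizes, the agent-state construction, and the hyperparameters below are placeholders, not the algorithm analyzed in the paper.

```python
import numpy as np

L, n_agent_states, n_actions = 3, 8, 2     # placeholder sizes
Q = np.zeros((L, n_agent_states, n_actions))
alpha, gamma = 0.1, 0.95

def backup(phase, z, a, r, z_next):
    """One tabular Q-learning update; the bootstrap uses the next phase."""
    next_phase = (phase + 1) % L
    target = r + gamma * Q[next_phase, z_next].max()
    Q[phase, z, a] += alpha * (target - Q[phase, z, a])
```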
arXiv Detail & Related papers (2024-07-08T16:58:57Z)
- DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning [14.952800864366512]
Conservatism should be incorporated into the algorithm to balance accurate offline data and imprecise model data.
This paper proposes a milDly cOnservative Model-bAsed offlINe RL algorithm (DOMAIN) without estimating model uncertainty.
The results of extensive experiments show that DOMAIN outperforms prior RL algorithms on the D4RL dataset benchmark.
arXiv Detail & Related papers (2023-09-16T08:39:28Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
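Learning a value distribution is commonly done with quantile regression; the pinball loss below is the generic version of that idea and is shown only as an illustration, not as EQR's exact objective.

```python
import numpy as np

def quantile_regression_loss(pred_quantiles, target_samples):
    """Pinball loss over N fixed quantile levels of a value distribution.
    pred_quantiles: (N,) current quantile estimates.
    target_samples: (M,) sampled Bellman targets."""
    N = len(pred_quantiles)
    taus = (np.arange(N) + 0.5) / N                             # quantile midpoints
    diff = target_samples[:, None] - pred_quantiles[None, :]    # shape (M, N)
    return np.where(diff >= 0, taus * diff, (taus - 1.0) * diff).mean()
```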
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- Knowing the Past to Predict the Future: Reinforcement Virtual Learning [29.47688292868217]
Reinforcement Learning (RL)-based control system has received considerable attention in recent decades.
In this paper, we present a cost-efficient framework, such that the RL model can evolve for itself in a Virtual Space.
The proposed framework enables a step-by-step RL model to predict the future state and select optimal actions for long-sight decisions.
arXiv Detail & Related papers (2022-11-02T16:48:14Z)
- Flow-based Recurrent Belief State Learning for POMDPs [20.860726518161204]
Partially Observable Markov Decision Process (POMDP) provides a principled and generic framework to model real world sequential decision making processes.
The main challenge lies in how to accurately obtain the belief state, which is the probability distribution over the unobservable environment states.
Recent advances in deep learning techniques show great potential to learn good belief states.
arXiv Detail & Related papers (2022-05-23T05:29:55Z)
- Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression [91.3373131262391]
Uncertainty is the only certainty there is.
Traditionally, the direct regression formulation is considered and the uncertainty is modeled by modifying the output space to a certain family of probabilistic distributions.
How to model the uncertainty within the present-day technologies for regression remains an open issue.
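One common way to "modify the output space to a family of probabilistic distributions" is to have the regressor predict a mean and a variance and train it with the Gaussian negative log-likelihood; the snippet below shows only that baseline form, not the ordinal-embedding construction of the paper.

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Heteroscedastic Gaussian NLL: the model outputs a mean and a
    log-variance per input instead of a point estimate."""
    var = np.exp(log_var)
    return 0.5 * np.mean(np.log(2 * np.pi) + log_var + (y - mu) ** 2 / var)
```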
arXiv Detail & Related papers (2021-03-25T06:56:09Z)
- Learning Interpretable Deep State Space Model for Probabilistic Time Series Forecasting [98.57851612518758]
Probabilistic time series forecasting involves estimating the distribution of the future based on its history.
We propose a deep state space model for probabilistic time series forecasting whereby the non-linear emission model and transition model are parameterized by networks.
We show in experiments that our model produces accurate and sharp probabilistic forecasts.
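The basic mechanism described above, a latent state pushed forward by a learned transition model and decoded by a learned emission model, can be sketched as follows; the tiny random maps stand in for the networks and the noise scales are arbitrary assumptions, so this is not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
Wz = rng.normal(size=(4, 4))          # stand-in for the transition network
Wy = rng.normal(size=(1, 4))          # stand-in for the emission network

def transition(z):                    # z_t = f(z_{t-1}) + process noise
    return np.tanh(Wz @ z) + 0.1 * rng.normal(size=4)

def emission(z):                      # y_t ~ N(g(z_t), 0.1^2)
    return (Wy @ z + 0.1 * rng.normal(size=1))[0]

def sample_forecast(z0, horizon=5, n_paths=100):
    """Monte Carlo forecast: roll the model forward many times and keep the
    sampled observations as an empirical predictive distribution."""
    paths = np.empty((n_paths, horizon))
    for i in range(n_paths):
        z = z0.copy()
        for t in range(horizon):
            z = transition(z)
            paths[i, t] = emission(z)
    return paths
```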
arXiv Detail & Related papers (2021-01-31T06:49:33Z)
- Provably Good Batch Reinforcement Learning Without Great Exploration [51.51462608429621]
Batch reinforcement learning (RL) is important to apply RL algorithms to many high stakes tasks.
Recent algorithms have shown promise but can still be overly optimistic in their expected outcomes.
We show that a small modification to the Bellman optimality and evaluation back-ups, taking a more conservative update, can have much stronger guarantees.
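As an illustration of what a more conservative backup can look like (the count-based penalty below is a generic pessimism term, not the specific modification proposed in the paper): the bootstrap values of poorly covered next-state actions are discounted before the maximization.

```python
import numpy as np

def conservative_backup(Q, counts, s_next, r, gamma=0.99, kappa=1.0):
    """Pessimistic Bellman target: subtract a count-based penalty from the
    next-state action values before taking the max."""
    penalty = kappa / np.sqrt(np.maximum(counts[s_next], 1))  # per action
    return r + gamma * np.max(Q[s_next] - penalty)
```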
arXiv Detail & Related papers (2020-07-16T09:25:54Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)