How to Explore with Belief: State Entropy Maximization in POMDPs
- URL: http://arxiv.org/abs/2406.02295v1
- Date: Tue, 4 Jun 2024 13:16:34 GMT
- Title: How to Explore with Belief: State Entropy Maximization in POMDPs
- Authors: Riccardo Zamboni, Duilio Cirino, Marcello Restelli, Mirco Mutti,
- Abstract summary: We develop a memory and computationally efficient *policy gradient* method to address a first-order relaxation of the objective defined on *belief* states.
This paper aims to generalize state entropy maximization to more realistic domains that meet the challenges of applications.
- Score: 40.82741665804367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have studied *state entropy maximization* in reinforcement learning, in which the agent's objective is to learn a policy inducing high entropy over states visitation (Hazan et al., 2019). They typically assume full observability of the state of the system, so that the entropy of the observations is maximized. In practice, the agent may only get *partial* observations, e.g., a robot perceiving the state of a physical space through proximity sensors and cameras. A significant mismatch between the entropy over observations and true states of the system can arise in those settings. In this paper, we address the problem of entropy maximization over the *true states* with a decision policy conditioned on partial observations *only*. The latter is a generalization of POMDPs, which is intractable in general. We develop a memory and computationally efficient *policy gradient* method to address a first-order relaxation of the objective defined on *belief* states, providing various formal characterizations of approximation gaps, the optimization landscape, and the *hallucination* problem. This paper aims to generalize state entropy maximization to more realistic domains that meet the challenges of applications.
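A minimal illustration of the approach sketched above (not the authors' exact algorithm): in a small tabular POMDP, a reactive policy conditioned only on observations is updated with a REINFORCE-style gradient on a belief-based surrogate of the state-entropy objective, where the Bayesian belief stands in for the true-state visitation the agent cannot observe. The POMDP sizes, random dynamics, softmax parameterization, and cross-entropy pseudo-reward below are all illustrative assumptions.
```python
import numpy as np

# Minimal sketch, NOT the paper's exact algorithm: REINFORCE on a
# belief-based surrogate of the state-entropy objective in a tiny,
# randomly generated tabular POMDP. Sizes, dynamics, and the softmax
# policy parameterization are illustrative assumptions.

rng = np.random.default_rng(0)
S, A, Z, H = 5, 2, 3, 20                      # states, actions, observations, horizon
T = rng.dirichlet(np.ones(S), size=(S, A))    # T[s, a] = next-state distribution
O = rng.dirichlet(np.ones(Z), size=S)         # O[s]    = observation distribution
theta = np.zeros((Z, A))                      # reactive policy: observation -> action logits

def policy(z):
    p = np.exp(theta[z] - theta[z].max())
    return p / p.sum()

def rollout():
    """Simulate one episode while tracking the Bayesian belief over states."""
    s = rng.integers(S)
    b = np.full(S, 1.0 / S)                   # uniform initial belief
    traj, beliefs = [], []
    for _ in range(H):
        z = rng.choice(Z, p=O[s])
        b = b * O[:, z]
        b /= b.sum()                          # filtering: condition on the observation
        a = rng.choice(A, p=policy(z))
        traj.append((z, a))
        beliefs.append(b.copy())
        s = rng.choice(S, p=T[s, a])
        b = b @ T[:, a, :]                    # prediction: push the belief through the dynamics
    return traj, beliefs

def update(n_episodes=64, lr=0.1):
    """One REINFORCE step on the entropy of the belief-averaged state distribution."""
    global theta
    episodes = [rollout() for _ in range(n_episodes)]
    # Beliefs replace the true-state counts the agent cannot observe.
    d = np.mean([b for _, bs in episodes for b in bs], axis=0)
    grad = np.zeros_like(theta)
    for traj, beliefs in episodes:
        # Episode return: cross-entropy surrogate -E_b[log d(s)], a standard
        # first-order proxy for increasing the entropy of d.
        R = sum(-(b * np.log(d + 1e-8)).sum() for b in beliefs)
        for z, a in traj:
            g = -policy(z)
            g[a] += 1.0                       # gradient of log pi(a|z) w.r.t. the logits
            grad[z] += g * R
    theta += lr * grad / n_episodes
    return -(d * np.log(d + 1e-8)).sum()      # entropy of the belief-averaged distribution

for it in range(10):
    print(f"iteration {it}: entropy {update():.3f}")
```
In this toy setting the entropy of the belief-averaged state distribution typically increases over iterations; the paper's formal characterizations of approximation gaps, the optimization landscape, and the hallucination problem are of course not captured by this sketch.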
Related papers
- The Limits of Pure Exploration in POMDPs: When the Observation Entropy is Enough [40.82741665804367]
We study a simple approach of maximizing the entropy over observations in place of the true latent states.
We show how knowledge of the latter can be exploited to compute a principled regularization of the observation entropy that improves performance.
arXiv Detail & Related papers (2024-06-18T17:00:13Z) - Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization [17.845518684835913]
In Reinforcement Learning (RL), agents have no incentive to exhibit predictable behaviors.
We propose a novel method to induce predictable behavior in RL agents, referred to as Predictability-Aware RL (PA-RL)
We show how the entropy rate can be formulated as an average-reward objective, and, since the entropy rate's reward function is policy-dependent, we introduce an action-dependent surrogate entropy.
arXiv Detail & Related papers (2023-11-30T16:53:32Z) - Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration [97.19464604735802]
A promising technique for exploration is to maximize the entropy of visited state distribution.
It tends to struggle in a supervised setup with a task reward, where an agent prefers to visit high-value states.
We present a novel exploration technique that maximizes the value-conditional state entropy (a hedged sketch of this idea appears after the list below).
arXiv Detail & Related papers (2023-05-31T01:09:28Z) - Thermodynamically ideal quantum-state inputs to any device [1.4747234049753448]
We demonstrate that the expectation values of entropy flow, heat, and work can all be determined via Hermitian observables of the initial state.
We show how to construct these Hermitian operators from measurements of thermodynamic output from a finite number of effectively arbitrary inputs.
arXiv Detail & Related papers (2023-05-01T01:13:23Z) - Observational entropic study of Anderson localization [0.0]
We study the behaviour of the observational entropy in the context of the localization-delocalization transition for the one-dimensional Aubry-André model.
For a given coarse-graining, it increases logarithmically with system size in the delocalized phase, and obeys area law in the localized phase.
We also find that the increase of the observational entropy following a quantum quench is logarithmic in time in the delocalized phase as well as at the transition point, while in the localized phase it oscillates.
arXiv Detail & Related papers (2022-09-21T11:26:43Z) - IRL with Partial Observations using the Principle of Uncertain Maximum
Entropy [8.296684637620553]
We introduce the principle of uncertain maximum entropy and present an expectation-maximization based solution.
We experimentally demonstrate the improved robustness to noisy data offered by our technique in a maximum causal entropy inverse reinforcement learning domain.
arXiv Detail & Related papers (2022-08-15T03:22:46Z) - Computationally Efficient PAC RL in POMDPs with Latent Determinism and
Conditional Embeddings [97.12538243736705]
We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs).
Our algorithm provably scales to large-scale POMDPs.
arXiv Detail & Related papers (2022-06-24T05:13:35Z) - Maximum entropy quantum state distributions [58.720142291102135]
We go beyond traditional thermodynamics and condition on the full distribution of the conserved quantities.
The result is quantum state distributions whose deviations from thermal states get more pronounced in the limit of wide input distributions.
arXiv Detail & Related papers (2022-03-23T17:42:34Z) - Action Redundancy in Reinforcement Learning [54.291331971813364]
We show that transition entropy can be described by two terms: a model-dependent transition entropy and action redundancy.
Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
arXiv Detail & Related papers (2021-02-22T19:47:26Z)
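As referenced in the value-conditional state entropy entry above, a common ingredient of these exploration methods is a nonparametric (k-nearest-neighbour) estimate of state entropy used as an intrinsic reward, with the value-conditional variant computing it only among states of similar estimated value. The sketch below is a hedged, self-contained illustration of that idea; the bin count, k, and the toy data are assumptions for illustration, not the paper's implementation.
```python
import numpy as np

# Hedged sketch of a value-conditional state-entropy bonus: the k-NN
# particle entropy estimate is computed only among states whose value
# estimates fall in the same bin, so high- and low-value regions are
# explored separately. Bin count, k, and the toy data are assumptions.

def knn_entropy_bonus(states, values, k=5, n_bins=4):
    """Per-state intrinsic bonus ~ log distance to the k-th nearest
    neighbour, computed within each value bin (k-NN entropy estimator)."""
    states = np.asarray(states, dtype=float)
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(values, edges)
    bonus = np.zeros(len(states))
    for b in np.unique(bins):
        idx = np.where(bins == b)[0]
        X = states[idx]
        kk = min(k, len(idx) - 1)
        if kk < 1:
            continue                                  # not enough neighbours in this bin
        # pairwise distances within the bin
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        knn_dist = np.sort(D, axis=1)[:, kk]          # distance to the k-th neighbour
        bonus[idx] = np.log(knn_dist + 1e-8)
    return bonus

# Toy usage: random 2-D states with a synthetic (assumed) value estimate.
rng = np.random.default_rng(0)
s = rng.normal(size=(200, 2))
v = s[:, 0]
r_int = knn_entropy_bonus(s, v)
print(r_int[:5])
```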