IRL with Partial Observations using the Principle of Uncertain Maximum
Entropy
- URL: http://arxiv.org/abs/2208.06988v1
- Date: Mon, 15 Aug 2022 03:22:46 GMT
- Title: IRL with Partial Observations using the Principle of Uncertain Maximum
Entropy
- Authors: Kenneth Bogert, Yikang Gui, and Prashant Doshi
- Abstract summary: We introduce the principle of uncertain maximum entropy and present an expectation-maximization based solution.
We experimentally demonstrate the improved robustness to noisy data offered by our technique in a maximum causal entropy inverse reinforcement learning domain.
- Score: 8.296684637620553
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The principle of maximum entropy is a broadly applicable technique for
computing a distribution with the least amount of information possible while
constrained to match empirically estimated feature expectations. However, in
many real-world applications that use noisy sensors, computing the feature
expectations may be challenging due to partial observation of the relevant
model variables. For example, a robot performing apprenticeship learning may
lose sight of the agent it is learning from due to environmental occlusion. We
show that in generalizing the principle of maximum entropy to these types of
scenarios we unavoidably introduce a dependency on the learned model to the
empirical feature expectations. We introduce the principle of uncertain maximum
entropy and present an expectation-maximization based solution generalized from
the principle of latent maximum entropy. Finally, we experimentally demonstrate
the improved robustness to noisy data offered by our technique in a maximum
causal entropy inverse reinforcement learning domain.
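The abstract's central structural point, that partial observability forces the empirical feature expectations to depend on the learned model itself, can be made concrete by contrasting the two constraint sets. The following is a minimal sketch in notation chosen here for illustration (x a latent model variable, o a noisy observation, phi_k the features, \tilde{P}(o) the empirical observation distribution); it paraphrases the construction and is not the paper's exact formulation.

```latex
% Classic maximum entropy: the targets \hat{\phi}_k come directly from
% fully observed samples and are fixed constants.
\begin{align}
  \max_{P}\ & -\sum_{x} P(x)\,\log P(x) \notag\\
  \text{s.t.}\ & \sum_{x} P(x)\,\phi_k(x) = \hat{\phi}_k \quad \forall k,
  \qquad \sum_{x} P(x) = 1. \notag
\end{align}

% Uncertain maximum entropy (sketch): only noisy observations o of x are
% available, so the empirical side is an expectation over the posterior
% P(x \mid o), which itself depends on the learned P(x).
\begin{align}
  \max_{P}\ & -\sum_{x} P(x)\,\log P(x) \notag\\
  \text{s.t.}\ & \sum_{x} P(x)\,\phi_k(x)
      = \sum_{o} \tilde{P}(o) \sum_{x} P(x \mid o)\,\phi_k(x) \quad \forall k, \notag\\
  & P(x \mid o) = \frac{P(o \mid x)\,P(x)}{\sum_{x'} P(o \mid x')\,P(x')},
  \qquad \sum_{x} P(x) = 1. \notag
\end{align}
```

Because the posterior P(x | o) on the right-hand side is computed from the very distribution being optimized, the constraints are no longer fixed linear targets and the program is generally non-convex; this is what motivates the expectation-maximization based solution mentioned in the abstract. A rough Python sketch of such a loop is given after the related-papers list below.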
Related papers
- The Limits of Pure Exploration in POMDPs: When the Observation Entropy is Enough [40.82741665804367]
We study a simple approach of maximizing the entropy over observations in place of the true latent states.
We show how knowledge of the latter can be exploited to compute a regularization of the observation entropy that yields principled performance improvements.
arXiv Detail & Related papers (2024-06-18T17:00:13Z) - How to Explore with Belief: State Entropy Maximization in POMDPs [40.82741665804367]
We develop a memory- and computation-efficient policy gradient method to address a first-order relaxation of the objective defined on belief states.
This paper aims to generalize state entropy maximization to more realistic domains that meet the challenges of real-world applications.
arXiv Detail & Related papers (2024-06-04T13:16:34Z) - The Principle of Uncertain Maximum Entropy [0.0]
We present a new principle we call uncertain maximum entropy that generalizes the classic principle and provides interpretable solutions.
We introduce a convex approximation and expectation-maximization based algorithm for finding solutions to our new principle.
arXiv Detail & Related papers (2023-05-17T00:45:41Z) - PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in the relevant problem parameters.
arXiv Detail & Related papers (2022-07-12T17:57:17Z) - Principled Knowledge Extrapolation with GANs [92.62635018136476]
We study counterfactual synthesis from a new perspective of knowledge extrapolation.
We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem.
Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
arXiv Detail & Related papers (2022-05-21T08:39:42Z) - Generalisation and the Risk--Entropy Curve [0.49723239539321284]
We show that the expected generalisation performance of a learning machine is determined by the distribution of risks or, equivalently, its entropy.
Results are presented for different deep neural network models using Markov Chain Monte Carlo techniques.
arXiv Detail & Related papers (2022-02-15T12:19:10Z) - Generalization of Neural Combinatorial Solvers Through the Lens of
Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z) - Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z) - Notes on Generalizing the Maximum Entropy Principle to Uncertain Data [0.0]
We generalize the principle of maximum entropy for computing a distribution with the least amount of information possible.
We show that our technique generalizes the principle of maximum entropy and latent maximum entropy.
We discuss a generally applicable regularization technique for adding error terms to feature expectation constraints in the event of limited data.
arXiv Detail & Related papers (2021-09-09T19:43:28Z) - Loss Bounds for Approximate Influence-Based Abstraction [81.13024471616417]
Influence-based abstraction aims to gain leverage by modeling local subproblems together with the 'influence' that the rest of the system exerts on them.
This paper investigates the performance of such approaches from a theoretical perspective.
We show that neural networks trained with cross entropy are well suited to learn approximate influence representations.
arXiv Detail & Related papers (2020-11-03T15:33:10Z) - A maximum-entropy approach to off-policy evaluation in average-reward
MDPs [54.967872716145656]
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs).
We provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases.
We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
arXiv Detail & Related papers (2020-06-17T18:13:37Z)