Marginal MAP Estimation for Inverse RL under Occlusion with Observer Noise
- URL: http://arxiv.org/abs/2109.07788v1
- Date: Thu, 16 Sep 2021 08:20:52 GMT
- Title: Marginal MAP Estimation for Inverse RL under Occlusion with Observer Noise
- Authors: Prasanth Sengadu Suresh, Prashant Doshi
- Abstract summary: We consider the problem of learning the behavioral preferences of an expert engaged in a task from noisy and partially-observable demonstrations.
Previous techniques for inverse reinforcement learning (IRL) take the approach of either omitting the missing portions or inferring them as part of expectation-maximization.
We present a new method that generalizes the well-known Bayesian maximum-a-posteriori (MAP) IRL method by marginalizing the occluded portions of the trajectory.
- Score: 9.670578317106182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of learning the behavioral preferences of an expert
engaged in a task from noisy and partially-observable demonstrations. This is
motivated by real-world applications such as a robot on a processing line learning from
observing a human worker, where some observations are occluded by environmental
objects that cannot be removed. Furthermore, robotic perception tends to be
imperfect and noisy. Previous techniques for inverse reinforcement learning
(IRL) take the approach of either omitting the missing portions or inferring them
as part of expectation-maximization, which tends to be slow and prone to local
optima. We present a new method that generalizes the well-known Bayesian
maximum-a-posteriori (MAP) IRL method by marginalizing the occluded portions of
the trajectory. This is additionally extended with an observation model to
account for perception noise. We show that the marginal MAP (MMAP) approach
significantly improves on the previous IRL technique under occlusion in both
formative evaluations on a toy problem and in a summative evaluation on an
onion sorting line task by a robot.
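As a rough illustration of the marginalization idea described above, the sketch below computes a trajectory likelihood in a toy MDP by summing over every possible completion of the occluded steps, then grid-searches reward weights for the maximum of that marginal likelihood. Everything here is an assumption for illustration: the tiny domain, the myopic Boltzmann stand-in for the expert's soft-optimal policy, the flat prior, and the grid search are not the paper's actual model, which marginalizes with the MDP's dynamics and also includes an observation-noise model.

```python
import itertools
import numpy as np

# Tiny illustrative MDP (hypothetical; not the paper's onion-sorting domain).
STATES = [0, 1, 2]
ACTIONS = [0, 1]
PHI = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # state features phi(s)

def boltzmann_policy(w, beta=2.0):
    """Expert policy pi(a|s) under linear reward r(s) = w . phi(s).
    Myopic stand-in: the paper would use Q-values from soft value iteration."""
    q = np.array([[PHI[(s + a) % len(STATES)] @ w for a in ACTIONS]
                  for s in STATES])
    z = np.exp(beta * (q - q.max(axis=1, keepdims=True)))
    return z / z.sum(axis=1, keepdims=True)

def marginal_log_likelihood(w, trajectory):
    """log-likelihood of a trajectory whose occluded steps (None) are summed
    out. Ignores transition dynamics for brevity; the paper's marginalization
    uses the MDP's dynamics to weight completions."""
    pi = boltzmann_policy(w)
    hidden = [t for t, step in enumerate(trajectory) if step is None]
    pairs = list(itertools.product(STATES, ACTIONS))
    total = 0.0
    # Enumerate every completion of the occluded (s, a) pairs and sum them out.
    for fill in itertools.product(pairs, repeat=len(hidden)):
        traj = list(trajectory)
        for t, sa in zip(hidden, fill):
            traj[t] = sa
        total += np.prod([pi[s, a] for s, a in traj])
    return np.log(total)

def mmap_estimate(trajectory, grid=21):
    """MAP over reward weights with a flat prior, via grid search (sketch)."""
    best_w, best_ll = None, -np.inf
    for w0 in np.linspace(0.0, 1.0, grid):
        w = np.array([w0, 1.0 - w0])
        ll = marginal_log_likelihood(w, trajectory)
        if ll > best_ll:
            best_w, best_ll = w, ll
    return best_w

# Demonstration: the second timestep of the demonstration is occluded.
demo = [(0, 1), None, (2, 0)]
w_hat = mmap_estimate(demo)
```

Note that enumerating completions is exponential in the number of occluded steps; this is only viable for the handful of hidden timesteps of a toy problem, which is where a practical method needs smarter inference.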
Related papers
- Noise-Free Explanation for Driving Action Prediction [11.330363757618379]
We propose an easy-to-implement but effective way to remedy this flaw: Smooth Noise Norm Attention (SNNA).
We weigh the attention by the norm of the transformed value vector and guide the label-specific signal with the attention gradient, then randomly sample the input perturbations and average the corresponding gradients to produce noise-free attribution.
Both qualitative and quantitative evaluation results show the superiority of SNNA compared to other SOTA attention-based explainable methods in generating a clearer visual explanation map and ranking the input pixel importance.
arXiv Detail & Related papers (2024-07-08T19:21:24Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Direct Unsupervised Denoising [60.71146161035649]
Unsupervised denoisers do not directly produce a single prediction, such as the MMSE estimate.
We present an alternative approach that trains a deterministic network alongside the VAE to directly predict a central tendency.
arXiv Detail & Related papers (2023-10-27T13:02:12Z) - Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z) - On the Theoretical Properties of Noise Correlation in Stochastic Optimization [6.970991851511823]
We show that fPGD possesses exploration abilities favorable over PGD and Anti-PGD.
These results open the field to novel ways to exploit noise for machine learning models.
arXiv Detail & Related papers (2022-09-19T16:32:22Z) - Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z) - A Hierarchical Bayesian model for Inverse RL in Partially-Controlled Environments [0.0]
We present a hierarchical Bayesian model that incorporates both the expert's and the confounding elements' observations.
In particular, our technique outperforms several other comparative methods, second only to having perfect knowledge of the subject's trajectory.
arXiv Detail & Related papers (2021-07-13T02:38:14Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train inference on inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - Latent World Models For Intrinsically Motivated Exploration [140.21871701134626]
We present a self-supervised representation learning method for image-based observations.
We consider episodic and life-long uncertainties to guide the exploration of partially observable environments.
arXiv Detail & Related papers (2020-10-05T19:47:04Z) - Augmented Behavioral Cloning from Observation [14.45796459531414]
Imitation from observation is a technique that teaches an agent how to mimic the behavior of an expert by observing only the sequence of states from the expert's demonstrations.
We show empirically that our approach outperforms the state-of-the-art approaches in four different environments by a large margin.
arXiv Detail & Related papers (2020-04-28T13:56:36Z) - An Adversarial Objective for Scalable Exploration [39.482557864395005]
Model-based curiosity combines active-learning approaches to optimal sampling with information-gain-based incentives for exploration.
Existing model-based curiosity methods approximate prediction uncertainty with approaches that struggle to scale to many prediction-planning pipelines.
We address these scalability issues with an adversarial curiosity method minimizing a score given by a discriminator network.
arXiv Detail & Related papers (2020-03-13T02:03:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.