A Hierarchical Bayesian model for Inverse RL in Partially-Controlled
Environments
- URL: http://arxiv.org/abs/2107.05818v1
- Date: Tue, 13 Jul 2021 02:38:14 GMT
- Authors: Kenneth Bogert (University of North Carolina Asheville) and Prashant
Doshi (University of Georgia)
- Abstract summary: We present a hierarchical Bayesian model that incorporates both the expert's and the confounding elements' observations.
In particular, our technique outperforms several other comparative methods, second only to having perfect knowledge of the subject's trajectory.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robots learning from observations in the real world using inverse
reinforcement learning (IRL) may encounter objects or agents in the
environment, other than the expert, that cause nuisance observations during the
demonstration. These confounding elements are typically removed in
fully-controlled environments such as virtual simulations or lab settings. When
complete removal is impossible, the nuisance observations must be filtered out.
However, identifying the source of each observation is difficult when many
observations are made. To address this, we present a hierarchical Bayesian
model that incorporates both the expert's and the confounding elements'
observations, thereby explicitly modeling the diverse observations a
robot may receive. We extend an existing IRL algorithm originally designed to
work under partial occlusion of the expert to consider the diverse
observations. In a simulated robotic sorting domain containing both occlusion
and confounding elements, we demonstrate the model's effectiveness. In
particular, our technique outperforms several other comparative methods, second
only to having perfect knowledge of the subject's trajectory.
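To make the filtering problem concrete, below is a minimal sketch of the source-attribution idea in Python. It is an illustration under assumed placeholders, not the paper's actual model: the two likelihood functions, the prior p_expert, and the 0.5 threshold are all invented here, and in the paper the attribution is performed within a hierarchical Bayesian model tied to the IRL likelihood rather than as a fixed pre-filter.

```python
import numpy as np

# Illustrative sketch (assumed densities and prior; NOT the paper's model):
# attribute each observation to a latent source -- the expert or a
# confounding element -- and keep only those credibly attributed to the expert.

def expert_likelihood(obs):
    # Placeholder density: observations clustered near the expert's
    # predicted states (here, near the point (1, 1)).
    return float(np.exp(-0.5 * np.sum((obs - 1.0) ** 2)))

def confounder_likelihood(obs):
    # Placeholder density: nuisance observations from a confounding
    # element (here, near the point (-1, -1)).
    return float(np.exp(-0.5 * np.sum((obs + 1.0) ** 2)))

def expert_posterior(obs, p_expert=0.5):
    """Posterior probability that a single observation came from the expert."""
    le = expert_likelihood(obs) * p_expert
    lc = confounder_likelihood(obs) * (1.0 - p_expert)
    return le / (le + lc)

# Keep the observations likely produced by the expert; only these would be
# passed to the downstream IRL algorithm.
observations = [np.array([0.9, 1.1]), np.array([-1.2, -0.8]), np.array([0.2, 0.1])]
expert_obs = [o for o in observations if expert_posterior(o) > 0.5]
print(f"{len(expert_obs)} of {len(observations)} observations attributed to the expert")
```

In the paper's setting the expert's observation model itself depends on the behavior being inferred, so source attribution and reward learning presumably inform each other rather than running as a fixed two-stage pipeline.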
Related papers
- Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets (2025-03-26)
This paper introduces CRAFT, a sample-efficient algorithm leveraging differences in controllable feature dynamics across agents to learn representations.
We provide theoretical guarantees for CRAFT's performance and demonstrate its feasibility on a toy example.
- A Dual Approach to Imitation Learning from Observations with Offline Datasets (2024-06-13)
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult.
We derive DILO, an algorithm that can leverage arbitrary suboptimal data to learn imitation policies without requiring expert actions.
- Object-centric architectures enable efficient causal representation learning (2023-10-29)
We show that when the observations are of multiple objects, the generative function is no longer injective and disentanglement fails in practice.
We develop an object-centric architecture that leverages weak supervision from sparse perturbations to disentangle each object's properties.
This approach is more data-efficient in the sense that it requires significantly fewer perturbations than a comparable approach that encodes to a Euclidean space.
- Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation (2023-05-26)
We evaluate how such pretraining should be carried out in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
- Imitation from Observation With Bootstrapped Contrastive Learning (2023-02-13)
Imitation from observation (IfO) is a learning paradigm that trains autonomous agents in a Markov Decision Process by observing expert demonstrations without access to the expert's actions.
We present BootIfOL, an IfO algorithm that aims to learn a reward function that takes an agent trajectory and compares it to an expert's.
We evaluate our approach on a variety of control tasks showing that we can train effective policies using a limited number of demonstrative trajectories.
- Provably Sample-Efficient RL with Side Information about Latent Dynamics (2022-05-27)
We study reinforcement learning in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space.
We present an algorithm, called TASID, that learns a robust policy in the target domain, with sample complexity that is polynomial in the horizon.
- Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams (2022-04-26)
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
- Provable RL with Exogenous Distractors via Multistep Inverse Dynamics (2021-10-17)
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
- Marginal MAP Estimation for Inverse RL under Occlusion with Observer Noise (2021-09-16)
We consider the problem of learning the behavioral preferences of an expert engaged in a task from noisy and partially-observable demonstrations.
Previous techniques for inverse reinforcement learning (IRL) either omit the missing portions or infer them as part of expectation-maximization.
We present a new method that generalizes the well-known Bayesian maximum-a-posteriori (MAP) IRL method by marginalizing the occluded portions of the trajectory (see the sketch after this list).
- Latent World Models For Intrinsically Motivated Exploration (2020-10-05)
We present a self-supervised representation learning method for image-based observations.
We consider episodic and life-long uncertainties to guide the exploration of partially observable environments.
- Benchmarking Unsupervised Object Representations for Video Sequences (2020-06-12)
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
- SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition (2020-01-08)
We propose a generative latent variable model, called SPACE, that combines the best of spatial-attention and scene-mixture approaches.
We show through experiments on Atari and 3D-Rooms that SPACE achieves the above properties consistently in comparison to SPAIR, IODINE, and GENESIS.
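For the marginal-MAP entry above, the estimate it describes can be written as the following sketch, with notation assumed here for illustration: Y is the observed portion of the expert's trajectory, Z ranges over completions of the occluded portion, and R denotes the reward parameters being learned.

```latex
\hat{R} \;=\; \arg\max_{R}\; \log \sum_{Z} \Pr(Y, Z \mid R) \;+\; \log \Pr(R)
```

Summing over Z, rather than omitting the occluded portion or fixing it to a single imputed completion, is what distinguishes the marginal-MAP estimate from the earlier approaches mentioned in that summary.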
This list is automatically generated from the titles and abstracts of the papers on this site.