Bayesian Imitation Learning for End-to-End Mobile Manipulation
- URL: http://arxiv.org/abs/2202.07600v1
- Date: Tue, 15 Feb 2022 17:38:30 GMT
- Title: Bayesian Imitation Learning for End-to-End Mobile Manipulation
- Authors: Yuqing Du and Daniel Ho and Alexander A. Alemi and Eric Jang and Mohi Khansari
- Abstract summary: Augmenting policies with additional sensor inputs, such as RGB + depth cameras, is a straightforward approach to improving robot perception capabilities.
We show that using the Variational Information Bottleneck to regularize convolutional neural networks improves generalization to held-out domains.
We demonstrate that our method is able to help close the sim-to-real gap and successfully fuse RGB and depth modalities.
- Score: 80.47771322489422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we investigate and demonstrate benefits of a Bayesian approach
to imitation learning from multiple sensor inputs, as applied to the task of
opening office doors with a mobile manipulator. Augmenting policies with
additional sensor inputs, such as RGB + depth cameras, is a straightforward
approach to improving robot perception capabilities, especially for tasks that
may favor different sensors in different situations. As we scale multi-sensor
robotic learning to unstructured real-world settings (e.g. offices, homes) and
more complex robot behaviors, we also increase reliance on simulators for cost,
efficiency, and safety. Consequently, the sim-to-real gap across multiple
sensor modalities also increases, making simulated validation more difficult.
We show that using the Variational Information Bottleneck (Alemi et al., 2016)
to regularize convolutional neural networks improves generalization to held-out
domains and reduces the sim-to-real gap in a sensor-agnostic manner. As a side
effect, the learned embeddings also provide useful estimates of model
uncertainty for each sensor. We demonstrate that our method is able to help
close the sim-to-real gap and successfully fuse RGB and depth modalities based
on understanding of the situational uncertainty of each sensor. In a real-world
office environment, we achieve 96% task success, improving upon the baseline by
+16%.
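The abstract describes regularizing a convolutional policy encoder with the Variational Information Bottleneck (Alemi et al., 2016) and using the resulting per-sensor uncertainty to fuse RGB and depth. Below is a minimal, illustrative PyTorch sketch of a VIB-regularized encoder for one sensor stream; the layer sizes, the beta weight, and the use of the per-example KL term as an uncertainty proxy are assumptions made for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBEncoder(nn.Module):
    """Minimal VIB-regularized image encoder (sketch, not the paper's exact model)."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(64, z_dim)      # posterior mean
        self.logvar = nn.Linear(64, z_dim)  # posterior log-variance

    def forward(self, x):
        h = self.conv(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        # KL(q(z|x) || N(0, I)) per example; can double as a per-sensor uncertainty proxy
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=1)
        return z, kl

def imitation_loss(policy_head, encoder, images, expert_actions, beta=1e-3):
    """Behavior-cloning loss with the VIB regularizer added (beta is an assumed value)."""
    z, kl = encoder(images)
    pred_actions = policy_head(z)
    bc = F.mse_loss(pred_actions, expert_actions)
    return bc + beta * kl.mean()
```

At fusion time, the per-sensor KL (or a learned variance) could be used to down-weight the less certain modality; the abstract states that fusion is driven by situational uncertainty of each sensor, but the exact fusion rule is not reproduced here.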
Related papers
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
The experimental results demonstrate that MPI improves upon the previous state-of-the-art by 10% to 64% on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- Multimodal Anomaly Detection based on Deep Auto-Encoder for Object Slip Perception of Mobile Manipulation Robots [22.63980025871784]
The proposed framework integrates heterogeneous data streams collected from various robot sensors, including RGB and depth cameras, a microphone, and a force-torque sensor.
The integrated data is used to train a deep autoencoder to construct latent representations of the multisensory data that indicate the normal status.
Anomalies are then identified by an error score that measures the difference between the latent values of the input and the latent values of its reconstruction under the trained encoder.
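The latent-comparison scoring described above can be written down in a few lines. The sketch below assumes a generic fully connected autoencoder over a concatenated multisensory feature vector and an L2 distance in latent space; these are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn

class MultisensoryAE(nn.Module):
    """Toy autoencoder over a concatenated multisensory feature vector (illustrative only)."""
    def __init__(self, in_dim=256, z_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, z_dim))
        self.decoder = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

@torch.no_grad()
def anomaly_score(model, x):
    """Distance between the latent code of the input and the latent code of its reconstruction."""
    x_hat, z = model(x)
    z_hat = model.encoder(x_hat)
    return torch.linalg.norm(z - z_hat, dim=-1)  # higher score = more anomalous
```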
arXiv Detail & Related papers (2024-03-06T09:15:53Z)
- Cognitive TransFuser: Semantics-guided Transformer-based Sensor Fusion for Improved Waypoint Prediction [38.971222477695214]
The RGB-LiDAR multi-task feature fusion network, coined Cognitive TransFuser, exceeds the baseline network by a significant margin, enabling safer and more complete road navigation.
We validate the proposed network on the Town05 Short and Town05 Long benchmarks through extensive experiments, achieving real-time inference at up to 44.2 FPS.
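For orientation only, here is a minimal two-branch sensor-fusion sketch that maps an RGB image and a LiDAR bird's-eye-view grid to future waypoints. It is a generic late-fusion baseline, not the transformer-based Cognitive TransFuser architecture; the branch backbones, feature sizes, BEV channel count, and number of waypoints are all assumptions.

```python
import torch
import torch.nn as nn

class TwoBranchWaypointNet(nn.Module):
    """Generic RGB + LiDAR-BEV fusion for waypoint regression (not the actual TransFuser)."""
    def __init__(self, feat_dim=128, num_waypoints=4):
        super().__init__()
        self.num_waypoints = num_waypoints

        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 5, stride=2), nn.ReLU(),
                nn.Conv2d(32, feat_dim, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )

        self.rgb_branch = branch(3)   # RGB image
        self.bev_branch = branch(2)   # LiDAR BEV occupancy/height channels (assumed)
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_waypoints * 2),  # (x, y) per waypoint
        )

    def forward(self, rgb, bev):
        # Late fusion: encode each modality separately, then concatenate the features.
        fused = torch.cat([self.rgb_branch(rgb), self.bev_branch(bev)], dim=1)
        return self.head(fused).view(-1, self.num_waypoints, 2)
```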
arXiv Detail & Related papers (2023-08-04T03:59:10Z)
- See What the Robot Can't See: Learning Cooperative Perception for Visual Navigation [11.943412856714154]
We train the sensors to encode and communicate relevant viewpoint information to the mobile robot.
We address the challenge of enabling every sensor to predict the direction along the shortest path to the target.
Our results show that by using communication between the sensors and the robot, we achieve up to a 2.0x improvement in SPL (Success weighted by Path Length).
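SPL here presumably refers to Success weighted by Path Length, the standard visual-navigation metric of Anderson et al. (2018); a direct implementation follows, included only to make the reported metric concrete.

```python
def spl(successes, shortest_lengths, actual_lengths):
    """Success weighted by Path Length (Anderson et al., 2018).

    successes        -- list of 0/1 flags, one per episode
    shortest_lengths -- geodesic shortest-path length from start to goal
    actual_lengths   -- length of the path the agent actually took
    """
    terms = [
        s * (l / max(p, l))
        for s, l, p in zip(successes, shortest_lengths, actual_lengths)
    ]
    return sum(terms) / len(terms)
```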
arXiv Detail & Related papers (2022-08-01T11:37:01Z)
- Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
- Point Cloud Based Reinforcement Learning for Sim-to-Real and Partial Observability in Visual Navigation [62.22058066456076]
Reinforcement Learning (RL) provides powerful tools for solving complex robotic tasks.
However, RL policies trained in simulation often do not work directly in the real world, a challenge known as the sim-to-real transfer problem.
We propose a method that learns on an observation space constructed from point clouds and environment randomization.
arXiv Detail & Related papers (2020-07-27T17:46:59Z)
- Understanding Multi-Modal Perception Using Behavioral Cloning for Peg-In-a-Hole Insertion Tasks [21.275342989110978]
In this paper, we investigate the merits of multiple sensor modalities when combined to learn a controller for real-world assembly tasks.
We propose a multi-step-ahead loss function to improve the performance of the behavioral cloning method.
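The abstract names a multi-step-ahead loss but does not define it; one common reading is to have the policy predict a short horizon of future actions and penalize every step against the demonstration, as in the hedged sketch below. The horizon length, step weighting, and MSE choice are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def multi_step_ahead_loss(policy, obs, expert_actions, horizon=5):
    """Penalize predictions for the next `horizon` demonstrated actions (illustrative).

    policy(obs) is assumed to return a tensor of shape (batch, horizon, action_dim);
    expert_actions has the same shape and comes from the demonstration data.
    """
    pred = policy(obs)
    # Weight earlier steps more heavily; the decay schedule is an arbitrary choice here.
    weights = torch.tensor([0.5 ** k for k in range(horizon)], device=pred.device)
    per_step = F.mse_loss(pred, expert_actions, reduction="none").mean(dim=(0, 2))  # (horizon,)
    return (weights * per_step).sum() / weights.sum()
```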
arXiv Detail & Related papers (2020-07-22T19:46:51Z)
- Learning Camera Miscalibration Detection [83.38916296044394]
This paper focuses on a data-driven approach to learn the detection of miscalibration in vision sensors, specifically RGB cameras.
Our contributions include a proposed miscalibration metric for RGB cameras and a novel semi-synthetic dataset generation pipeline based on this metric.
By training a deep convolutional neural network, we demonstrate the effectiveness of our pipeline to identify whether a recalibration of the camera's intrinsic parameters is required or not.
arXiv Detail & Related papers (2020-05-24T10:32:49Z)
- Deep Soft Procrustes for Markerless Volumetric Sensor Alignment [81.13055566952221]
In this work, we improve markerless data-driven correspondence estimation to achieve more robust multi-sensor spatial alignment.
We incorporate geometric constraints in an end-to-end manner into a typical segmentation based model and bridge the intermediate dense classification task with the targeted pose estimation one.
Our model is experimentally shown to achieve similar results with marker-based methods and outperform the markerless ones, while also being robust to the pose variations of the calibration structure.
arXiv Detail & Related papers (2020-03-23T10:51:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.