SEAL: Self-supervised Embodied Active Learning using Exploration and 3D
Consistency
- URL: http://arxiv.org/abs/2112.01001v1
- Date: Thu, 2 Dec 2021 06:26:38 GMT
- Title: SEAL: Self-supervised Embodied Active Learning using Exploration and 3D
Consistency
- Authors: Devendra Singh Chaplot, Murtaza Dalal, Saurabh Gupta, Jitendra Malik,
Ruslan Salakhutdinov
- Abstract summary: We present a framework called Self-supervised Embodied Active Learning (SEAL).
It utilizes perception models trained on internet images to learn an active exploration policy.
We build and utilize 3D semantic maps to learn both action and perception in a completely self-supervised manner.
- Score: 122.18108118190334
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we explore how we can build upon the data and models of
Internet images and use them to adapt to robot vision without requiring any
extra labels. We present a framework called Self-supervised Embodied Active
Learning (SEAL). It utilizes perception models trained on internet images to
learn an active exploration policy. The observations gathered by this
exploration policy are labelled using 3D consistency and used to improve the
perception model. We build and utilize 3D semantic maps to learn both action
and perception in a completely self-supervised manner. The semantic map is used
to compute an intrinsic motivation reward for training the exploration policy
and for labelling the agent observations using spatio-temporal 3D consistency
and label propagation. We demonstrate that the SEAL framework can be used to
close the action-perception loop: it improves object detection and instance
segmentation performance of a pretrained perception model by just moving around
in training environments, and the improved perception model can be used to
improve Object Goal Navigation.
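The abstract names an intrinsic motivation reward computed from the 3D semantic map, but gives no implementation. Below is a minimal sketch of one plausible reading: back-project per-pixel class probabilities into a voxel grid and reward the policy for every voxel that newly acquires a confident consensus label. All names (`SemanticMap3D`, `coverage_reward`), the grid size, and the threshold are illustrative assumptions, not SEAL's actual code.

```python
import numpy as np


class SemanticMap3D:
    """Voxel grid accumulating per-class evidence from projected detections."""

    def __init__(self, shape=(128, 128, 32), num_classes=15):
        # scores[x, y, z, c]: accumulated evidence that voxel (x, y, z) holds class c.
        self.scores = np.zeros(shape + (num_classes,), dtype=np.float32)

    def update(self, voxel_indices, class_probs):
        """Splat one frame's per-pixel class probabilities into the map.

        voxel_indices: (N, 3) int array, the voxel hit by each back-projected pixel.
        class_probs:   (N, num_classes) probabilities from the perception model.
        """
        xs, ys, zs = voxel_indices.T
        np.add.at(self.scores, (xs, ys, zs), class_probs)  # handles repeated voxels

    def confident_cells(self, threshold=0.9):
        """Count voxels whose normalised winning class exceeds the threshold."""
        totals = self.scores.sum(axis=-1, keepdims=True)
        probs = np.divide(self.scores, totals,
                          out=np.zeros_like(self.scores), where=totals > 0)
        return int((probs.max(axis=-1) > threshold).sum())


def coverage_reward(map3d, prev_confident):
    """Intrinsic reward: voxels that became confident since the last step."""
    confident = map3d.confident_cells()
    return confident - prev_confident, confident
```

A policy trained on this reward is pushed toward viewpoints that add confidently labelled volume to the map, which matches the paper's stated goal of gathering observations that improve the perception model.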
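The second half of the loop, labelling observations via spatio-temporal 3D consistency and label propagation, can be sketched with the same hypothetical map: each pixel inherits the consensus class of the voxel it back-projects into, and pixels without a confident consensus are masked out of the loss. Again, `propagate_labels` and its threshold are assumptions for illustration.

```python
import numpy as np


def propagate_labels(map3d, voxel_indices, threshold=0.9, ignore_index=-1):
    """Pseudo-label one frame's pixels from the map's multi-view consensus.

    voxel_indices: (N, 3) voxels hit by the frame's N back-projected pixels.
    Returns (N,) class labels; pixels whose voxel lacks a confident consensus
    get ignore_index and are excluded from the fine-tuning loss.
    """
    xs, ys, zs = voxel_indices.T
    scores = map3d.scores[xs, ys, zs]                 # (N, num_classes)
    totals = scores.sum(axis=-1, keepdims=True)
    probs = np.divide(scores, totals,
                      out=np.zeros_like(scores), where=totals > 0)
    labels = probs.argmax(axis=-1)
    labels[probs.max(axis=-1) <= threshold] = ignore_index
    return labels
```

Fine-tuning the pretrained detector on frames paired with these propagated labels is what the abstract calls closing the action-perception loop: views where the model was uncertain get supervised by views where it was confident.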
Related papers
- ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos [4.736059095502584]
This work proposes a novel approach using Cross-Architecture Pseudo-Labeling with contrastive learning for semi-supervised action recognition.
We introduce a novel cross-architecture approach where 3D Convolutional Neural Networks (3D CNNs) and video transformers (ViT) are utilised to capture different aspects of action representations.
arXiv Detail & Related papers (2024-04-09T12:09:56Z)
- Motion Degeneracy in Self-supervised Learning of Elevation Angle Estimation for 2D Forward-Looking Sonar [4.683630397028384]
This study aims to realize stable self-supervised learning of elevation angle estimation without pretraining using synthetic images.
We first analyze the motion field of 2D forward-looking sonar, which is related to the main supervision signal.
arXiv Detail & Related papers (2023-07-30T08:06:11Z)
- An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection [72.30883544352918]
We investigate whether pseudo-labels can provide effective supervision for the baseline models under varying settings.
We achieve 20.23 AP at the moderate level on the KITTI-3D testing set without bells and whistles, improving the baseline model by 6.03 AP.
We hope this work can provide insights for the image-based 3D detection community in the semi-supervised setting (a minimal pseudo-label filtering sketch appears after this list).
arXiv Detail & Related papers (2022-08-15T12:17:46Z)
- Paint and Distill: Boosting 3D Object Detection with Semantic Passing Network [70.53093934205057]
The 3D object detection task from lidar or camera sensors is essential for autonomous driving.
We propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models.
arXiv Detail & Related papers (2022-07-12T12:35:34Z)
- 3D Object Detection with a Self-supervised Lidar Scene Flow Backbone [10.341296683155973]
We propose using a self-supervised training strategy to learn a general point cloud backbone model for downstream 3D vision tasks.
Our main contribution leverages learned flow and motion representations and combines a self-supervised backbone with a 3D detection head.
Experiments on KITTI and nuScenes benchmarks show that the proposed self-supervised pre-training increases 3D detection performance significantly.
arXiv Detail & Related papers (2022-05-02T07:53:29Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a 3x improvement in success rate over a model with access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a fully trainable Neural Message Passing network for data association (a toy message-passing round is sketched after this list).
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction [24.72947291987545]
A key challenge for an agent learning to interact with the world is reasoning about the physical properties of objects.
We propose a novel approach for modeling the dynamics of a robot's interactions directly from unlabeled 3D point clouds and images.
arXiv Detail & Related papers (2020-08-02T11:04:49Z)
- Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships [52.72020203771489]
We investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes.
Our proposed method combines visual features and 3D spatial representations to learn navigation policy.
Our experiments, performed in AI2-THOR, show that our model outperforms the baselines in both SR and SPL metrics.
arXiv Detail & Related papers (2020-04-29T08:46:38Z)
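As referenced above, the pseudo-labelling recipe studied in "An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection" reduces to a small sketch: a teacher detector labels unlabelled frames, low-confidence boxes are discarded, and the survivors supervise a student. The `Detection` type, the 0.7 threshold, and the helper names are illustrative assumptions, not the paper's code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    box3d: tuple      # (x, y, z, w, h, l, yaw) in the sensor frame
    label: int        # class index
    score: float      # teacher confidence


def filter_pseudo_labels(teacher_outputs: List[Detection],
                         score_threshold: float = 0.7) -> List[Detection]:
    """Keep only confident teacher detections as pseudo-ground-truth."""
    return [d for d in teacher_outputs if d.score >= score_threshold]


def build_semi_supervised_set(labelled, unlabelled_frames,
                              teacher: Callable[..., List[Detection]]):
    """Mix real annotations with filtered pseudo-labels for student training."""
    pseudo = [(frame, filter_pseudo_labels(teacher(frame)))
              for frame in unlabelled_frames]
    return list(labelled) + pseudo
```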
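Similarly, the fully trainable data association in "Learnable Online Graph Representations for 3D Multi-Object Tracking" can be illustrated with one round of message passing on the bipartite track-detection graph. The shapes, `tanh` nonlinearity, and mean aggregation below are generic choices for illustration, not the paper's actual architecture.

```python
import numpy as np


def message_passing_round(track_feats, det_feats, edge_feats, W_msg, W_upd):
    """One round of neural message passing from detections to tracks.

    track_feats: (T, D) track node embeddings.
    det_feats:   (N, D) detection node embeddings.
    edge_feats:  (T, N, D) pairwise edge embeddings (e.g. relative geometry).
    W_msg, W_upd: (D, D) weight matrices, learned in the real model.
    """
    msgs = np.tanh((edge_feats + det_feats[None, :, :]) @ W_msg)  # (T, N, D)
    agg = msgs.mean(axis=1)                                       # (T, D)
    return np.tanh((track_feats + agg) @ W_upd)                   # updated tracks


def association_scores(track_feats, det_feats):
    """Affinity matrix between tracks and detections; feeds the matching step."""
    return track_feats @ det_feats.T                              # (T, N)
```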
This list is automatically generated from the titles and abstracts of the papers on this site.