Self-Supervised Pillar Motion Learning for Autonomous Driving
- URL: http://arxiv.org/abs/2104.08683v1
- Date: Sun, 18 Apr 2021 02:32:08 GMT
- Title: Self-Supervised Pillar Motion Learning for Autonomous Driving
- Authors: Chenxu Luo, Xiaodong Yang, Alan Yuille
- Abstract summary: We propose a learning framework that leverages free supervisory signals from point clouds and paired camera images to estimate motion purely via self-supervision.
Our model involves a point cloud based structural consistency augmented with probabilistic motion masking as well as a cross-sensor motion regularization to realize the desired self-supervision.
- Score: 10.921208239968827
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Autonomous driving can benefit from motion behavior comprehension when
interacting with diverse traffic participants in highly dynamic environments.
Recently, there has been a growing interest in estimating class-agnostic motion
directly from point clouds. Current motion estimation methods usually require
vast amounts of annotated training data from self-driving scenes. However,
manually labeling point clouds is notoriously difficult, error-prone and
time-consuming. In this paper, we seek to answer the research question of
whether the abundant unlabeled data collections can be utilized for accurate
and efficient motion learning. To this end, we propose a learning framework
that leverages free supervisory signals from point clouds and paired camera
images to estimate motion purely via self-supervision. Our model involves a
point cloud based structural consistency augmented with probabilistic motion
masking as well as a cross-sensor motion regularization to realize the desired
self-supervision. Experiments reveal that our approach performs competitively
to supervised methods, and achieves the state-of-the-art result when combining
our self-supervised model with supervised fine-tuning.
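To make the self-supervision concrete, the sketch below shows one plausible form of the structural consistency term described in the abstract: the predicted pillar motion warps the frame-t points forward, a chamfer-style distance ties the warped cloud to the frame-(t+1) cloud, and a per-point motion probability (standing in for the probabilistic motion masking) down-weights likely-static points. This is a minimal PyTorch sketch under our own assumptions; every function and variable name here is illustrative, not the authors' actual implementation.

import torch

def structural_consistency_loss(points_t, points_t1, point_motion, motion_prob):
    # Illustrative sketch only, not the paper's exact formulation.
    # points_t:     (N, 2) BEV point coordinates at time t
    # points_t1:    (M, 2) BEV point coordinates at time t+1
    # point_motion: (N, 2) predicted per-point motion (broadcast from pillars)
    # motion_prob:  (N,)   estimated probability that each point is moving
    warped = points_t + point_motion               # warp frame t forward
    d2 = torch.cdist(warped, points_t1) ** 2       # (N, M) pairwise squared distances
    fwd = d2.min(dim=1).values                     # warped point -> nearest t+1 point
    bwd = d2.min(dim=0).values                     # t+1 point -> nearest warped point
    # Down-weight points that are probably static so the background
    # does not dominate the objective.
    return (motion_prob * fwd).mean() + bwd.mean()

# Toy usage with random data:
pts_t  = torch.rand(128, 2)
pts_t1 = torch.rand(140, 2)
motion = torch.zeros(128, 2, requires_grad=True)
prob   = torch.full((128,), 0.5)
loss = structural_consistency_loss(pts_t, pts_t1, motion, prob)
loss.backward()

The abstract's cross-sensor motion regularization, which ties the predicted motion to signals from the paired camera images, is not sketched here.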
Related papers
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.
Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner.
Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
- Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations [53.797896854533384]
Class-agnostic motion prediction methods directly predict the motion of the entire point cloud.
While most existing methods rely on fully-supervised learning, the manual labeling of point cloud data is laborious and time-consuming.
We introduce three simple spatial and temporal regularization losses, which facilitate the self-supervised training process effectively.
arXiv Detail & Related papers (2024-03-20T02:58:45Z)
- Self-Supervised Bird's Eye View Motion Prediction with Cross-Modality Signals [38.20643428486824]
Learning the dense bird's eye view (BEV) motion flow in a self-supervised manner is an emerging research area for robotics and autonomous driving.
Current self-supervised methods mainly rely on point correspondences between point clouds.
We introduce a novel cross-modality self-supervised training framework that effectively addresses these issues by leveraging multi-modality data.
arXiv Detail & Related papers (2024-01-21T14:09:49Z)
- Self-Supervised Multi-Object Tracking For Autonomous Driving From Consistency Across Timescales [53.55369862746357]
Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data.
However, their re-identification accuracy still falls short compared to their supervised counterparts.
We propose a training objective that enables self-supervised learning of re-identification features from multiple sequential frames.
arXiv Detail & Related papers (2023-04-25T20:47:29Z)
- Motion Inspired Unsupervised Perception and Prediction in Autonomous Driving [29.731790562352344]
This paper pioneers a novel and challenging direction, i.e., training perception and prediction models to understand open-set moving objects.
Our proposed framework uses self-learned flow to trigger an automated meta labeling pipeline to achieve automatic supervision.
We show that our approach generates highly promising results in open-set 3D detection and trajectory prediction.
arXiv Detail & Related papers (2022-10-14T18:55:44Z)
- An Adaptable Approach to Learn Realistic Legged Locomotion without Examples [38.81854337592694]
This work proposes a generic approach for ensuring realism in locomotion by guiding the learning process with the spring-loaded inverted pendulum model as a reference.
We present experimental results showing that even in a model-free setup, the learned policies can generate realistic and energy-efficient locomotion gaits for a bipedal and a quadrupedal robot.
arXiv Detail & Related papers (2021-10-28T10:14:47Z)
- IntentNet: Learning to Predict Intention from Raw Sensor Data [86.74403297781039]
In this paper, we develop a one-stage detector and forecaster that exploits both 3D point clouds produced by a LiDAR sensor as well as dynamic maps of the environment.
Our multi-task model achieves better accuracy than the respective separate modules while saving computation, which is critical to reducing reaction time in self-driving applications.
arXiv Detail & Related papers (2021-01-20T00:31:52Z)
- Implicit Latent Variable Model for Scene-Consistent Motion Forecasting [78.74510891099395]
In this paper, we aim to learn scene-consistent motion forecasts of complex urban traffic directly from sensor data.
We model the scene as an interaction graph and employ powerful graph neural networks to learn a distributed latent representation of the scene.
arXiv Detail & Related papers (2020-07-23T14:31:25Z)
- Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach with a simple yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drift caused by spatio-temporal discontinuity; (iii) we demonstrate state-of-the-art results among self-supervised approaches on DAVIS-2017 and YouTube-VOS.
arXiv Detail & Related papers (2020-06-22T17:55:59Z)