Auto4D: Learning to Label 4D Objects from Sequential Point Clouds
- URL: http://arxiv.org/abs/2101.06586v2
- Date: Thu, 11 Mar 2021 19:27:19 GMT
- Title: Auto4D: Learning to Label 4D Objects from Sequential Point Clouds
- Authors: Bin Yang, Min Bai, Ming Liang, Wenyuan Zeng, Raquel Urtasun
- Abstract summary: We propose an automatic pipeline that generates accurate object trajectories in 3D space from LiDAR point clouds.
The key idea is to decompose the 4D object label into two parts: the object size in 3D, which is fixed through time for rigid objects, and the motion path describing the evolution of the object's pose through time.
Given the cheap but noisy input, our model produces higher quality 4D labels by re-estimating the object size and smoothing the motion path.
- Score: 89.30951657004408
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the past few years we have seen great advances in object perception
(particularly in 4D space-time dimensions) thanks to deep learning methods.
However, they typically rely on large amounts of high-quality labels to achieve
good performance, which often require time-consuming and expensive work by
human annotators. To address this, we propose an automatic annotation pipeline
that generates accurate object trajectories in 3D space (i.e., 4D labels) from
LiDAR point clouds. The key idea is to decompose the 4D object label into two
parts: the object size in 3D, which is fixed through time for rigid objects, and
the motion path describing the evolution of the object's pose through time.
Instead of generating a series of labels in one shot, we adopt an iterative
refinement process in which object detections generated online are tracked
through time and used as the initialization. Given this cheap but noisy input,
our model produces higher-quality 4D labels by re-estimating the object size and
smoothing the motion path, where the improvement comes from exploiting
aggregated observations and motion cues over the entire trajectory. We validate
the proposed method on a large-scale driving dataset and show a 25% reduction in
human annotation effort. We also showcase the benefits of our approach in the
annotator-in-the-loop setting.
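As a concrete illustration of the decomposition described in the abstract, the sketch below represents a 4D label as one 3D box size shared across time plus a per-frame pose path, re-estimates the size by aggregating points over the whole trajectory, and smooths the path with a moving average. This is a minimal hypothetical sketch, not the authors' implementation: the names (ObjectLabel4D, reestimate_size, smooth_path), the bird's-eye-view (x, y, yaw) pose parameterization, and the bounding-extent and moving-average heuristics (standing in for the paper's learned refinement networks) are all assumptions.

```python
from dataclasses import dataclass
from typing import List
import numpy as np


@dataclass
class ObjectLabel4D:
    """4D label = one 3D size shared across time + a pose path (hypothetical structure)."""
    size_lwh: np.ndarray     # (3,) length, width, height; fixed over time for rigid objects
    poses: List[np.ndarray]  # per frame (x, y, yaw) in the world frame (bird's-eye view)


def to_object_frame(points_xy: np.ndarray, pose: np.ndarray) -> np.ndarray:
    """Transform (N, 2) world-frame points into the object frame of a single pose."""
    x, y, yaw = pose
    c, s = np.cos(yaw), np.sin(yaw)
    rot_world_from_obj = np.array([[c, -s], [s, c]])
    return (points_xy - np.array([x, y])) @ rot_world_from_obj  # row-vector form of R^T (p - t)


def reestimate_size(label: ObjectLabel4D, points_per_frame: List[np.ndarray]) -> np.ndarray:
    """Aggregate object points over the whole trajectory and fit a tight planar extent.
    A crude stand-in for a learned size-refinement branch."""
    per_frame = [to_object_frame(pts, pose)
                 for pts, pose in zip(points_per_frame, label.poses) if len(pts) > 0]
    aggregated = np.concatenate(per_frame, axis=0)
    extent = aggregated.max(axis=0) - aggregated.min(axis=0)     # (length, width) from the point spread
    return np.array([extent[0], extent[1], label.size_lwh[2]])   # keep the original height


def smooth_path(label: ObjectLabel4D, window: int = 5) -> List[np.ndarray]:
    """Moving-average smoothing of the (x, y) path; yaw is kept as-is for simplicity."""
    xy = np.array([p[:2] for p in label.poses])
    kernel = np.ones(window) / window
    pad = window // 2
    xy_pad = np.pad(xy, ((pad, pad), (0, 0)), mode="edge")
    xy_smooth = np.stack(
        [np.convolve(xy_pad[:, d], kernel, mode="valid") for d in range(2)], axis=1)
    return [np.array([x, y, p[2]]) for (x, y), p in zip(xy_smooth, label.poses)]
```

In the paper the size and path refinements are learned from aggregated observations and motion cues over the entire trajectory; the heuristics above only convey the structure of the size/path decomposition.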
Related papers
- DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos [21.93514516437402]
We present DreamScene4D, the first approach to generate 3D dynamic scenes of multiple objects from monocular videos via novel view synthesis.
Our key insight is a "decompose-recompose" approach that factorizes the video scene into the background and object tracks.
We show extensive results on challenging DAVIS, Kubric, and self-captured videos with quantitative comparisons and a user preference study.
arXiv Detail & Related papers (2024-05-03T17:55:34Z) - 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
This work introduces 4DGen, a novel framework for grounded 4D content creation.
We identify static 3D assets and monocular video sequences as key components in constructing the 4D content.
Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos).
arXiv Detail & Related papers (2023-12-28T18:53:39Z) - Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, an output-level constraint enforces overlap between 2D and projected 3D box estimates.
Third, a training-level constraint produces accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z) - Once Detected, Never Lost: Surpassing Human Performance in Offline LiDAR
based 3D Object Detection [50.959453059206446]
This paper aims for high-performance offline LiDAR-based 3D object detection.
We first observe that experienced human annotators annotate objects from a track-centric perspective.
We propose a high-performance offline detector that adopts this track-centric perspective instead of the conventional object-centric perspective.
arXiv Detail & Related papers (2023-04-24T17:59:05Z) - 4D Unsupervised Object Discovery [53.561750858325915]
We propose 4D unsupervised object discovery, jointly discovering objects from 4D data -- 3D point clouds and 2D RGB images with temporal information.
We present the first practical approach for this task by proposing a ClusterNet on 3D point clouds, which is jointly optimized with a 2D localization network.
arXiv Detail & Related papers (2022-10-10T16:05:53Z) - Receding Moving Object Segmentation in 3D LiDAR Data Using Sparse 4D
Convolutions [33.538055872850514]
We tackle the problem of distinguishing 3D LiDAR points that belong to currently moving objects, such as walking pedestrians or driving cars, from points obtained from non-moving objects, such as walls or parked cars.
Our approach takes a sequence of observed LiDAR scans and turns them into a voxelized sparse 4D point cloud (a toy voxelization sketch follows this list).
We apply computationally efficient sparse 4D convolutions to jointly extract spatial and temporal features and predict moving-object confidence scores for all points in the sequence.
arXiv Detail & Related papers (2022-06-08T18:51:14Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z) - Tracking from Patterns: Learning Corresponding Patterns in Point Clouds
for 3D Object Tracking [34.40019455462043]
We propose to learn 3D object correspondences from temporal point cloud data and infer the motion information from correspondence patterns.
Our method exceeds existing 3D tracking methods on both the KITTI and the larger-scale nuScenes datasets.
arXiv Detail & Related papers (2020-10-20T06:07:20Z)