Auto4D: Learning to Label 4D Objects from Sequential Point Clouds
- URL: http://arxiv.org/abs/2101.06586v2
- Date: Thu, 11 Mar 2021 19:27:19 GMT
- Title: Auto4D: Learning to Label 4D Objects from Sequential Point Clouds
- Authors: Bin Yang, Min Bai, Ming Liang, Wenyuan Zeng, Raquel Urtasun
- Abstract summary: We propose an automatic pipeline that generates accurate object trajectories in 3D space from LiDAR point clouds.
The key idea is to decompose the 4D object label into two parts: the object size in 3D, which is fixed through time for rigid objects, and the motion path describing the evolution of the object's pose through time.
Given the cheap but noisy input, our model produces higher quality 4D labels by re-estimating the object size and smoothing the motion path.
- Score: 89.30951657004408
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the past few years we have seen great advances in object perception
(particularly in 4D space-time dimensions) thanks to deep learning methods.
However, they typically rely on large amounts of high-quality labels to achieve
good performance, which often require time-consuming and expensive work by
human annotators. To address this, we propose an automatic annotation pipeline
that generates accurate object trajectories in 3D space (i.e., 4D labels) from
LiDAR point clouds. The key idea is to decompose the 4D object label into two
parts: the object size in 3D, which is fixed through time for rigid objects, and
the motion path describing the evolution of the object's pose through time.
Instead of generating a series of labels in one shot, we adopt an iterative
refinement process in which object detections generated online are tracked
through time and used as the initialization. Given this cheap but noisy input,
our model produces higher-quality 4D labels by re-estimating the object size and
smoothing the motion path, where the improvement comes from exploiting
aggregated observations and motion cues over the entire trajectory. We validate
the proposed method on a large-scale driving dataset and show a 25% reduction in
human annotation effort. We also showcase the benefits of our approach in the
annotator-in-the-loop setting.
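As a concrete illustration of the decomposition described in the abstract, the sketch below represents a 4D label as one 3D box size shared across time plus a per-frame pose path, re-estimates the size by aggregating points over the whole trajectory, and smooths the path with a moving average. This is a minimal hypothetical sketch, not the authors' implementation: the names (ObjectLabel4D, reestimate_size, smooth_path), the bird's-eye-view (x, y, yaw) pose parameterization, and the bounding-extent and moving-average heuristics (standing in for the paper's learned refinement networks) are all assumptions.

```python
from dataclasses import dataclass
from typing import List
import numpy as np


@dataclass
class ObjectLabel4D:
    """4D label = one 3D size shared across time + a pose path (hypothetical structure)."""
    size_lwh: np.ndarray     # (3,) length, width, height; fixed over time for rigid objects
    poses: List[np.ndarray]  # per frame (x, y, yaw) in the world frame (bird's-eye view)


def to_object_frame(points_xy: np.ndarray, pose: np.ndarray) -> np.ndarray:
    """Transform (N, 2) world-frame points into the object frame of a single pose."""
    x, y, yaw = pose
    c, s = np.cos(yaw), np.sin(yaw)
    rot_world_from_obj = np.array([[c, -s], [s, c]])
    return (points_xy - np.array([x, y])) @ rot_world_from_obj  # row-vector form of R^T (p - t)


def reestimate_size(label: ObjectLabel4D, points_per_frame: List[np.ndarray]) -> np.ndarray:
    """Aggregate object points over the whole trajectory and fit a tight planar extent.
    A crude stand-in for a learned size-refinement branch."""
    per_frame = [to_object_frame(pts, pose)
                 for pts, pose in zip(points_per_frame, label.poses) if len(pts) > 0]
    aggregated = np.concatenate(per_frame, axis=0)
    extent = aggregated.max(axis=0) - aggregated.min(axis=0)     # (length, width) from the point spread
    return np.array([extent[0], extent[1], label.size_lwh[2]])   # keep the original height


def smooth_path(label: ObjectLabel4D, window: int = 5) -> List[np.ndarray]:
    """Moving-average smoothing of the (x, y) path; yaw is kept as-is for simplicity."""
    xy = np.array([p[:2] for p in label.poses])
    kernel = np.ones(window) / window
    pad = window // 2
    xy_pad = np.pad(xy, ((pad, pad), (0, 0)), mode="edge")
    xy_smooth = np.stack(
        [np.convolve(xy_pad[:, d], kernel, mode="valid") for d in range(2)], axis=1)
    return [np.array([x, y, p[2]]) for (x, y), p in zip(xy_smooth, label.poses)]
```

In the paper the size and path refinements are learned from aggregated observations and motion cues over the entire trajectory; the heuristics above only convey the structure of the size/path decomposition.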
Related papers
- DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos [21.93514516437402]
We present DreamScene4D, the first approach to generate 3D dynamic scenes of multiple objects from monocular videos via novel view synthesis.
Our key insight is a "decompose-recompose" approach that factorizes the video scene into the background and object tracks.
We show extensive results on challenging DAVIS, Kubric, and self-captured videos with quantitative comparisons and a user preference study.
arXiv Detail & Related papers (2024-05-03T17:55:34Z) - 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
This work introduces 4DGen, a novel framework for grounded 4D content creation.
We identify static 3D assets and monocular video sequences as key components in constructing the 4D content.
Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos).
arXiv Detail & Related papers (2023-12-28T18:53:39Z) - Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, an output-level constraint enforces overlap between 2D and projected 3D box estimates.
Third, a training-level constraint produces accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z) - Once Detected, Never Lost: Surpassing Human Performance in Offline LiDAR
based 3D Object Detection [50.959453059206446]
This paper aims for high-performance offline LiDAR-based 3D object detection.
We first observe that experienced human annotators annotate objects from a track-centric perspective.
We propose a high-performance offline detector that adopts this track-centric perspective instead of the conventional object-centric perspective.
arXiv Detail & Related papers (2023-04-24T17:59:05Z) - 4D Unsupervised Object Discovery [53.561750858325915]
We propose 4D unsupervised object discovery, jointly discovering objects from 4D data -- 3D point clouds and 2D RGB images with temporal information.
We present the first practical approach for this task by proposing a ClusterNet on 3D point clouds, which is jointly optimized with a 2D localization network.
arXiv Detail & Related papers (2022-10-10T16:05:53Z) - Receding Moving Object Segmentation in 3D LiDAR Data Using Sparse 4D
Convolutions [33.538055872850514]
We tackle the problem of distinguishing 3D LiDAR points that belong to currently moving objects, such as walking pedestrians or driving cars, from points obtained from non-moving objects, such as walls or parked cars.
Our approach takes a sequence of observed LiDAR scans and turns them into a voxelized sparse 4D point cloud (a toy voxelization sketch follows this list).
We apply computationally efficient sparse 4D convolutions to jointly extract spatial and temporal features and predict moving-object confidence scores for all points in the sequence.
arXiv Detail & Related papers (2022-06-08T18:51:14Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z) - Tracking from Patterns: Learning Corresponding Patterns in Point Clouds
for 3D Object Tracking [34.40019455462043]
We propose to learn 3D object correspondences from temporal point cloud data and infer the motion information from correspondence patterns.
Our method exceeds existing 3D tracking methods on both the KITTI and the larger-scale nuScenes datasets.
arXiv Detail & Related papers (2020-10-20T06:07:20Z)