Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data
- URL: http://arxiv.org/abs/2203.16258v1
- Date: Wed, 30 Mar 2022 12:40:30 GMT
- Title: Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data
- Authors: Corentin Sautier, Gilles Puy, Spyros Gidaris, Alexandre Boulch, Andrei
Bursuc, Renaud Marlet
- Abstract summary: We propose a self-supervised pre-training method for 3D perception models tailored to autonomous driving data.
We leverage the availability of synchronized and calibrated image and Lidar sensors in autonomous driving setups.
Our method does not require any point cloud or image annotations.
- Score: 80.14669385741202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmenting and detecting objects in sparse Lidar point clouds are two
important tasks in autonomous driving to allow a vehicle to act safely in its
3D environment. The best performing methods in 3D semantic segmentation or
object detection rely on a large amount of annotated data. Yet annotating 3D
Lidar data for these tasks is tedious and costly. In this context, we propose a
self-supervised pre-training method for 3D perception models that is tailored
to autonomous driving data. Specifically, we leverage the availability of
synchronized and calibrated image and Lidar sensors in autonomous driving
setups for distilling self-supervised pre-trained image representations into 3D
models. Hence, our method does not require any point cloud or image
annotations. The key ingredient of our method is the use of superpixels to
pool 3D point features and 2D pixel features within visually similar
regions. We then train a 3D network on the self-supervised task of matching
these pooled point features with the corresponding pooled image pixel features.
The advantages of contrasting regions obtained by superpixels are that: (1)
grouping together pixels and points of visually coherent regions leads to a
more meaningful contrastive task that produces features well adapted to 3D
semantic segmentation and 3D object detection; (2) all the different regions
have the same weight in the contrastive loss regardless of the number of 3D
points sampled in these regions; (3) it mitigates the noise produced by
incorrect matching of points and pixels due to occlusions between the different
sensors. Extensive experiments on autonomous driving datasets demonstrate the
ability of our image-to-Lidar distillation strategy to produce 3D
representations that transfer well on semantic segmentation and object
detection tasks.
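As a concrete illustration of the abstract, the superpixel-driven objective can be sketched as region pooling followed by an InfoNCE-style matching loss. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: all function names, array shapes, and the temperature value are illustrative, and superpixel assignments for points and pixels are taken as given.

```python
import numpy as np

def pool_by_superpixel(features, superpixel_ids, num_superpixels):
    """Average features within each superpixel region.

    features:       (M, D) per-point or per-pixel features
    superpixel_ids: (M,)   superpixel index of each element
    Returns (num_superpixels, D) pooled means and a boolean mask of
    non-empty regions.
    """
    pooled = np.zeros((num_superpixels, features.shape[1]))
    counts = np.zeros(num_superpixels)
    np.add.at(pooled, superpixel_ids, features)   # unbuffered segment sum
    np.add.at(counts, superpixel_ids, 1)
    nonempty = counts > 0
    pooled[nonempty] /= counts[nonempty, None]
    return pooled, nonempty

def superpixel_contrastive_loss(point_feats, point_sp, pixel_feats, pixel_sp,
                                num_superpixels, temperature=0.1):
    """InfoNCE loss matching pooled 3D and pooled 2D superpixel features.

    Each superpixel contributes exactly one positive pair, so every
    region has equal weight in the loss regardless of how many 3D
    points were sampled inside it.
    """
    p3d, ok3d = pool_by_superpixel(point_feats, point_sp, num_superpixels)
    p2d, ok2d = pool_by_superpixel(pixel_feats, pixel_sp, num_superpixels)
    valid = ok3d & ok2d                       # regions seen by both sensors
    a = p3d[valid] / np.linalg.norm(p3d[valid], axis=1, keepdims=True)
    b = p2d[valid] / np.linalg.norm(p2d[valid], axis=1, keepdims=True)
    logits = a @ b.T / temperature            # (S, S) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))        # diagonal = matching regions
```

Pooling before contrasting is what distinguishes this from per-point contrastive losses: averaging over a visually coherent region also dampens the effect of individual points that project to the wrong pixel because of inter-sensor occlusion.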
Related papers
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984]
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes.
We propose the learnable transformation alignment to bridge the domain gap between image and point cloud data.
We establish dense 2D-3D correspondences to estimate the rigid pose.
arXiv Detail & Related papers (2024-01-23T02:41:06Z) - 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection [19.75965521357068]
We propose a novel approach called SOGDet (Semantic-Occupancy Guided Multi-view 3D Object Detection) to improve the accuracy of 3D object detection.
Our results show that SOGDet consistently enhances the performance of three baseline methods in terms of nuScenes Detection Score (NDS) and mean Average Precision (mAP).
This indicates that the combination of 3D object detection and 3D semantic occupancy leads to a more comprehensive perception of the 3D environment, thereby helping to build more robust autonomous driving systems.
arXiv Detail & Related papers (2023-08-26T07:38:21Z) - 3D Data Augmentation for Driving Scenes on Camera [50.41413053812315]
We propose a 3D data augmentation approach termed Drive-3DAug, aiming at augmenting the driving scenes on camera in the 3D space.
We first utilize Neural Radiance Field (NeRF) to reconstruct the 3D models of background and foreground objects.
Then, augmented driving scenes can be obtained by placing the 3D objects with adapted location and orientation at the pre-defined valid region of backgrounds.
arXiv Detail & Related papers (2023-03-18T05:51:05Z) - PAI3D: Painting Adaptive Instance-Prior for 3D Object Detection [22.41785292720421]
Painting Adaptive Instance-prior for 3D object detection (PAI3D) is a sequential instance-level fusion framework.
It first extracts instance-level semantic information from images.
The extracted information, including each object's categorical label, point-to-object membership, and object position, is then used to augment each LiDAR point in the subsequent 3D detection network.
arXiv Detail & Related papers (2022-11-15T11:15:25Z) - RoIFusion: 3D Object Detection from LiDAR and Vision [7.878027048763662]
We propose a novel fusion algorithm by projecting a set of 3D Regions of Interest (RoIs) from the point clouds to the 2D RoIs of the corresponding images.
Our approach achieves state-of-the-art performance on the challenging KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2020-09-09T20:23:27Z) - PerMO: Perceiving More at Once from a Single Image for Autonomous
Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z) - 3D Object Detection Method Based on YOLO and K-Means for Image and Point
Clouds [1.9458156037869139]
Lidar based 3D object detection and classification tasks are essential for autonomous driving.
This paper proposes a 3D object detection method based on point cloud and image.
arXiv Detail & Related papers (2020-04-21T04:32:36Z) - Learning 2D-3D Correspondences To Solve The Blind Perspective-n-Point
Problem [98.92148855291363]
This paper proposes a deep CNN model which simultaneously solves for both the 6-DoF absolute camera pose and the 2D-3D correspondences.
Tests on both real and simulated data have shown that our method substantially outperforms existing approaches.
arXiv Detail & Related papers (2020-03-15T04:17:30Z)
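A step shared by the main paper (which assumes synchronized, calibrated image and Lidar sensors) and several of the related works above is projecting LiDAR points into the camera image to associate points with pixels. A minimal sketch, assuming a standard pinhole camera with intrinsics K and a rigid LiDAR-to-camera extrinsic transform; the function name and argument conventions are illustrative, not taken from any of the papers:

```python
import numpy as np

def project_points_to_image(points_lidar, T_cam_lidar, K, image_hw):
    """Project 3D LiDAR points into a camera image.

    points_lidar: (N, 3) points in the LiDAR frame
    T_cam_lidar:  (4, 4) rigid transform, LiDAR frame -> camera frame
    K:            (3, 3) pinhole camera intrinsic matrix
    image_hw:     (H, W) image size, used to reject out-of-view points

    Returns (uv, mask): (N, 2) pixel coordinates and a boolean mask of
    the points that land inside the image with positive depth.
    """
    n = points_lidar.shape[0]
    homo = np.hstack([points_lidar, np.ones((n, 1))])   # (N, 4) homogeneous
    cam = (T_cam_lidar @ homo.T).T[:, :3]               # (N, 3) camera frame
    in_front = cam[:, 2] > 1e-6                         # keep points ahead of camera
    uvw = (K @ cam.T).T
    uv = uvw[:, :2] / np.maximum(uvw[:, 2:3], 1e-6)     # perspective divide
    h, w = image_hw
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & \
             (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv, in_front & inside
```

The returned mask is where occlusion errors enter in practice: a point can pass this geometric test yet be hidden from the camera by a closer surface, which is exactly the noise that region-level pooling in the main paper is designed to mitigate.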
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.