MaskedFusion360: Reconstruct LiDAR Data by Querying Camera Features
- URL: http://arxiv.org/abs/2306.07087v1
- Date: Mon, 12 Jun 2023 13:01:33 GMT
- Title: MaskedFusion360: Reconstruct LiDAR Data by Querying Camera Features
- Authors: Royden Wagner, Marvin Klemp, Carlos Fernandez Lopez
- Abstract summary: In self-driving applications, LiDAR data provides accurate information about distances in 3D but lacks the semantic richness of camera data.
We introduce a novel self-supervised method to fuse LiDAR and camera data for self-driving applications.
- Score: 11.28654979274464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In self-driving applications, LiDAR data provides accurate information about
distances in 3D but lacks the semantic richness of camera data. Therefore,
state-of-the-art methods for perception in urban scenes fuse data from both
sensor types. In this work, we introduce a novel self-supervised method to fuse
LiDAR and camera data for self-driving applications. We build upon masked
autoencoders (MAEs) and train deep learning models to reconstruct masked LiDAR
data from fused LiDAR and camera features. In contrast to related methods that
use bird's-eye-view representations, we fuse features from dense spherical LiDAR
projections and features from fish-eye camera crops with a similar field of
view. Therefore, we reduce the learned spatial transformations to moderate
perspective transformations and do not require additional modules to generate
dense LiDAR representations. Code is available at:
https://github.com/KIT-MRT/masked-fusion-360
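The approach can be read as a masked-autoencoder objective in which visible LiDAR tokens query camera tokens via cross-attention before a decoder reconstructs the masked LiDAR patches. The sketch below is a minimal PyTorch illustration of that idea, not the authors' implementation (see the linked repository); the module sizes, patch size, and masking ratio are assumptions, and positional embeddings are omitted for brevity.

```python
import torch
import torch.nn as nn

class MaskedFusionSketch(nn.Module):
    """Minimal MAE-style sketch: reconstruct masked LiDAR range-image patches by
    letting visible LiDAR tokens query camera feature tokens via cross-attention.
    Sizes and masking ratio are illustrative assumptions, not the paper's values."""

    def __init__(self, patch_dim=16 * 16, cam_dim=256, d_model=256, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.lidar_embed = nn.Linear(patch_dim, d_model)   # embed range-image patches
        self.cam_proj = nn.Linear(cam_dim, d_model)        # project camera features to tokens
        self.encoder = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.decoder = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.head = nn.Linear(d_model, patch_dim)          # predict raw patch values

    def forward(self, lidar_patches, cam_feats):
        # lidar_patches: (B, N, patch_dim) flattened patches of a spherical range image
        # cam_feats:     (B, M, cam_dim)   flattened camera feature map (e.g. a fish-eye crop)
        B, N, _ = lidar_patches.shape
        num_keep = int(N * (1 - self.mask_ratio))

        # Randomly keep a subset of patches per sample, as in MAE.
        ids = torch.rand(B, N, device=lidar_patches.device).argsort(dim=1)
        keep, masked = ids[:, :num_keep], ids[:, num_keep:]
        visible = torch.gather(lidar_patches, 1,
                               keep.unsqueeze(-1).expand(-1, -1, lidar_patches.size(-1)))

        # Encode visible LiDAR tokens, then query the camera tokens via cross-attention.
        tokens = self.encoder(self.lidar_embed(visible))
        cam_tokens = self.cam_proj(cam_feats)
        fused, _ = self.cross_attn(query=tokens, key=cam_tokens, value=cam_tokens)

        # Decode fused visible tokens together with mask tokens and reconstruct patches
        # (positional embeddings and token unshuffling omitted for brevity).
        mask_tokens = self.mask_token.expand(B, N - num_keep, -1)
        decoded = self.decoder(torch.cat([fused, mask_tokens], dim=1))
        pred = self.head(decoded)

        # Reconstruction loss only on the masked patches (last N - num_keep positions).
        target = torch.gather(lidar_patches, 1,
                              masked.unsqueeze(-1).expand(-1, -1, lidar_patches.size(-1)))
        return nn.functional.mse_loss(pred[:, num_keep:], target)
```

After pre-training with such an objective, the decoder would typically be discarded and the fused encoder reused as a backbone for downstream perception tasks.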
Related papers
- LiDAR View Synthesis for Robust Vehicle Navigation Without Expert Labels [50.40632021583213]
We propose synthesizing additional LiDAR point clouds from novel viewpoints without physically driving at dangerous positions.
We train a deep learning model, which takes a LiDAR scan as input and predicts the future trajectory as output.
A waypoint controller is then applied to this predicted trajectory to determine the throttle and steering labels of the ego-vehicle.
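The label-generation step described above amounts to running a simple waypoint-following controller over the predicted trajectory. As a rough, hypothetical illustration (not the paper's controller), a pure-pursuit steering rule combined with a proportional speed rule could produce steering and throttle labels like this; all parameter values are assumptions:

```python
import numpy as np

def waypoint_labels(trajectory, wheelbase=2.7, lookahead=5.0,
                    current_speed=8.0, target_speed=10.0, k_throttle=0.1):
    """Hypothetical waypoint controller: derive steering/throttle labels from a
    predicted trajectory given in the ego frame (x forward, y left), in meters.
    Parameter values are illustrative, not taken from the paper."""
    trajectory = np.asarray(trajectory)                      # shape (N, 2)

    # Pure-pursuit steering: pick the first waypoint at least `lookahead` meters away.
    dists = np.linalg.norm(trajectory, axis=1)
    idx = np.argmax(dists >= lookahead) if np.any(dists >= lookahead) else len(trajectory) - 1
    x, y = trajectory[idx]
    ld = max(np.hypot(x, y), 1e-6)
    steering = np.arctan2(2.0 * wheelbase * y, ld ** 2)      # radians, left positive

    # Proportional throttle toward a target speed, clipped to [0, 1].
    throttle = float(np.clip(k_throttle * (target_speed - current_speed), 0.0, 1.0))
    return float(steering), throttle
```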
arXiv Detail & Related papers (2023-08-02T20:46:43Z)
- UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input [51.150605800173366]
UnLoc is a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions.
Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets.
arXiv Detail & Related papers (2023-07-03T04:10:55Z)
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder learning feature representation.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
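A BEV-guided masking strategy of this kind can be pictured as selecting bird's-eye-view grid cells and masking every point that falls into a selected cell, so that masked regions are coherent in BEV space. The snippet below is only a schematic illustration of that idea (grid size and mask ratio are assumptions), not the BEV-MAE code:

```python
import torch

def bev_guided_mask(points, cell_size=0.5, mask_ratio=0.7, pc_range=(-50.0, 50.0)):
    """Schematic BEV-guided masking: drop whole BEV cells rather than single points.
    points: (N, 3+) tensor with x, y, z in meters. Values here are illustrative."""
    # Assign every point to a BEV grid cell based on its x/y coordinates.
    grid = ((points[:, :2] - pc_range[0]) / cell_size).long()
    num_cells = int((pc_range[1] - pc_range[0]) / cell_size)
    grid = grid.clamp(0, num_cells - 1)
    cell_id = grid[:, 0] * num_cells + grid[:, 1]            # flatten (row, col) -> id

    # Randomly choose a subset of occupied cells to mask.
    occupied = torch.unique(cell_id)
    num_masked = int(len(occupied) * mask_ratio)
    masked_cells = occupied[torch.randperm(len(occupied))[:num_masked]]

    # A point is masked iff its cell was selected; return visible / masked splits.
    is_masked = torch.isin(cell_id, masked_cells)
    return points[~is_masked], points[is_masked]
```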
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
- SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye View Representation for 3D Object Detection [14.706717531900708]
LiDAR and camera are two essential sensors for 3D object detection in autonomous driving.
Recent methods focus on point-level fusion which paints the LiDAR point cloud with camera features in the perspective view.
We present SemanticBEVFusion to deeply fuse camera features with LiDAR features in a unified BEV representation.
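Point-level ("painting") fusion in the perspective view, as contrasted above, projects each LiDAR point into the image and attaches the camera feature found at that pixel. A minimal sketch of that projection step follows; the calibration matrices and feature map are placeholders, and the unified BEV fusion proposed by SemanticBEVFusion itself is more involved.

```python
import torch

def paint_points(points, cam_feats, lidar2cam, intrinsics):
    """Sketch of point-level fusion: append the camera feature at each point's
    projected pixel. points: (N, 3), cam_feats: (C, H, W),
    lidar2cam: (4, 4) extrinsics, intrinsics: (3, 3). All inputs are placeholders."""
    N = points.shape[0]
    homo = torch.cat([points, torch.ones(N, 1, dtype=points.dtype)], dim=1)  # homogeneous coords

    # Transform into the camera frame and project with the pinhole model.
    cam_pts = (lidar2cam @ homo.T).T[:, :3]                  # (N, 3)
    in_front = cam_pts[:, 2] > 0.1                           # keep points in front of the camera
    uvw = (intrinsics @ cam_pts.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                            # pixel coordinates (u, v)

    # Keep only points whose projection lands inside the image.
    C, H, W = cam_feats.shape
    valid = in_front & (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    u, v = uv[valid, 0].long(), uv[valid, 1].long()

    # "Paint" each valid point with the camera feature at its pixel.
    return torch.cat([points[valid], cam_feats[:, v, u].T], dim=1)  # (M, 3 + C)
```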
arXiv Detail & Related papers (2022-12-09T05:48:58Z)
- LAPTNet: LiDAR-Aided Perspective Transform Network [0.0]
We present an architecture that fuses LiDAR and camera information to generate semantic grids.
LAPTNet is able to associate features in the camera plane to the bird's eye view without having to predict any depth information about the scene.
arXiv Detail & Related papers (2022-11-14T18:56:02Z)
- Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation [23.666607237164186]
We propose a novel deep neural network exploiting both spatial-temporal information and different representation modalities of LiDAR scans to improve LiDAR-MOS performance.
Specifically, we first use a range image-based dual-branch structure to separately deal with spatial and temporal information.
We also use a point refinement module via 3D sparse convolution to fuse the information from both LiDAR range image and point cloud representations.
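The dual-branch idea mentioned above can be pictured as two 2D encoders over range images, one for the current scan (spatial) and one for a temporal stack of preceding scans, with their features concatenated before segmentation. The following is a generic sketch under those assumptions, not the paper's network; the sparse-convolution point refinement stage is omitted.

```python
import torch
import torch.nn as nn

class DualBranchRangeNet(nn.Module):
    """Generic sketch of a spatial/temporal dual-branch over LiDAR range images.
    Channel counts and the temporal window size are illustrative assumptions."""

    def __init__(self, temporal_frames=8, feat=32, num_classes=2):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            )
        self.spatial = branch(1)                   # current range image
        self.temporal = branch(temporal_frames)    # stack of preceding range images
        self.head = nn.Conv2d(2 * feat, num_classes, 1)  # per-pixel moving/static logits

    def forward(self, range_now, range_stack):
        # range_now: (B, 1, H, W); range_stack: (B, T, H, W)
        fused = torch.cat([self.spatial(range_now), self.temporal(range_stack)], dim=1)
        return self.head(fused)
```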
arXiv Detail & Related papers (2022-07-05T17:59:17Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework [20.842800465250775]
Current methods rely on point clouds from the LiDAR sensor as queries to leverage the feature from the image space.
We propose a surprisingly simple yet novel fusion framework, dubbed BEVFusion, whose camera stream does not depend on the input of LiDAR data.
We empirically show that our framework surpasses the state-of-the-art methods under the normal training settings.
arXiv Detail & Related papers (2022-05-27T06:58:30Z)
- MonoDistill: Learning Spatial Features for Monocular 3D Object Detection [80.74622486604886]
We propose a simple and effective scheme to introduce the spatial information from LiDAR signals to monocular 3D detectors.
The LiDAR signals are first projected into the image plane; the resulting data is then used to train a teacher 3D detector with the same architecture as the baseline model, which guides the training of that baseline.
Experimental results show that the proposed method can significantly boost the performance of the baseline model.
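Schemes of this kind are typically realized as knowledge distillation: a LiDAR-trained teacher with the same architecture supervises the camera-only student at the feature and/or prediction level. The snippet below shows a generic feature-imitation loss as an illustration of that idea; it is not the exact training objective of MonoDistill.

```python
import torch
import torch.nn.functional as F

def feature_imitation_loss(student_feat, teacher_feat, fg_mask=None):
    """Generic feature-level distillation: make the camera-only student mimic the
    LiDAR-trained teacher's feature map. fg_mask optionally restricts the loss to
    foreground regions. This is an illustrative loss, not the paper's exact one."""
    loss = F.mse_loss(student_feat, teacher_feat.detach(), reduction="none")  # (B, C, H, W)
    if fg_mask is not None:                      # fg_mask: (B, 1, H, W) in {0, 1}
        loss = loss * fg_mask
        return loss.sum() / fg_mask.sum().clamp(min=1.0) / loss.shape[1]
    return loss.mean()
```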
arXiv Detail & Related papers (2022-01-26T09:21:41Z)
- SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks [81.64530401885476]
We propose a self-supervised LiDAR odometry method, dubbed SelfVoxeLO, to tackle these two difficulties.
Specifically, we propose a 3D convolution network to process the raw LiDAR data directly, which extracts features that better encode the 3D geometric patterns.
We evaluate our method's performances on two large-scale datasets, i.e., KITTI and Apollo-SouthBay.
arXiv Detail & Related papers (2020-10-19T09:23:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.