MaskedFusion360: Reconstruct LiDAR Data by Querying Camera Features
- URL: http://arxiv.org/abs/2306.07087v1
- Date: Mon, 12 Jun 2023 13:01:33 GMT
- Title: MaskedFusion360: Reconstruct LiDAR Data by Querying Camera Features
- Authors: Royden Wagner, Marvin Klemp, Carlos Fernandez Lopez
- Abstract summary: In self-driving applications, LiDAR data provides accurate information about distances in 3D but lacks the semantic richness of camera data.
We introduce a novel self-supervised method to fuse LiDAR and camera data for self-driving applications.
- Score: 11.28654979274464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In self-driving applications, LiDAR data provides accurate information about
distances in 3D but lacks the semantic richness of camera data. Therefore,
state-of-the-art methods for perception in urban scenes fuse data from both
sensor types. In this work, we introduce a novel self-supervised method to fuse
LiDAR and camera data for self-driving applications. We build upon masked
autoencoders (MAEs) and train deep learning models to reconstruct masked LiDAR
data from fused LiDAR and camera features. In contrast to related methods that
use bird's-eye-view representations, we fuse features from dense spherical LiDAR
projections and features from fish-eye camera crops with a similar field of
view. Therefore, we reduce the learned spatial transformations to moderate
perspective transformations and do not require additional modules to generate
dense LiDAR representations. Code is available at:
https://github.com/KIT-MRT/masked-fusion-360
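The approach can be read as a masked-autoencoder objective in which visible LiDAR tokens query camera tokens via cross-attention before a decoder reconstructs the masked LiDAR patches. The sketch below is a minimal PyTorch illustration of that idea, not the authors' implementation (see the linked repository); the module sizes, patch size, and masking ratio are assumptions, and positional embeddings are omitted for brevity.

```python
import torch
import torch.nn as nn

class MaskedFusionSketch(nn.Module):
    """Minimal MAE-style sketch: reconstruct masked LiDAR range-image patches by
    letting visible LiDAR tokens query camera feature tokens via cross-attention.
    Sizes and masking ratio are illustrative assumptions, not the paper's values."""

    def __init__(self, patch_dim=16 * 16, cam_dim=256, d_model=256, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.lidar_embed = nn.Linear(patch_dim, d_model)   # embed range-image patches
        self.cam_proj = nn.Linear(cam_dim, d_model)        # project camera features to tokens
        self.encoder = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.decoder = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.head = nn.Linear(d_model, patch_dim)          # predict raw patch values

    def forward(self, lidar_patches, cam_feats):
        # lidar_patches: (B, N, patch_dim) flattened patches of a spherical range image
        # cam_feats:     (B, M, cam_dim)   flattened camera feature map (e.g. a fish-eye crop)
        B, N, _ = lidar_patches.shape
        num_keep = int(N * (1 - self.mask_ratio))

        # Randomly keep a subset of patches per sample, as in MAE.
        ids = torch.rand(B, N, device=lidar_patches.device).argsort(dim=1)
        keep, masked = ids[:, :num_keep], ids[:, num_keep:]
        visible = torch.gather(lidar_patches, 1,
                               keep.unsqueeze(-1).expand(-1, -1, lidar_patches.size(-1)))

        # Encode visible LiDAR tokens, then query the camera tokens via cross-attention.
        tokens = self.encoder(self.lidar_embed(visible))
        cam_tokens = self.cam_proj(cam_feats)
        fused, _ = self.cross_attn(query=tokens, key=cam_tokens, value=cam_tokens)

        # Decode fused visible tokens together with mask tokens and reconstruct patches
        # (positional embeddings and token unshuffling omitted for brevity).
        mask_tokens = self.mask_token.expand(B, N - num_keep, -1)
        decoded = self.decoder(torch.cat([fused, mask_tokens], dim=1))
        pred = self.head(decoded)

        # Reconstruction loss only on the masked patches (last N - num_keep positions).
        target = torch.gather(lidar_patches, 1,
                              masked.unsqueeze(-1).expand(-1, -1, lidar_patches.size(-1)))
        return nn.functional.mse_loss(pred[:, num_keep:], target)
```

After pre-training with such an objective, the decoder would typically be discarded and the fused encoder reused as a backbone for downstream perception tasks.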
Related papers
- LiDAR View Synthesis for Robust Vehicle Navigation Without Expert Labels [50.40632021583213]
We propose synthesizing additional LiDAR point clouds from novel viewpoints without physically driving at dangerous positions.
We train a deep learning model, which takes a LiDAR scan as input and predicts the future trajectory as output.
A waypoint controller is then applied to this predicted trajectory to determine the throttle and steering labels of the ego-vehicle.
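The label-generation step described above amounts to running a simple waypoint-following controller over the predicted trajectory. As a rough, hypothetical illustration (not the paper's controller), a pure-pursuit steering rule combined with a proportional speed rule could produce steering and throttle labels like this; all parameter values are assumptions:

```python
import numpy as np

def waypoint_labels(trajectory, wheelbase=2.7, lookahead=5.0,
                    current_speed=8.0, target_speed=10.0, k_throttle=0.1):
    """Hypothetical waypoint controller: derive steering/throttle labels from a
    predicted trajectory given in the ego frame (x forward, y left), in meters.
    Parameter values are illustrative, not taken from the paper."""
    trajectory = np.asarray(trajectory)                      # shape (N, 2)

    # Pure-pursuit steering: pick the first waypoint at least `lookahead` meters away.
    dists = np.linalg.norm(trajectory, axis=1)
    idx = np.argmax(dists >= lookahead) if np.any(dists >= lookahead) else len(trajectory) - 1
    x, y = trajectory[idx]
    ld = max(np.hypot(x, y), 1e-6)
    steering = np.arctan2(2.0 * wheelbase * y, ld ** 2)      # radians, left positive

    # Proportional throttle toward a target speed, clipped to [0, 1].
    throttle = float(np.clip(k_throttle * (target_speed - current_speed), 0.0, 1.0))
    return float(steering), throttle
```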
arXiv Detail & Related papers (2023-08-02T20:46:43Z)
- UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input [51.150605800173366]
UnLoc is a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions.
Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets.
arXiv Detail & Related papers (2023-07-03T04:10:55Z)
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder learning feature representation.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
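A BEV-guided masking strategy of this kind can be pictured as selecting bird's-eye-view grid cells and masking every point that falls into a selected cell, so that masked regions are coherent in BEV space. The snippet below is only a schematic illustration of that idea (grid size and mask ratio are assumptions), not the BEV-MAE code:

```python
import torch

def bev_guided_mask(points, cell_size=0.5, mask_ratio=0.7, pc_range=(-50.0, 50.0)):
    """Schematic BEV-guided masking: drop whole BEV cells rather than single points.
    points: (N, 3+) tensor with x, y, z in meters. Values here are illustrative."""
    # Assign every point to a BEV grid cell based on its x/y coordinates.
    grid = ((points[:, :2] - pc_range[0]) / cell_size).long()
    num_cells = int((pc_range[1] - pc_range[0]) / cell_size)
    grid = grid.clamp(0, num_cells - 1)
    cell_id = grid[:, 0] * num_cells + grid[:, 1]            # flatten (row, col) -> id

    # Randomly choose a subset of occupied cells to mask.
    occupied = torch.unique(cell_id)
    num_masked = int(len(occupied) * mask_ratio)
    masked_cells = occupied[torch.randperm(len(occupied))[:num_masked]]

    # A point is masked iff its cell was selected; return visible / masked splits.
    is_masked = torch.isin(cell_id, masked_cells)
    return points[~is_masked], points[is_masked]
```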
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
- SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye View Representation for 3D Object Detection [14.706717531900708]
LiDAR and camera are two essential sensors for 3D object detection in autonomous driving.
Recent methods focus on point-level fusion which paints the LiDAR point cloud with camera features in the perspective view.
We present SemanticBEVFusion to deeply fuse camera features with LiDAR features in a unified BEV representation.
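Point-level ("painting") fusion in the perspective view, as contrasted above, projects each LiDAR point into the image and attaches the camera feature found at that pixel. A minimal sketch of that projection step follows; the calibration matrices and feature map are placeholders, and the unified BEV fusion proposed by SemanticBEVFusion itself is more involved.

```python
import torch

def paint_points(points, cam_feats, lidar2cam, intrinsics):
    """Sketch of point-level fusion: append the camera feature at each point's
    projected pixel. points: (N, 3), cam_feats: (C, H, W),
    lidar2cam: (4, 4) extrinsics, intrinsics: (3, 3). All inputs are placeholders."""
    N = points.shape[0]
    homo = torch.cat([points, torch.ones(N, 1, dtype=points.dtype)], dim=1)  # homogeneous coords

    # Transform into the camera frame and project with the pinhole model.
    cam_pts = (lidar2cam @ homo.T).T[:, :3]                  # (N, 3)
    in_front = cam_pts[:, 2] > 0.1                           # keep points in front of the camera
    uvw = (intrinsics @ cam_pts.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                            # pixel coordinates (u, v)

    # Keep only points whose projection lands inside the image.
    C, H, W = cam_feats.shape
    valid = in_front & (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    u, v = uv[valid, 0].long(), uv[valid, 1].long()

    # "Paint" each valid point with the camera feature at its pixel.
    return torch.cat([points[valid], cam_feats[:, v, u].T], dim=1)  # (M, 3 + C)
```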
arXiv Detail & Related papers (2022-12-09T05:48:58Z)
- LAPTNet: LiDAR-Aided Perspective Transform Network [0.0]
We present an architecture that fuses LiDAR and camera information to generate semantic grids.
LAPTNet is able to associate features in the camera plane to the bird's eye view without having to predict any depth information about the scene.
arXiv Detail & Related papers (2022-11-14T18:56:02Z)
- Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation [23.666607237164186]
We propose a novel deep neural network exploiting both spatial-temporal information and different representation modalities of LiDAR scans to improve LiDAR-MOS performance.
Specifically, we first use a range image-based dual-branch structure to separately deal with spatial and temporal information.
We also use a point refinement module via 3D sparse convolution to fuse the information from both LiDAR range image and point cloud representations.
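The dual-branch idea mentioned above can be pictured as two 2D encoders over range images, one for the current scan (spatial) and one for a temporal stack of preceding scans, with their features concatenated before segmentation. The following is a generic sketch under those assumptions, not the paper's network; the sparse-convolution point refinement stage is omitted.

```python
import torch
import torch.nn as nn

class DualBranchRangeNet(nn.Module):
    """Generic sketch of a spatial/temporal dual-branch over LiDAR range images.
    Channel counts and the temporal window size are illustrative assumptions."""

    def __init__(self, temporal_frames=8, feat=32, num_classes=2):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            )
        self.spatial = branch(1)                   # current range image
        self.temporal = branch(temporal_frames)    # stack of preceding range images
        self.head = nn.Conv2d(2 * feat, num_classes, 1)  # per-pixel moving/static logits

    def forward(self, range_now, range_stack):
        # range_now: (B, 1, H, W); range_stack: (B, T, H, W)
        fused = torch.cat([self.spatial(range_now), self.temporal(range_stack)], dim=1)
        return self.head(fused)
```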
arXiv Detail & Related papers (2022-07-05T17:59:17Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework [20.842800465250775]
Current methods rely on point clouds from the LiDAR sensor as queries to leverage the feature from the image space.
We propose a surprisingly simple yet novel fusion framework, dubbed BEVFusion, whose camera stream does not depend on the input of LiDAR data.
We empirically show that our framework surpasses the state-of-the-art methods under the normal training settings.
arXiv Detail & Related papers (2022-05-27T06:58:30Z)
- MonoDistill: Learning Spatial Features for Monocular 3D Object Detection [80.74622486604886]
We propose a simple and effective scheme to introduce the spatial information from LiDAR signals to monocular 3D detectors.
The LiDAR signals are first projected into the image plane; the resulting data is then used to train a teacher 3D detector with the same architecture as the baseline model, which guides the training of that baseline.
Experimental results show that the proposed method can significantly boost the performance of the baseline model.
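Schemes of this kind are typically realized as knowledge distillation: a LiDAR-trained teacher with the same architecture supervises the camera-only student at the feature and/or prediction level. The snippet below shows a generic feature-imitation loss as an illustration of that idea; it is not the exact training objective of MonoDistill.

```python
import torch
import torch.nn.functional as F

def feature_imitation_loss(student_feat, teacher_feat, fg_mask=None):
    """Generic feature-level distillation: make the camera-only student mimic the
    LiDAR-trained teacher's feature map. fg_mask optionally restricts the loss to
    foreground regions. This is an illustrative loss, not the paper's exact one."""
    loss = F.mse_loss(student_feat, teacher_feat.detach(), reduction="none")  # (B, C, H, W)
    if fg_mask is not None:                      # fg_mask: (B, 1, H, W) in {0, 1}
        loss = loss * fg_mask
        return loss.sum() / fg_mask.sum().clamp(min=1.0) / loss.shape[1]
    return loss.mean()
```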
arXiv Detail & Related papers (2022-01-26T09:21:41Z)
- SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks [81.64530401885476]
We propose a self-supervised LiDAR odometry method, dubbed SelfVoxeLO, to tackle these two difficulties.
Specifically, we propose a 3D convolution network to process the raw LiDAR data directly, which extracts features that better encode the 3D geometric patterns.
We evaluate our method's performances on two large-scale datasets, i.e., KITTI and Apollo-SouthBay.
arXiv Detail & Related papers (2020-10-19T09:23:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.