BEVContrast: Self-Supervision in BEV Space for Automotive Lidar Point Clouds
- URL: http://arxiv.org/abs/2310.17281v1
- Date: Thu, 26 Oct 2023 10:02:33 GMT
- Title: BEVContrast: Self-Supervision in BEV Space for Automotive Lidar Point Clouds
- Authors: Corentin Sautier, Gilles Puy, Alexandre Boulch, Renaud Marlet, Vincent Lepetit
- Abstract summary: We present a surprisingly simple and efficient method for self-supervision of 3D backbones on automotive Lidar point clouds.
We design a contrastive loss between features of Lidar scans captured in the same scene.
Resulting cell-level representations offer a good trade-off between the point-level representations exploited in PointContrast and segment-level representations exploited in TARL.
- Score: 73.40883276013373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a surprisingly simple and efficient method for self-supervision of
3D backbones on automotive Lidar point clouds. We design a contrastive loss
between features of Lidar scans captured in the same scene. Several such
approaches have been proposed in the literature, from PointContrast, which uses
a contrast at the level of points, to the state-of-the-art TARL, which uses a
contrast at the level of segments, roughly corresponding to objects. While the
former enjoys a great simplicity of implementation, it is surpassed by the
latter, which however requires a costly pre-processing. In BEVContrast, we
define our contrast at the level of 2D cells in the Bird's Eye View plane.
Resulting cell-level representations offer a good trade-off between the
point-level representations exploited in PointContrast and segment-level
representations exploited in TARL: we retain the simplicity of PointContrast
(cell representations are cheap to compute) while surpassing the performance of
TARL in downstream semantic segmentation.
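To make the cell-level contrast concrete, below is a minimal PyTorch sketch, not the authors' released code: per-point backbone features of two pose-aligned scans are mean-pooled into 2D BEV cells, and cells occupied in both scans form positive pairs in an InfoNCE loss. The cell size, grid extent, pooling, temperature, and function names are illustrative assumptions.

# Minimal sketch of a BEV cell-level contrastive loss (illustrative, not BEVContrast's code).
import torch
import torch.nn.functional as F

def bev_cell_pool(points_xy, feats, cell_size=0.5, grid_range=50.0):
    # points_xy: (N, 2) x/y coordinates in a shared scene frame (scans already pose-aligned).
    # feats:     (N, C) per-point features from the 3D backbone.
    n_cells = int(2 * grid_range / cell_size)
    ij = ((points_xy + grid_range) / cell_size).long().clamp(0, n_cells - 1)
    cell_ids = ij[:, 0] * n_cells + ij[:, 1]               # flatten 2D cell index
    uniq, inv = torch.unique(cell_ids, return_inverse=True)
    sums = torch.zeros(len(uniq), feats.shape[1], device=feats.device)
    sums.index_add_(0, inv, feats)                         # sum features per occupied cell
    counts = torch.zeros(len(uniq), device=feats.device)
    counts.index_add_(0, inv, torch.ones(len(feats), device=feats.device))
    return uniq, sums / counts.unsqueeze(1)                # mean feature per occupied cell

def bev_contrastive_loss(ids_a, feats_a, ids_b, feats_b, temperature=0.07):
    # Cells occupied in both scans are positive pairs; other shared cells act as negatives.
    mask = torch.isin(ids_a, ids_b)
    ia = mask.nonzero(as_tuple=True)[0]
    ib = torch.searchsorted(ids_b, ids_a[ia])              # ids are sorted by torch.unique
    za = F.normalize(feats_a[ia], dim=1)
    zb = F.normalize(feats_b[ib], dim=1)
    logits = za @ zb.t() / temperature                     # (M, M) cell-to-cell similarities
    targets = torch.arange(len(ia), device=logits.device)  # diagonal entries are positives
    return F.cross_entropy(logits, targets)

# Toy usage with random data standing in for two scans of the same scene.
pts1, f1 = torch.rand(1000, 2) * 40 - 20, torch.randn(1000, 64)
pts2, f2 = torch.rand(1200, 2) * 40 - 20, torch.randn(1200, 64)
loss = bev_contrastive_loss(*bev_cell_pool(pts1, f1), *bev_cell_pool(pts2, f2))

In an actual training loop, the features would come from the 3D backbone being pretrained, and the loss would be backpropagated through it; only the BEV pooling step above is specific to the cell-level formulation.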
Related papers
- Improving Bird's Eye View Semantic Segmentation by Task Decomposition [42.57351039508863]
We decompose the original BEV segmentation task into two stages, namely BEV map reconstruction and RGB-BEV feature alignment.
Our approach simplifies the complexity of combining perception and generation into distinct steps, equipping the model to handle intricate and challenging scenes effectively.
arXiv Detail & Related papers (2024-04-02T13:19:45Z)
- PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction [72.75478398447396]
We propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively.
Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system.
We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane.
arXiv Detail & Related papers (2023-08-31T17:57:17Z)
- Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos [71.20376514273367]
We propose a unified point cloud video self-supervised learning framework for object-centric and scene-centric data.
Our method outperforms supervised counterparts on a wide range of downstream tasks.
arXiv Detail & Related papers (2023-08-18T02:17:47Z)
- Leveraging BEV Representation for 360-degree Visual Place Recognition [14.497501941931759]
This paper investigates the advantages of using a Bird's Eye View representation in 360-degree visual place recognition (VPR).
We propose a novel network architecture that utilizes the BEV representation in feature extraction, feature aggregation, and vision-LiDAR fusion.
The proposed BEV-based method is evaluated in ablation and comparative studies on two datasets.
arXiv Detail & Related papers (2023-05-23T08:29:42Z)
- A Cross-Scale Hierarchical Transformer with Correspondence-Augmented Attention for inferring Bird's-Eye-View Semantic Segmentation [13.013635162859108]
Inferring BEV semantic segmentation conditioned on multi-camera-view images is a popular scheme in the community, as it relies on cheap devices and allows real-time processing.
We propose a novel cross-scale hierarchical Transformer with correspondence-augmented attention for inferring BEV semantic segmentation.
Our method has state-of-the-art performance in inferring BEV semantic segmentation conditioned on multi-camera-view images.
arXiv Detail & Related papers (2023-04-07T13:52:47Z)
- TransUPR: A Transformer-based Uncertain Point Refiner for LiDAR Point Cloud Semantic Segmentation [6.587305905804226]
We propose a transformer-based uncertain point refiner, i.e., TransUPR, to refine selected uncertain points in a learnable manner.
Our TransUPR achieves state-of-the-art performance, i.e., 68.2% mean Intersection over Union (mIoU) on the Semantic KITTI benchmark.
arXiv Detail & Related papers (2023-02-16T21:38:36Z)
- RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving [80.14669385741202]
Vision transformers (ViTs) have achieved state-of-the-art results in many image-based benchmarks.
ViTs are notoriously hard to train and require a lot of training data to learn powerful representations.
We show that our method, called RangeViT, outperforms existing projection-based methods on nuScenes and Semantic KITTI.
arXiv Detail & Related papers (2023-01-24T18:50:48Z)
- Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation [78.6612285236938]
We propose a novel DAT (Dual Adaptive Transformations) model for weakly supervised point cloud segmentation.
We evaluate our proposed DAT model with two popular backbones on the large-scale S3DIS and ScanNet-V2 datasets.
arXiv Detail & Related papers (2022-07-19T05:43:14Z)
- Contrastive Boundary Learning for Point Cloud Segmentation [81.7289734276872]
We propose a novel contrastive boundary learning framework for point cloud segmentation.
We experimentally show that CBL consistently improves different baselines and assists them to achieve compelling performance on boundaries.
arXiv Detail & Related papers (2022-03-10T10:08:09Z)
- Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images [4.449481309681663]
We present the first end-to-end learning approach for directly predicting dense panoptic segmentation maps in the Bird's-Eye-View (BEV).
Our architecture follows the top-down paradigm and incorporates a novel dense transformer module.
We derive a mathematical formulation for the sensitivity of the FV-BEV transformation which allows us to intelligently weight pixels in the BEV space.
arXiv Detail & Related papers (2021-08-06T17:59:11Z)