Related papers: Contrastive Learning of Features between Images and LiDAR

Contrastive Learning of Features between Images and LiDAR

URL: http://arxiv.org/abs/2206.12071v1
Date: Fri, 24 Jun 2022 04:35:23 GMT
Title: Contrastive Learning of Features between Images and LiDAR
Authors: Peng Jiang, Srikanth Saripalli
Abstract summary: This work treats learning cross-modal features as a dense contrastive learning problem. To learn good features and not lose generality, we developed a variant of widely used PointNet++ architecture for images. We show that our models indeed learn information from both images as well as LiDAR by visualizing the features.
Score: 18.211513930388417
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Image and Point Clouds provide different information for robots. Finding the correspondences between data from different sensors is crucial for various tasks such as localization, mapping, and navigation. Learning-based descriptors have been developed for single sensors; there is little work on cross-modal features. This work treats learning cross-modal features as a dense contrastive learning problem. We propose a Tuple-Circle loss function for cross-modality feature learning. Furthermore, to learn good features and not lose generality, we developed a variant of widely used PointNet++ architecture for point cloud and U-Net CNN architecture for images. Moreover, we conduct experiments on a real-world dataset to show the effectiveness of our loss function and network structure. We show that our models indeed learn information from both images as well as LiDAR by visualizing the features.

Related papers

Towards Fusing Point Cloud and Visual Representations for Imitation Learning [57.886331184389604]
We propose FPV-Net, a novel imitation learning method that effectively combines the strengths of both point cloud and RGB modalities. Our method conditions the point-cloud encoder on global and local image tokens using adaptive layer norm conditioning.
arXiv Detail & Related papers (2025-02-17T20:46:54Z)
Why and How: Knowledge-Guided Learning for Cross-Spectral Image Patch Matching [7.699066648931588]
Cross-spectral image patch matching based on feature relation learning has attracted extensive attention. We make the first attempt to explore a stable and efficient bridge between descriptor learning and metric learning. We construct a knowledge-guided learning network (KGL-Net) which achieves amazing performance improvements.
arXiv Detail & Related papers (2024-12-15T11:59:23Z)
Learning Object-Centric Representation via Reverse Hierarchy Guidance [73.05170419085796]
Object-Centric Learning (OCL) seeks to enable Neural Networks to identify individual objects in visual scenes. RHGNet introduces a top-down pathway that works in different ways in the training and inference processes. Our model achieves SOTA performance on several commonly used datasets.
arXiv Detail & Related papers (2024-05-17T07:48:27Z)
HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation [106.09886920774002]
We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network. Our method achieves consistent improvements over the baseline trained from scratch and significantly out- performs the existing schemes.
arXiv Detail & Related papers (2024-03-18T14:18:08Z)
Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984]
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes. We propose the learnable transformation alignment to bridge the domain gap between image and point cloud data. We establish dense 2D-3D correspondences to estimate the rigid pose.
arXiv Detail & Related papers (2024-01-23T02:41:06Z)
Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching [58.10418136917358]
Cross-modality registration between 2D images from cameras and 3D point clouds from LiDARs is a crucial task in computer vision and robotic training. Previous methods estimate 2D-3D correspondences by matching point and pixel patterns learned by neural networks. We learn a structured cross-modality matching solver to represent 3D features via a different latent pixel space.
arXiv Detail & Related papers (2023-12-07T05:46:10Z)
Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration [17.420425069785946]
We present a novel Cross-Modal Information-Guided Network (CMIGNet) for point cloud registration. We first incorporate the projected images from the point clouds and fuse the cross-modal features using the attention mechanism. We employ two contrastive learning strategies, namely overlapping contrastive learning and cross-modal contrastive learning.
arXiv Detail & Related papers (2023-11-02T12:56:47Z)
Let Images Give You More:Point Cloud Cross-Modal Training for Shape Analysis [43.13887916301742]
This paper introduces a simple but effective point cloud cross-modality training (PointCMT) strategy to boost point cloud analysis. To effectively acquire auxiliary knowledge from view images, we develop a teacher-student framework and formulate the cross modal learning as a knowledge distillation problem. We verify significant gains on various datasets using appealing backbones, i.e., equipped with PointCMT, PointNet++ and PointMLP.
arXiv Detail & Related papers (2022-10-09T09:35:22Z)
SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations [85.38562724999898]
We propose a 2D Image and 3D Point cloud Unsupervised pre-training strategy, called SimIPU. Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module and an inter-modal feature interaction module. To the best of our knowledge, this is the first study to explore contrastive learning pre-training strategies for outdoor multi-modal datasets.
arXiv Detail & Related papers (2021-12-09T03:27:00Z)
PGGANet: Pose Guided Graph Attention Network for Person Re-identification [0.0]
Person re-identification (ReID) aims at retrieving a person from images captured by different cameras. It has been proved that using local features together with global feature of person image could help to give robust feature representations for person retrieval. We propose a pose guided graph attention network, a multi-branch architecture consisting of one branch for global feature, one branch for mid-granular body features and one branch for fine-granular key point features.
arXiv Detail & Related papers (2021-11-29T09:47:39Z)
AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures. Our experiments demonstrate that the method is able to represent the image in low dimensional space. Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
Data Augmentation for Object Detection via Differentiable Neural Rendering [71.00447761415388]
It is challenging to train a robust object detector when annotated data is scarce. Existing approaches to tackle this problem include semi-supervised learning that interpolates labeled data from unlabeled data. We introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
arXiv Detail & Related papers (2021-03-04T06:31:06Z)
Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition [38.49419948988415]
Deep networks can learn to accurately recognize objects of a category by training on a large number of images. A meta-learning challenge known as a low-shot image recognition task comes when only a few images with annotations are available for learning a recognition model for one category. Our method, called Cascaded Feature Matching Network (CFMN), is proposed to solve this problem. Experiments for few-shot learning on two standard datasets, emphminiImageNet and Omniglot, have confirmed the effectiveness of our method.
arXiv Detail & Related papers (2021-01-13T11:37:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.