AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
- URL: http://arxiv.org/abs/2402.17483v1
- Date: Tue, 27 Feb 2024 13:08:47 GMT
- Title: AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
- Authors: Tao Tang, Guangrun Wang, Yixing Lao, Peng Chen, Jie Liu, Liang Lin,
Kaicheng Yu, Xiaodan Liang
- Abstract summary: Recently, several methods have explored fusing multiple modalities within a single field, aiming to share implicit features from different modalities to enhance reconstruction performance.
In this work, we conduct comprehensive analyses of the multimodal implicit field for LiDAR-camera joint synthesis, revealing that the underlying issue lies in the misalignment of the different sensors.
We introduce AlignMiF, a geometrically aligned multimodal implicit field with two proposed modules: Geometry-Aware Alignment (GAA) and Shared Geometry Initialization (SGI).
- Score: 98.3959800235485
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Neural implicit fields have been a de facto standard in novel view synthesis.
Recently, several methods have explored fusing multiple modalities within a
single field, aiming to share implicit features from different modalities to
enhance reconstruction performance. However, these modalities often exhibit
misaligned behaviors: optimizing for one modality, such as LiDAR, can adversely
affect another, like camera performance, and vice versa. In this work, we
conduct comprehensive analyses of the multimodal implicit field for LiDAR-camera
joint synthesis, revealing that the underlying issue lies in the misalignment of
the different sensors. Furthermore, we introduce AlignMiF, a geometrically aligned
multimodal implicit field with two proposed modules: Geometry-Aware Alignment
(GAA) and Shared Geometry Initialization (SGI). These modules effectively align
the coarse geometry across different modalities, significantly enhancing the
fusion process between LiDAR and camera data. Through extensive experiments
across various datasets and scenes, we demonstrate the effectiveness of our
approach in facilitating better interaction between LiDAR and camera modalities
within a unified neural field. Specifically, our proposed AlignMiF achieves
remarkable improvement over recent implicit fusion methods (+2.01 and +3.11
image PSNR on the KITTI-360 and Waymo datasets) and consistently surpasses
single modality performance (13.8% and 14.2% reduction in LiDAR Chamfer
Distance on the respective datasets).
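The abstract describes sharing implicit features between LiDAR and camera within one neural field while aligning their coarse geometry. As a rough illustration of that general idea only, the PyTorch sketch below shows a shared geometry trunk feeding separate camera (RGB) and LiDAR (intensity, ray-drop) heads; the layer sizes, encoding dimensions, and the ray-drop output are assumptions for illustration and do not reproduce the paper's GAA or SGI modules.

```python
# Minimal sketch (assumed architecture, not the authors' released code):
# a shared geometry MLP predicts density plus a geometry feature, and
# modality-specific heads decode camera RGB and LiDAR attributes from it.
import torch
import torch.nn as nn


class SharedGeometryField(nn.Module):
    """Shared coarse-geometry backbone with modality-specific heads."""

    def __init__(self, pos_dim: int = 63, dir_dim: int = 27,
                 hidden: int = 128, geo_feat: int = 32):
        super().__init__()
        # Shared trunk: positionally encoded 3D point -> density + geometry feature.
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + geo_feat),
        )
        # Camera head: geometry feature + encoded view direction -> RGB.
        self.rgb_head = nn.Sequential(
            nn.Linear(geo_feat + dir_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )
        # LiDAR head: geometry feature -> intensity and ray-drop probability.
        self.lidar_head = nn.Sequential(
            nn.Linear(geo_feat, hidden), nn.ReLU(),
            nn.Linear(hidden, 2), nn.Sigmoid(),
        )

    def forward(self, x_enc: torch.Tensor, d_enc: torch.Tensor):
        h = self.trunk(x_enc)
        sigma, geo = torch.relu(h[..., :1]), h[..., 1:]
        rgb = self.rgb_head(torch.cat([geo, d_enc], dim=-1))
        intensity, raydrop = self.lidar_head(geo).split(1, dim=-1)
        return sigma, rgb, intensity, raydrop


# Example query on a batch of sample points with dummy encodings.
field = SharedGeometryField()
sigma, rgb, intensity, raydrop = field(torch.randn(1024, 63), torch.randn(1024, 27))
print(sigma.shape, rgb.shape, intensity.shape, raydrop.shape)
```

In this sketch the density comes from a single trunk, so both sensors supervise the same coarse geometry; the paper's contribution, per the abstract, is aligning that shared geometry across the misaligned sensors rather than naively sharing all features.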
Related papers
- LiOn-XA: Unsupervised Domain Adaptation via LiDAR-Only Cross-Modal Adversarial Training [61.26381389532653]
LiOn-XA is an unsupervised domain adaptation (UDA) approach that combines LiDAR-Only Cross-Modal (X) learning with Adversarial training for 3D LiDAR point cloud semantic segmentation.
Our experiments on 3 real-to-real adaptation scenarios demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-21T09:50:17Z)
- Explore the LiDAR-Camera Dynamic Adjustment Fusion for 3D Object Detection [38.809645060899065]
Camera and LiDAR serve as informative sensors for accurate and robust autonomous driving systems.
These sensors often exhibit heterogeneous natures, resulting in distributional modality gaps.
We introduce a dynamic adjustment technology aimed at aligning modal distributions and learning effective modality representations.
arXiv Detail & Related papers (2024-07-22T02:42:15Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment [63.83894701779067]
We propose LCPS, the first LiDAR-Camera Panoptic network.
In our approach, we conduct LiDAR-Camera fusion in three stages.
Our fusion strategy improves PQ by about 6.9% over the LiDAR-only baseline on the NuScenes dataset.
arXiv Detail & Related papers (2023-08-03T10:57:58Z)
- Towards Scalable Multi-View Reconstruction of Geometry and Materials [27.660389147094715]
We propose a novel method for joint recovery of camera pose, object geometry and spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes.
The inputs are high-resolution RGB-D images captured by a mobile, hand-held capture system with point lights for active illumination.
arXiv Detail & Related papers (2023-06-06T15:07:39Z)
- MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving [15.36416000750147]
We propose a multi-modal 3D semantic segmentation model (MSeg3D) with joint intra-modal feature extraction and inter-modal feature fusion.
MSeg3D still shows robustness and improves the LiDAR-only baseline.
arXiv Detail & Related papers (2023-03-15T13:13:03Z)
- AVIDA: Alternating method for Visualizing and Integrating Data [1.6637373649145604]
AVIDA is a framework for simultaneously performing data alignment and dimension reduction.
We show that AVIDA correctly aligns high-dimensional datasets without common features.
In general applications, other methods can be used for the alignment and dimension reduction modules.
arXiv Detail & Related papers (2022-05-31T22:36:10Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Know Your Surroundings: Panoramic Multi-Object Tracking by Multimodality Collaboration [56.01625477187448]
We propose a MultiModality PAnoramic multi-object Tracking framework (MMPAT).
It takes both 2D panorama images and 3D point clouds as input and then infers target trajectories using the multimodality data.
We evaluate the proposed method on the JRDB dataset, where the MMPAT achieves the top performance in both the detection and tracking tasks.
arXiv Detail & Related papers (2021-05-31T03:16:38Z)