Online, Target-Free LiDAR-Camera Extrinsic Calibration via Cross-Modal Mask Matching
- URL: http://arxiv.org/abs/2404.18083v2
- Date: Thu, 20 Jun 2024 03:20:44 GMT
- Title: Online, Target-Free LiDAR-Camera Extrinsic Calibration via Cross-Modal Mask Matching
- Authors: Zhiwei Huang, Yikang Zhang, Qijun Chen, Rui Fan
- Abstract summary: We introduce a novel framework known as MIAS-LCEC, provide an open-source versatile calibration toolbox, and publish three real-world datasets.
The cornerstone of our framework and toolbox is the cross-modal mask matching (C3M) algorithm, developed based on a state-of-the-art (SoTA) LVM.
- Score: 16.13886663417327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LiDAR-camera extrinsic calibration (LCEC) is crucial for data fusion in intelligent vehicles. Offline, target-based approaches have long been the preferred choice in this field. However, they often demonstrate poor adaptability to real-world environments. This is largely because extrinsic parameters may change significantly due to moderate shocks or during extended operations in environments with vibrations. In contrast, online, target-free approaches provide greater adaptability yet typically lack robustness, primarily due to the challenges in cross-modal feature matching. Therefore, in this article, we unleash the full potential of large vision models (LVMs), which are emerging as a significant trend in the fields of computer vision and robotics, especially for embodied artificial intelligence, to achieve robust and accurate online, target-free LCEC across a variety of challenging scenarios. Our main contributions are threefold: we introduce a novel framework known as MIAS-LCEC, provide an open-source versatile calibration toolbox with an interactive visualization interface, and publish three real-world datasets captured from various indoor and outdoor environments. The cornerstone of our framework and toolbox is the cross-modal mask matching (C3M) algorithm, developed based on a state-of-the-art (SoTA) LVM and capable of generating sufficient and reliable matches. Extensive experiments conducted on these real-world datasets demonstrate the robustness of our approach and its superior performance compared to SoTA methods, particularly for solid-state LiDARs with super-wide fields of view.
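To ground the problem, the sketch below shows the geometric solve that typically closes a target-free LCEC pipeline once cross-modal matching has produced 2D-3D correspondences: a RANSAC-wrapped PnP estimate of the LiDAR-to-camera transform. This is a generic illustration, not the authors' implementation; the function name, thresholds, and data layout are assumptions.

```python
import numpy as np
import cv2

def estimate_extrinsics(pts_lidar, pts_pixel, K, dist=None):
    """Recover a LiDAR-to-camera extrinsic transform from cross-modal
    correspondences (e.g., centers of matched masks) via RANSAC-PnP.

    pts_lidar : (N, 3) 3D points in the LiDAR frame
    pts_pixel : (N, 2) matched pixel coordinates in the image
    K         : (3, 3) camera intrinsic matrix
    """
    dist = np.zeros(5) if dist is None else dist
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_lidar.astype(np.float64),
        pts_pixel.astype(np.float64),
        K, dist,
        reprojectionError=3.0,  # illustrative pixel threshold
        flags=cv2.SOLVEPNP_ITERATIVE,
    )
    if not ok:
        raise RuntimeError("PnP failed: not enough reliable matches")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T, inliers
```

The contribution of C3M sits upstream of this solve: generating sufficient and reliable matches is precisely what keeps such an estimate well-conditioned in challenging scenes.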
Related papers
- Real-IAD D3: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection [53.2590751089607]
Real-IAD D3 is a high-precision multimodal dataset that incorporates an additional pseudo-3D modality generated through photometric stereo.
We introduce an effective approach that integrates RGB, point cloud, and pseudo-3D depth information to leverage the complementary strengths of each modality.
Our experiments highlight the importance of these modalities in boosting detection robustness and overall IAD performance.
arXiv Detail & Related papers (2025-04-19T08:05:47Z)
- CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond [45.996901339560566]
Infrared and visible image fusion (IVIF) is increasingly applied in critical fields such as video surveillance and autonomous driving systems.
We propose an infrared-visible fusion framework based on Multi-View Augmentation.
Our approach significantly enhances the reliability and stability of IVIF tasks in practical applications.
arXiv Detail & Related papers (2025-02-20T12:19:30Z)
- Environment-Driven Online LiDAR-Camera Extrinsic Calibration [8.096139287822997]
EdO-LCEC is the first environment-driven, online calibration approach that achieves human-like adaptability.
Inspired by the human perceptual system, EdO-LCEC incorporates a generalizable scene discriminator to actively interpret environmental conditions.
Our approach formulates the calibration process as a spatial-temporal joint optimization problem.
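The paper's exact formulation is not reproduced in this summary, but a spatial-temporal joint optimization of this kind can be pictured as one 6-DoF extrinsic refined against reprojection residuals accumulated over many frames, with per-frame weights standing in for a scene-confidence score. All names below are illustrative:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(x, frames, K):
    """Stack weighted reprojection errors over all frames so a single
    extrinsic (3 rotation-vector + 3 translation parameters in x) is
    optimized jointly across time.

    frames : list of (pts_lidar (N,3), pts_pixel (N,2), weight) tuples,
             with weights e.g. from a scene discriminator (assumption).
    """
    R = Rotation.from_rotvec(x[:3]).as_matrix()
    t = x[3:]
    res = []
    for pts_lidar, pts_pixel, w in frames:
        pc = pts_lidar @ R.T + t          # LiDAR -> camera frame
        uv = pc @ K.T                     # pinhole projection
        uv = uv[:, :2] / uv[:, 2:3]       # assumes positive depth
        res.append(w * (uv - pts_pixel).ravel())
    return np.concatenate(res)

# x0 from a single-frame solve, then:
# sol = least_squares(reprojection_residuals, x0, args=(frames, K))
```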
arXiv Detail & Related papers (2025-02-02T13:52:35Z)
- LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving [52.83707400688378]
LargeAD is a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets.
Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples.
Our approach delivers significant performance improvements over state-of-the-art methods in both linear probing and fine-tuning tasks for both LiDAR-based segmentation and object detection.
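As a rough illustration of the contrastive objective that superpixel-to-point alignment implies (a generic sketch under assumed names, not the paper's code), an InfoNCE loss over paired 2D/3D embeddings might look like this:

```python
import torch
import torch.nn.functional as F

def superpixel_point_infonce(img_feats, pt_feats, temperature=0.07):
    """Symmetric InfoNCE over paired superpixel / point-cluster embeddings.

    img_feats : (B, D) pooled features of B superpixels (2D branch)
    pt_feats  : (B, D) pooled features of the LiDAR points projecting
                into each superpixel (3D branch)
    Row i of each tensor forms a positive pair; other rows are negatives.
    """
    img = F.normalize(img_feats, dim=1)
    pts = F.normalize(pt_feats, dim=1)
    logits = img @ pts.t() / temperature  # (B, B) similarity matrix
    labels = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))
```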
arXiv Detail & Related papers (2025-01-07T18:59:59Z)
- MObI: Multimodal Object Inpainting Using Diffusion Models [52.07640413626605]
This paper introduces MObI, a novel framework for Multimodal Object Inpainting.
Using a single reference RGB image, MObI enables objects to be seamlessly inserted into existing multimodal scenes.
Unlike traditional inpainting methods that rely solely on edit masks, our 3D bounding box conditioning gives objects accurate spatial positioning and realistic scaling.
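The conditioning can be pictured with a small projection sketch: the eight corners of a 3D box, projected into the image, define the spatially accurate region that guides insertion. This is a generic routine, not MObI's actual interface:

```python
import numpy as np

def project_box_corners(center, size, yaw, T_cam, K):
    """Project the 8 corners of a 3D box into the image plane.

    center : (3,) box center, size : (l, w, h), yaw : heading angle,
    all in a world/LiDAR frame; T_cam is the 4x4 world-to-camera
    transform and K the 3x3 intrinsics (layout is an assumption).
    """
    l, w, h = size
    corners = np.array([[dx * l / 2, dy * w / 2, dz * h / 2]
                        for dx in (-1, 1) for dy in (-1, 1) for dz in (-1, 1)])
    Rz = np.array([[np.cos(yaw), -np.sin(yaw), 0],
                   [np.sin(yaw),  np.cos(yaw), 0],
                   [0,            0,           1]])
    pts = corners @ Rz.T + center                # box -> world frame
    pc = pts @ T_cam[:3, :3].T + T_cam[:3, 3]    # world -> camera
    uv = pc @ K.T
    return uv[:, :2] / uv[:, 2:3]                # (8, 2) pixels
```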
arXiv Detail & Related papers (2025-01-06T17:43:26Z)
- Towards 3D Semantic Scene Completion for Autonomous Driving: A Meta-Learning Framework Empowered by Deformable Large-Kernel Attention and Mamba Model [1.6835437621159244]
We introduce MetaSSC, a novel meta-learning-based framework for semantic scene completion (SSC).
Our approach begins with a voxel-based semantic segmentation (SS) pretraining task, aimed at exploring the semantics and geometry of incomplete regions.
Using simulated cooperative perception datasets, we supervise the perception training of a single vehicle using aggregated sensor data.
This meta-knowledge is then adapted to the target domain through a dual-phase training strategy, enabling efficient deployment.
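The dual-phase strategy itself is not detailed in this summary; as a generic stand-in, a Reptile-style meta-update shows how meta-knowledge gathered on source tasks can be nudged toward a target domain in a few gradient steps (names and hyperparameters are assumptions):

```python
import copy
import torch

def reptile_step(model, loss_fn, task_loader, inner_lr=1e-3,
                 meta_lr=0.1, steps=5):
    """One Reptile meta-update: adapt a clone on a sampled task,
    then move the meta-weights toward the adapted weights."""
    fast = copy.deepcopy(model)
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for (x, y), _ in zip(task_loader, range(steps)):
        opt.zero_grad()
        loss_fn(fast(x), y).backward()
        opt.step()
    with torch.no_grad():
        for p, q in zip(model.parameters(), fast.parameters()):
            p.add_(meta_lr * (q - p))  # interpolate toward the clone
```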
arXiv Detail & Related papers (2024-11-06T05:11:25Z)
- Scalable Offline Reinforcement Learning for Mean Field Games [6.8267158622784745]
Off-MMD is a novel mean-field RL algorithm that approximates equilibrium policies in mean-field games using purely offline data.
Our algorithm scales to complex environments and demonstrates strong performance on benchmark tasks like crowd exploration or navigation.
arXiv Detail & Related papers (2024-10-23T14:16:34Z)
- RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z)
- Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation [22.653014803666668]
We propose a Faster LiDAR 3D object detection framework, called FASD, which implements heterogeneous model distillation by adaptively unifying cross-model voxel features.
We aim to distill the transformer's capacity for high-performance sequence modeling into Mamba models with low FLOPs, achieving a significant improvement in accuracy through knowledge transfer.
We evaluated the framework on benchmark datasets, including nuScenes, achieving a 4x reduction in resource consumption and a 1-2% performance improvement over the current SoTA methods.
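The summary does not specify FASD's distillation losses; cross-model feature distillation of this flavor is commonly a combination of an L2 term and a temperature-softened KL term on spatially aligned voxel features, sketched generically below:

```python
import torch
import torch.nn.functional as F

def voxel_feature_distillation(student_feats, teacher_feats, tau=2.0):
    """Distill teacher (e.g., transformer) voxel features into a
    student (e.g., Mamba) on matched voxels.

    student_feats, teacher_feats : (N_voxels, C) aligned features
    """
    l2 = F.mse_loss(student_feats, teacher_feats)
    kl = F.kl_div(
        F.log_softmax(student_feats / tau, dim=1),
        F.softmax(teacher_feats / tau, dim=1),
        reduction="batchmean",
    ) * tau * tau  # rescale gradients softened by the temperature
    return l2 + kl
```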
arXiv Detail & Related papers (2024-09-17T09:30:43Z)
- Explore the LiDAR-Camera Dynamic Adjustment Fusion for 3D Object Detection [38.809645060899065]
Camera and LiDAR serve as informative sensors for accurate and robust autonomous driving systems.
These sensors often exhibit heterogeneous natures, resulting in distributional modality gaps.
We introduce a dynamic adjustment technology aimed at aligning modal distributions and learning effective modality representations.
arXiv Detail & Related papers (2024-07-22T02:42:15Z)
- SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation [55.87169702896249]
Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain to perform well on an unlabeled target domain with some data distribution shift.
We propose a framework to evaluate DA methods and present a fair evaluation of existing shallow algorithms, including reweighting, mapping, and subspace alignment.
Our benchmark highlights the importance of realistic validation and provides practical guidance for real-life applications.
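Of the families evaluated, mapping methods are the simplest to illustrate: a CORAL-style alignment matches the second-order statistics of source and target features in a few lines (a minimal sketch, not the benchmark's implementation):

```python
import numpy as np
from scipy.linalg import sqrtm

def coral_align(Xs, Xt, eps=1e-5):
    """Whiten source features, then re-color them with the target
    covariance so second-order statistics match across domains.

    Xs : (n_s, d) labeled source features
    Xt : (n_t, d) unlabeled target features
    """
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(d)
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(d)
    Xs_white = (Xs - Xs.mean(0)) @ np.linalg.inv(sqrtm(Cs).real)
    return Xs_white @ sqrtm(Ct).real + Xt.mean(0)  # shift to target mean
```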
arXiv Detail & Related papers (2024-07-16T12:52:29Z)
- Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving [58.16024314532443]
We introduce LaserMix++, a framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to assist data-efficient learning.
Results demonstrate that LaserMix++ outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations.
This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
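The core manipulation behind LaserMix-style mixing can be sketched as partitioning two scans into laser inclination bands and interleaving alternating bands; the band count and the omission of label handling below are simplifying assumptions:

```python
import numpy as np

def lasermix(points_a, points_b, n_bands=6):
    """Mix two LiDAR scans by swapping alternating inclination bands.

    points_a, points_b : (N, 3+) arrays with x, y, z in columns 0-2
    """
    def band_ids(pts):
        incl = np.arctan2(pts[:, 2], np.linalg.norm(pts[:, :2], axis=1))
        edges = np.linspace(incl.min(), incl.max() + 1e-6, n_bands + 1)
        return np.digitize(incl, edges) - 1  # band index per point
    ba, bb = band_ids(points_a), band_ids(points_b)
    return np.concatenate([points_a[ba % 2 == 0],   # even bands from A
                           points_b[bb % 2 == 1]])  # odd bands from B
```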
arXiv Detail & Related papers (2024-05-08T17:59:53Z)
- LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment [59.320414108383055]
We present LiveHPS, a novel single-LiDAR-based approach for scene-level human pose and shape estimation.
We propose a huge human motion dataset, named FreeMotion, which is collected in various scenarios with diverse human poses.
arXiv Detail & Related papers (2024-02-27T03:08:44Z)
- Gait Recognition in Large-scale Free Environment via Single LiDAR [35.684257181154905]
LiDAR's ability to capture depth makes it pivotal for robotic perception and holds promise for real-world gait recognition.
We present the Hierarchical Multi-representation Feature Interaction Network (HMRNet) for robust gait recognition.
To facilitate LiDAR-based gait recognition research, we introduce FreeGait, a comprehensive gait dataset from large-scale, unconstrained settings.
arXiv Detail & Related papers (2022-11-22T16:05:58Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models. We benchmark the robustness of state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
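The scale retrieval itself reduces to a simple alignment once a scale-aware prediction exists: a median ratio between the network's metric depth and the VO system's up-to-scale depth recovers the missing global scale (a minimal sketch, not VRVO's implementation):

```python
import numpy as np

def recover_scale(vo_depth, net_depth):
    """Global metric scale for an up-to-scale VO depth map, taken as
    the median ratio against a scale-aware network prediction.

    vo_depth, net_depth : (H, W) depth maps; only pixels valid in
    both contribute.
    """
    valid = (vo_depth > 0) & (net_depth > 0)
    return np.median(net_depth[valid] / vo_depth[valid])

# Apply by multiplying the VO translation estimates by this scale.
```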
arXiv Detail & Related papers (2022-03-11T01:51:54Z)