Multi-LVI-SAM: A Robust LiDAR-Visual-Inertial Odometry for Multiple Fisheye Cameras
- URL: http://arxiv.org/abs/2509.05740v1
- Date: Sat, 06 Sep 2025 15:06:55 GMT
- Title: Multi-LVI-SAM: A Robust LiDAR-Visual-Inertial Odometry for Multiple Fisheye Cameras
- Authors: Xinyu Zhang, Kai Huang, Junqiao Zhao, Zihan Yuan, Tiantian Feng
- Abstract summary: We propose a multi-camera LiDAR-visual-inertial odometry framework, Multi-LVI-SAM, which fuses data from multiple fisheye cameras, LiDAR and inertial sensors for highly accurate and robust state estimation. To enable efficient and consistent integration of visual information from multiple fisheye cameras, we introduce a panoramic visual feature model that unifies multi-camera observations into a single representation.
- Score: 24.95534498798919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a multi-camera LiDAR-visual-inertial odometry framework, Multi-LVI-SAM, which fuses data from multiple fisheye cameras, LiDAR and inertial sensors for highly accurate and robust state estimation. To enable efficient and consistent integration of visual information from multiple fisheye cameras, we introduce a panoramic visual feature model that unifies multi-camera observations into a single representation. The panoramic model serves as a global geometric optimization framework that consolidates multi-view constraints, enabling seamless loop closure and global pose optimization, while simplifying system design by avoiding redundant handling of individual cameras. To address the triangulation inconsistency caused by the misalignment between each camera's frame and the panoramic model's frame, we propose an extrinsic compensation method. This method improves feature consistency across views and significantly reduces triangulation and optimization errors, leading to more accurate pose estimation. We integrate the panoramic visual feature model into a tightly coupled LiDAR-visual-inertial system based on a factor graph. Extensive experiments on public datasets demonstrate that the panoramic visual feature model enhances the quality and consistency of multi-camera constraints, resulting in higher accuracy and robustness than existing multi-camera LiDAR-visual-inertial systems.
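To make the panoramic feature model and the extrinsic compensation concrete, here is a minimal Python sketch, assuming each fisheye camera has a known extrinsic (R, t) from its own frame to the panoramic body frame; the class and function names are illustrative, not the paper's implementation. Bearing rays are rotated into the shared panoramic frame, and triangulation anchors each ray at its true camera center rather than at the panoramic origin, which is exactly the inconsistency the compensation removes.

```python
import numpy as np

# Minimal sketch of a panoramic visual feature model with extrinsic
# compensation. Assumes each fisheye camera i has a known extrinsic
# (R_i, t_i) mapping camera frame -> panoramic body frame. Illustrative
# only; not the paper's actual data structures.

class PanoramicFeatureModel:
    def __init__(self, extrinsics):
        self.extrinsics = extrinsics  # list of (R: 3x3, t: 3-vector) per camera

    def to_panoramic_bearing(self, cam_idx, bearing_cam):
        """Rotate a unit bearing ray from a camera frame into the panoramic frame."""
        R, _ = self.extrinsics[cam_idx]
        b = R @ bearing_cam
        return b / np.linalg.norm(b)

    def ray_origin(self, cam_idx):
        """Extrinsic compensation: the ray starts at the true camera center
        (offset t_i), not at the panoramic model's origin."""
        _, t = self.extrinsics[cam_idx]
        return t

def triangulate(model, observations, poses):
    """Least-squares midpoint triangulation of one landmark from multi-camera rays.
    observations: list of (cam_idx, unit bearing in camera frame);
    poses: matching list of body-to-world poses (R_wb, t_wb)."""
    A, b = np.zeros((3, 3)), np.zeros(3)
    for (cam_idx, bearing), (R_wb, t_wb) in zip(observations, poses):
        d = R_wb @ model.to_panoramic_bearing(cam_idx, bearing)  # ray direction, world frame
        o = R_wb @ model.ray_origin(cam_idx) + t_wb              # compensated origin, world frame
        P = np.eye(3) - np.outer(d, d)  # projects onto the plane orthogonal to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)  # world point closest to all rays
```

Dropping `ray_origin` (treating every ray as starting at the panoramic origin) reproduces the triangulation inconsistency described above; keeping it restores per-camera geometry while the optimization still sees a single panoramic feature model.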
Related papers
- ViewMorpher3D: A 3D-aware Diffusion Framework for Multi-Camera Novel View Synthesis in Autonomous Driving [20.935790354765604]
We introduce ViewMorpher3D, a multi-view image enhancement framework based on image diffusion models.
Unlike single-view approaches, ViewMorpher3D jointly processes a set of rendered views conditioned on camera poses, 3D geometric priors, and temporally adjacent or spatially overlapping reference views.
Our framework accommodates variable numbers of cameras and flexible reference/target view configurations, making it adaptable to diverse sensor setups.
arXiv Detail & Related papers (2026-01-12T13:44:14Z) - Visual Odometry with Transformers [68.453547770334]
We introduce the Visual Odometry Transformer (VoT), which processes sequences of monocular frames by extracting features.
Unlike prior methods, VoT directly predicts camera motion without estimating dense geometry and relies solely on camera poses for supervision.
VoT scales effectively with larger datasets, benefits substantially from stronger pre-trained backbones, generalizes across diverse camera motions and calibration settings, and outperforms traditional methods while running more than 3 times faster.
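As a rough, assumption-laden illustration of that direct-regression idea (not the paper's architecture), the sketch below runs pooled per-frame backbone features through a temporal transformer and regresses relative motion between consecutive frames, with no dense geometry anywhere in the loop.

```python
import torch
import torch.nn as nn

# Toy VoT-style model: per-frame features (e.g. pooled ViT tokens from a
# pre-trained backbone) -> temporal transformer -> relative camera motion.
# Dimensions and the motion head are assumptions for illustration.

class VisualOdometryTransformer(nn.Module):
    def __init__(self, feat_dim=768, num_layers=6, num_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)
        # 3 values for translation + 6 for a continuous rotation parameterization
        self.motion_head = nn.Linear(feat_dim, 9)

    def forward(self, frame_features):
        # frame_features: (batch, time, feat_dim)
        h = self.temporal(frame_features)
        # fuse each consecutive pair of frames and predict their relative motion
        return self.motion_head(h[:, 1:] + h[:, :-1])  # (batch, time-1, 9)
```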
arXiv Detail & Related papers (2025-10-02T17:00:14Z) - MapAnything: Universal Feed-Forward Metric 3D Reconstruction [63.79151976126576]
MapAnything ingests one or more images along with optional geometric inputs such as camera intrinsics, poses, depth, or partial reconstructions.
It then directly regresses the metric 3D scene geometry and cameras.
MapAnything addresses a broad range of 3D vision tasks in a single feed-forward pass.
arXiv Detail & Related papers (2025-09-16T18:00:14Z) - CVD-SfM: A Cross-View Deep Front-end Structure-from-Motion System for Sparse Localization in Multi-Altitude Scenes [0.7623023317942882]
We present a novel multi-altitude camera pose estimation system, addressing the challenges of robust and accurate localization across varied altitudes.
The system effectively handles diverse environmental conditions and viewpoint variations by integrating a cross-view transformer, deep features, and structure-from-motion.
arXiv Detail & Related papers (2025-08-03T22:11:48Z) - FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction [69.63414788486578]
FreeSplatter is a scalable feed-forward framework that generates high-quality 3D Gaussians from uncalibrated sparse-view images.
Our approach employs a streamlined transformer architecture where self-attention blocks facilitate information exchange.
We develop two specialized variants, for object-centric and scene-level reconstruction, trained on comprehensive datasets.
arXiv Detail & Related papers (2024-12-12T18:52:53Z) - Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration [34.18403601269181]
DM-Calib is a diffusion-based approach for estimating pinhole camera intrinsic parameters from a single input image.
We introduce a new image-based representation, termed Camera Image, which losslessly encodes the numerical camera intrinsics.
By fine-tuning a stable diffusion model to generate a Camera Image from a single RGB input, we can extract camera intrinsics via a RANSAC operation.
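One plausible reading of that RANSAC step, sketched under explicit assumptions: if the generated Camera Image decodes to a per-pixel incidence ray, every pixel yields a one-point focal-length hypothesis, and RANSAC keeps the hypothesis with the largest reprojection consensus. The function below is illustrative, not the paper's procedure.

```python
import numpy as np

# Hypothetical recovery of a focal length from a dense per-pixel ray map
# via one-point RANSAC. `u` holds pixel x-coordinates (N,), `rays` the
# decoded unit ray directions (N, 3), `cx` an assumed principal point.

def ransac_focal(u, rays, cx, iters=200, tol=2.0, seed=0):
    rng = np.random.default_rng(seed)
    valid = (np.abs(rays[:, 0]) > 1e-3) & (rays[:, 2] > 1e-3)
    u_v, r_v = u[valid], rays[valid]
    candidates = (u_v - cx) * r_v[:, 2] / r_v[:, 0]  # focal implied by one pixel
    best_f, best_inliers = None, -1
    for _ in range(iters):
        f = candidates[rng.integers(len(candidates))]      # 1-point hypothesis
        pred_u = cx + f * r_v[:, 0] / r_v[:, 2]            # reproject all pixels
        inliers = int(np.sum(np.abs(pred_u - u_v) < tol))  # consensus set size
        if inliers > best_inliers:
            best_f, best_inliers = f, inliers
    return best_f
```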
arXiv Detail & Related papers (2024-11-26T09:04:37Z) - Scaling Multi-Camera 3D Object Detection through Weak-to-Strong Eliciting [32.66151412557986]
We present a weak-to-strong eliciting framework aimed at enhancing surround refinement while maintaining robust monocular perception.
Our framework employs weakly tuned experts trained on distinct subsets, and each is inherently biased toward specific camera configurations and scenarios.
For MC3D-Det joint training, an elaborate dataset-merging strategy is designed to resolve inconsistent camera numbers and camera parameters.
arXiv Detail & Related papers (2024-04-10T03:11:10Z) - Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along the spatial and channel dimensions, complementary pixel relations and channel interdependencies aid the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z) - Towards Scalable Multi-View Reconstruction of Geometry and Materials [27.660389147094715]
We propose a novel method for joint recovery of camera pose, object geometry and spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes.
The inputs are high-resolution RGBD images captured by a mobile, hand-held capture system with point lights for active illumination.
arXiv Detail & Related papers (2023-06-06T15:07:39Z) - Gait Recognition in Large-scale Free Environment via Single LiDAR [35.684257181154905]
LiDAR's ability to capture depth makes it pivotal for robotic perception and holds promise for real-world gait recognition.
We present the Hierarchical Multi-representation Feature Interaction Network (HMRNet) for robust gait recognition.
To facilitate LiDAR-based gait recognition research, we introduce FreeGait, a comprehensive gait dataset from large-scale, unconstrained settings.
arXiv Detail & Related papers (2022-11-22T16:05:58Z) - A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z) - SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
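A toy version of that cross-view fusion, with shapes and modules assumed rather than taken from the paper: per-camera feature maps are flattened into a single token sequence so attention can exchange information across all surrounding views before per-view depth decoding.

```python
import torch
import torch.nn as nn

# Illustrative cross-view fusion: tokens from every camera attend to tokens
# from every other camera. Channel size and head count are assumptions.

class CrossViewFusion(nn.Module):
    def __init__(self, channels=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feats):
        # feats: (batch, views, C, H, W) from a shared per-view encoder
        b, v, c, h, w = feats.shape
        tokens = feats.permute(0, 1, 3, 4, 2).reshape(b, v * h * w, c)
        fused, _ = self.attn(tokens, tokens, tokens)  # all views attend to all views
        tokens = self.norm(tokens + fused)            # residual + norm
        return tokens.reshape(b, v, h, w, c).permute(0, 1, 4, 2, 3)
```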
arXiv Detail & Related papers (2022-04-07T17:58:47Z) - Redesigning SLAM for Arbitrary Multi-Camera Systems [51.81798192085111]
Adding more cameras to SLAM systems improves robustness and accuracy but complicates the design of the visual front-end significantly.
In this work, we aim at an adaptive SLAM system that works for arbitrary multi-camera setups.
We adapt a state-of-the-art visual-inertial odometry pipeline with these modifications, and experimental results show that the modified pipeline can adapt to a wide range of camera setups.
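The spirit of such an adaptive front-end can be sketched as a camera abstraction: the pipeline consumes only bearing rays and extrinsics, so pinhole, fisheye, or mixed rigs plug in uniformly. The interface below is a hypothetical illustration, not the paper's code.

```python
import numpy as np
from abc import ABC, abstractmethod

# Hypothetical camera abstraction for an arbitrary multi-camera SLAM
# front-end: any model that can turn a pixel into a bearing ray fits.

class Camera(ABC):
    def __init__(self, T_body_cam):
        self.T_body_cam = T_body_cam  # 4x4 transform, camera frame -> body frame

    @abstractmethod
    def unproject(self, uv):
        """Map a pixel (u, v) to a unit bearing ray in the camera frame."""

class PinholeCamera(Camera):
    def __init__(self, fx, fy, cx, cy, T_body_cam):
        super().__init__(T_body_cam)
        self.K_inv = np.linalg.inv(
            np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1.0]]))

    def unproject(self, uv):
        ray = self.K_inv @ np.array([uv[0], uv[1], 1.0])
        return ray / np.linalg.norm(ray)

def bearings_in_body(cameras, detections):
    """Camera-agnostic front-end output: bearing rays in the body frame.
    detections: list of (camera index, pixel) pairs."""
    return np.array([
        cameras[i].T_body_cam[:3, :3] @ cameras[i].unproject(uv)
        for i, uv in detections
    ])
```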
arXiv Detail & Related papers (2020-03-04T11:44:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.