Related papers: MRASfM: Multi-Camera Reconstruction and Aggregation through Structure-from-Motion in Driving Scenes

URL: http://arxiv.org/abs/2510.15467v1
Date: Fri, 17 Oct 2025 09:20:59 GMT
Title: MRASfM: Multi-Camera Reconstruction and Aggregation through Structure-from-Motion in Driving Scenes
Authors: Lingfeng Xuan, Chang Nie, Yiqing Xu, Zhe Liu, Yanzi Miao, Hesheng Wang
Abstract summary: We propose a Multi-camera Reconstruction and Aggregation Structure-from-Motion (MRASfM) framework specifically designed for driving scenes. MRASfM enhances the reliability of camera pose estimation by leveraging the fixed spatial relationships within the multi-camera system during the registration process. Treating the multi-camera set as a single unit in Bundle Adjustment (BA) helps reduce optimization variables to boost efficiency.
Score: 20.625799448587703
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Structure from Motion (SfM) estimates camera poses and reconstructs point clouds, forming a foundation for various tasks. However, applying SfM to driving scenes captured by multi-camera systems presents significant difficulties, including unreliable pose estimation, excessive outliers in road surface reconstruction, and low reconstruction efficiency. To address these limitations, we propose a Multi-camera Reconstruction and Aggregation Structure-from-Motion (MRASfM) framework specifically designed for driving scenes. MRASfM enhances the reliability of camera pose estimation by leveraging the fixed spatial relationships within the multi-camera system during the registration process. To improve the quality of road surface reconstruction, our framework employs a plane model to effectively remove erroneous points from the triangulated road surface. Moreover, treating the multi-camera set as a single unit in Bundle Adjustment (BA) helps reduce optimization variables to boost efficiency. In addition, MRASfM achieves multi-scene aggregation through scene association and assembly modules in a coarse-to-fine fashion. We deployed multi-camera systems on actual vehicles to validate the generalizability of MRASfM across various scenes and its robustness in challenging conditions through real-world applications. Furthermore, large-scale validation results on public datasets show the state-of-the-art performance of MRASfM, achieving 0.124 absolute pose error on the nuScenes dataset.
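The abstract says a plane model is used to remove erroneous points from the triangulated road surface, but it does not give the details. As a rough illustration only, the sketch below swaps in a generic RANSAC plane fit with a distance threshold; the function name `fit_plane_ransac`, the threshold, the iteration count, and the synthetic data are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def fit_plane_ransac(points, threshold=0.05, iterations=200, seed=0):
    """Fit a plane n.x + d = 0 to an (N, 3) array with a basic RANSAC loop.

    Hypothetical helper, not MRASfM's method. Returns (normal, d, inlier_mask),
    where inlier_mask marks points closer than `threshold` to the plane.
    """
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(iterations):
        # Three distinct points define a candidate plane.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        length = np.linalg.norm(normal)
        if length < 1e-9:
            continue  # skip (near-)collinear samples
        normal = normal / length
        d = -normal @ sample[0]
        # Point-to-plane distance for every point.
        mask = np.abs(points @ normal + d) < threshold
        if mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (normal, d)
    if best_model is None:
        raise ValueError("could not fit a plane: all samples were degenerate")
    return best_model[0], best_model[1], best_mask


# Synthetic road patch: a noisy z = 0 plane plus points floating above it.
rng = np.random.default_rng(1)
road = np.column_stack([
    rng.uniform(-10, 10, 200),
    rng.uniform(-10, 10, 200),
    rng.normal(0.0, 0.01, 200),
])
outliers = np.column_stack([
    rng.uniform(-10, 10, 20),
    rng.uniform(-10, 10, 20),
    rng.uniform(0.5, 2.0, 20),
])
points = np.vstack([road, outliers])

normal, d, keep = fit_plane_ransac(points)
filtered_road = points[keep]  # points retained as road surface
```

In this toy setup the points far above the fitted plane are dropped and the near-planar ones are kept. A real pipeline would also need to decide which triangulated points belong to the road before fitting, which this sketch does not address.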

Related papers

XYZCylinder: Feedforward Reconstruction for Driving Scenes Based on A Unified Cylinder Lifting Method [27.213339282749885]
We propose XYZCylinder, a feedforward model based on a unified cylinder lifting method. Specifically, we design a Unified Cylinder Camera Modeling (UCCM) strategy, which avoids the learning of viewpoint-dependent spatial correspondence. To improve the reconstruction accuracy, we propose a hybrid representation with several dedicated modules based on a newly designed Cylinder Plane Feature Group.
arXiv Detail & Related papers (2025-10-09T06:58:03Z)
CVD-SfM: A Cross-View Deep Front-end Structure-from-Motion System for Sparse Localization in Multi-Altitude Scenes [0.7623023317942882]
We present a novel multi-altitude camera pose estimation system, addressing the challenges of robust and accurate localization across varied altitudes. The system effectively handles diverse environmental conditions and viewpoint variations by integrating the cross-view transformer, deep features, and structure-from-motion.
arXiv Detail & Related papers (2025-08-03T22:11:48Z)
MGSfM: Multi-Camera Geometry Driven Global Structure-from-Motion [13.24058110580706]
We propose a novel global motion averaging framework for multi-camera systems. Our system matches or exceeds incremental SfM accuracy while significantly improving efficiency.
arXiv Detail & Related papers (2025-07-04T05:25:00Z)
PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments [73.80718037070773]
We present the multi-modal Pedestrian-Focused Scene dataset, rigorously annotated in semi-structured scenes in the nuScenes format. We also propose a novel Hybrid Multi-Scale Fusion Network (HMFN) to detect pedestrians in densely populated and occluded scenarios.
arXiv Detail & Related papers (2025-02-21T09:57:53Z)
MC-BEVRO: Multi-Camera Bird Eye View Road Occupancy Detection for Traffic Monitoring [23.396192711865147]
Single-camera 3D perception for traffic monitoring faces significant challenges due to occlusion and limited field of view. This paper introduces a novel Bird's-Eye-View road occupancy detection framework that leverages multiple roadside cameras.
arXiv Detail & Related papers (2025-02-16T22:03:03Z)
Reconstructing People, Places, and Cameras [57.81696692335401]
"Humans and Structure from Motion" (HSfM) is a method for jointly reconstructing multiple human meshes, scene point clouds, and camera parameters in a metric world coordinate system. Our results show that incorporating human data into the SfM pipeline improves camera pose estimation.
arXiv Detail & Related papers (2024-12-23T18:58:34Z)
FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction [69.63414788486578]
FreeSplatter is a scalable feed-forward framework that generates high-quality 3D Gaussians from uncalibrated sparse-view images. Our approach employs a streamlined transformer architecture where self-attention blocks facilitate information exchange. We develop two specialized variants, for object-centric and scene-level reconstruction, trained on comprehensive datasets.
arXiv Detail & Related papers (2024-12-12T18:52:53Z)
RoMo: Robust Motion Segmentation Improves Structure from Motion [46.77236343300953]
We propose a novel approach to video-based motion segmentation to identify the components of a scene that are moving w.r.t. a fixed world frame. Our simple but effective iterative method, RoMo, combines optical flow and epipolar cues with a pre-trained video segmentation model. More importantly, the combination of an off-the-shelf SfM pipeline with our segmentation masks establishes a new state-of-the-art on camera calibration for scenes with dynamic content, outperforming existing methods by a substantial margin.
arXiv Detail & Related papers (2024-11-27T01:09:56Z)
Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view. Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
Robustifying the Multi-Scale Representation of Neural Radiance Fields [86.69338893753886]
We present a robust multi-scale neural radiance fields representation approach to overcome both real-world imaging issues. Our method handles multi-scale imaging effects and camera-pose estimation problems with NeRF-inspired approaches. We demonstrate, with examples, that for an accurate neural representation of an object from day-to-day acquired multi-view images, it is crucial to have precise camera-pose estimates.
arXiv Detail & Related papers (2022-10-09T11:46:45Z)
Know Your Surroundings: Panoramic Multi-Object Tracking by Multimodality Collaboration [56.01625477187448]
We propose a MultiModality PAnoramic multi-object Tracking framework (MMPAT). It takes both 2D panorama images and 3D point clouds as input and then infers target trajectories using the multimodality data. We evaluate the proposed method on the JRDB dataset, where MMPAT achieves the top performance in both the detection and tracking tasks.
arXiv Detail & Related papers (2021-05-31T03:16:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.