Improving Monocular Visual-Inertial Initialization with Structureless Visual-Inertial Bundle Adjustment
- URL: http://arxiv.org/abs/2502.16598v1
- Date: Sun, 23 Feb 2025 14:55:40 GMT
- Title: Improving Monocular Visual-Inertial Initialization with Structureless Visual-Inertial Bundle Adjustment
- Authors: Junlin Song, Antoine Richard, Miguel Olivares-Mendez,
- Abstract summary: We propose a structureless visual-inertial bundle adjustment to further refine previous structureless solution.<n>Experiments on real-world datasets show our method significantly improves the VIO initialization accuracy, while maintaining real-time performance.
- Score: 0.36868085124383626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular visual inertial odometry (VIO) has facilitated a wide range of real-time motion tracking applications, thanks to the small size of the sensor suite and low power consumption. To successfully bootstrap VIO algorithms, the initialization module is extremely important. Most initialization methods rely on the reconstruction of 3D visual point clouds. These methods suffer from high computational cost as state vector contains both motion states and 3D feature points. To address this issue, some researchers recently proposed a structureless initialization method, which can solve the initial state without recovering 3D structure. However, this method potentially compromises performance due to the decoupled estimation of rotation and translation, as well as linear constraints. To improve its accuracy, we propose novel structureless visual-inertial bundle adjustment to further refine previous structureless solution. Extensive experiments on real-world datasets show our method significantly improves the VIO initialization accuracy, while maintaining real-time performance.
Related papers
- HORT: Monocular Hand-held Objects Reconstruction with Transformers [61.36376511119355]
Reconstructing hand-held objects in 3D from monocular images is a significant challenge in computer vision.
We propose a transformer-based model to efficiently reconstruct dense 3D point clouds of hand-held objects.
Our method achieves state-of-the-art accuracy with much faster inference speed, while generalizing well to in-the-wild images.
arXiv Detail & Related papers (2025-03-27T09:45:09Z) - MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.
By simply estimating a pointmap for each timestep, we can effectively adapt DUST3R's representation, previously only used for static scenes, to dynamic scenes.
We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z) - Learned Monocular Depth Priors in Visual-Inertial Initialization [4.99761983273316]
Visual-inertial odometry (VIO) is the pose estimation backbone for most AR/VR and autonomous robotic systems today.
We propose to circumvent the limitations of classical visual-inertial structure-from-motion (SfM)
We leverage learned monocular depth images (mono-depth) to constrain the relative depth of features, and upgrade the mono-depth to metric scale by jointly optimizing for its scale and shift.
arXiv Detail & Related papers (2022-04-20T00:30:04Z) - OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic
3D Reconstruction [14.130915525776055]
RGBD-based real-time dynamic 3D reconstruction suffers from inaccurate inter-frame motion estimation.
We propose OcclusionFusion, a novel method to calculate occlusion-aware 3D motion to guide the reconstruction.
Our technique outperforms existing single-view-based real-time methods by a large margin.
arXiv Detail & Related papers (2022-03-15T15:09:01Z) - Revisiting Point Cloud Simplification: A Learnable Feature Preserving
Approach [57.67932970472768]
Mesh and Point Cloud simplification methods aim to reduce the complexity of 3D models while retaining visual quality and relevant salient features.
We propose a fast point cloud simplification method by learning to sample salient points.
The proposed method relies on a graph neural network architecture trained to select an arbitrary, user-defined, number of points from the input space and to re-arrange their positions so as to minimize the visual perception error.
arXiv Detail & Related papers (2021-09-30T10:23:55Z) - Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
But, they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z) - FlowStep3D: Model Unrolling for Self-Supervised Scene Flow Estimation [87.74617110803189]
Estimating the 3D motion of points in a scene, known as scene flow, is a core problem in computer vision.
We present a recurrent architecture that learns a single step of an unrolled iterative alignment procedure for refining scene flow predictions.
arXiv Detail & Related papers (2020-11-19T23:23:48Z) - SCFusion: Real-time Incremental Scene Reconstruction with Semantic
Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - GPO: Global Plane Optimization for Fast and Accurate Monocular SLAM
Initialization [22.847353792031488]
The algorithm starts by homography estimation in a sliding window.
The proposed method fully exploits the plane information from multiple frames and avoids the ambiguities in homography decomposition.
Experimental results show that our method outperforms the fine-tuned baselines in both accuracy and real-time.
arXiv Detail & Related papers (2020-04-25T03:57:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.