Geometry-Constrained Monocular Scale Estimation Using Semantic Segmentation for Dynamic Scenes
- URL: http://arxiv.org/abs/2503.04235v1
- Date: Thu, 06 Mar 2025 09:15:13 GMT
- Title: Geometry-Constrained Monocular Scale Estimation Using Semantic Segmentation for Dynamic Scenes
- Authors: Hui Zhang, Zhiyang Wu, Qianqian Shangguan, Kang An
- Abstract summary: This study presents innovative strategies for ego-motion estimation and the selection of ground points. Our methodology incorporates dynamic object masks to eliminate unstable features and employs ground plane masks for meticulous triangulation. The integration of this approach with the monocular version of ORB-SLAM3 culminates in the accurate estimation of a road model.
- Score: 3.635236692041662
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular visual localization plays a pivotal role in advanced driver assistance systems and autonomous driving by estimating a vehicle's ego-motion from a single pinhole camera. Nevertheless, conventional monocular visual odometry encounters challenges in scale estimation due to the absence of depth information during projection. Previous methodologies, whether rooted in physical constraints or deep learning paradigms, contend with issues related to computational complexity and the management of dynamic objects. This study extends our prior research, presenting innovative strategies for ego-motion estimation and the selection of ground points. Striving for a nuanced equilibrium between computational efficiency and precision, we propose a hybrid method that leverages the SegNeXt model for real-time applications, encompassing both ego-motion estimation and ground point selection. Our methodology incorporates dynamic object masks to eliminate unstable features and employs ground plane masks for meticulous triangulation. Furthermore, we exploit Geometry-constraint to delineate road regions for scale recovery. The integration of this approach with the monocular version of ORB-SLAM3 culminates in the accurate estimation of a road model, a pivotal component in our scale recovery process. Rigorous experiments, conducted on the KITTI dataset, systematically compare our method with existing monocular visual odometry algorithms and contemporary scale recovery methodologies. The results undeniably confirm the superior effectiveness of our approach, surpassing state-of-the-art visual odometry algorithms. Our source code is available at https://github.com/bFr0zNq/MVOSegScale.
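The abstract describes recovering metric scale from an estimated road model. A minimal sketch of the general ground-plane idea follows; it is not the authors' implementation. It assumes ground points have already been triangulated in the up-to-scale camera frame and that the true camera mounting height is known. The function names are hypothetical, and the 1.65 m height is an assumed value of the kind commonly used for KITTI.

```python
import numpy as np


def camera_height_from_ground_points(points: np.ndarray) -> float:
    """Fit a plane to Nx3 ground points and return the camera's distance to it.

    The camera is assumed to sit at the origin of the frame the points are in.
    """
    centroid = points.mean(axis=0)
    # The plane normal is the direction of least variance of the centered points.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]  # unit vector
    # Distance from the origin to the plane through the centroid.
    return float(abs(normal @ centroid))


def recover_scale(points: np.ndarray, true_height: float) -> float:
    """Return the factor that maps up-to-scale translations to metres."""
    estimated_height = camera_height_from_ground_points(points)
    if estimated_height < 1e-9:
        raise ValueError("ground plane passes through the camera centre")
    return true_height / estimated_height


# Synthetic example: a flat road 0.5 units below the camera (y axis points down).
rng = np.random.default_rng(0)
xz = rng.uniform([-5.0, 2.0], [5.0, 30.0], size=(200, 2))
ground = np.column_stack([xz[:, 0], np.full(200, 0.5), xz[:, 1]])

scale = recover_scale(ground, true_height=1.65)  # assumed mounting height in metres
print(f"scale factor: {scale:.3f}")
```

In practice the ground points are noisy and contaminated by non-road features, which is why the paper restricts them with segmentation masks and a geometric constraint; a robust fit such as RANSAC would typically replace the plain least-squares fit used here.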
Related papers
- TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs [5.6168844664788855]
This work presents TanDepth, a practical scale recovery method for obtaining metric depth results from relative estimations at inference-time.
Our method leverages sparse measurements from Global Digital Elevation Models (GDEM) by projecting them to the camera view.
An adaptation to the Cloth Filter Simulation is presented, which allows selecting ground points from the estimated depth map to then correlate with the projected reference points.
arXiv Detail & Related papers (2024-09-08T15:54:43Z)
- Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry [9.79428015716139]
In this paper, we analyze major failure cases on outdoor benchmarks and expose shortcomings of a learning-based SLAM model (DROID-SLAM).
We propose the use of self-supervised priors leveraging a frozen large-scale pre-trained monocular depth estimation to initialize the dense bundle adjustment process.
Despite its simplicity, our proposed method demonstrates significant improvements on KITTI odometry, as well as the challenging DDAD benchmark.
arXiv Detail & Related papers (2024-06-03T01:59:29Z)
- A Survey on Deep Learning-Based Monocular Spacecraft Pose Estimation: Current State, Limitations and Prospects [7.08026800833095]
Estimating the pose of an uncooperative spacecraft is an important computer vision problem for enabling vision-based systems in orbit.
Following the general trend in computer vision, more and more works have been focusing on leveraging Deep Learning (DL) methods to address this problem.
Despite promising research-stage results, major challenges preventing the use of such methods in real-life missions still stand in the way.
arXiv Detail & Related papers (2023-05-12T09:52:53Z)
- Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach [0.0]
Estimating the camera's pose given images from a single camera is a traditional task in mobile robots.
Deep learning methods have been shown to be general after proper training and with a large amount of available data.
We present the TSformer-VO model based on temporal self-attention mechanisms to extract features from clips and estimate the motions in an end-to-end manner.
arXiv Detail & Related papers (2023-05-10T13:11:23Z)
- A kinetic approach to consensus-based segmentation of biomedical images [39.58317527488534]
We apply a kinetic version of a bounded confidence consensus model to biomedical segmentation problems.
The large time behavior of the system is then computed with the aid of a surrogate Fokker-Planck approach.
We minimize the introduced segmentation metric for a relevant set of 2D gray-scale images.
arXiv Detail & Related papers (2022-11-08T09:54:34Z)
- RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation [110.4255414234771]
Existing solutions require massive training data or lack generalizability to unknown rendering configurations.
We propose a novel approach that marries domain randomization and differentiable rendering gradients to address this problem.
Our approach achieves significantly lower reconstruction errors and has better generalizability among unknown rendering configurations.
arXiv Detail & Related papers (2022-05-11T17:59:51Z)
- Object-centric and memory-guided normality reconstruction for video anomaly detection [56.64792194894702]
This paper addresses the anomaly detection problem for video surveillance.
Due to the inherent rarity and heterogeneity of abnormal events, the problem is viewed as a normality modeling strategy.
Our model learns object-centric normal patterns without seeing anomalous samples during training.
arXiv Detail & Related papers (2022-03-07T19:28:39Z)
- Robust Visual Odometry Using Position-Aware Flow and Geometric Bundle Adjustment [16.04240592057438]
A novel optical flow network (PANet) built on a position-aware mechanism is proposed first.
Then, a novel system that jointly estimates depth, optical flow, and ego-motion without a typical network to learn ego-motion is proposed.
Experiments show that the proposed system outperforms other state-of-the-art methods in terms of depth, flow, and VO estimation.
arXiv Detail & Related papers (2021-11-22T12:05:27Z)
- MotionHint: Self-Supervised Monocular Visual Odometry with Motion Constraints [70.76761166614511]
We present a novel self-supervised algorithm named MotionHint for monocular visual odometry (VO).
Our MotionHint algorithm can be easily applied to existing open-sourced state-of-the-art SSM-VO systems.
arXiv Detail & Related papers (2021-09-14T15:35:08Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
- Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori.
We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
arXiv Detail & Related papers (2020-11-30T20:46:48Z)
- A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches equilibrium distribution of adversarial examples.
Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.