CoProU-VO: Combining Projected Uncertainty for End-to-End Unsupervised Monocular Visual Odometry
- URL: http://arxiv.org/abs/2508.00568v1
- Date: Fri, 01 Aug 2025 12:09:42 GMT
- Title: CoProU-VO: Combining Projected Uncertainty for End-to-End Unsupervised Monocular Visual Odometry
- Authors: Jingchao Xie, Oussema Dhaouadi, Weirong Chen, Johannes Meier, Jacques Kaiser, Daniel Cremers
- Abstract summary: Visual Odometry (VO) is fundamental to autonomous navigation, robotics, and augmented reality. We introduce Combined Projected Uncertainty VO (CoProU-VO), a novel end-to-end approach that combines target frame uncertainty with projected reference frame uncertainty. Experiments on the KITTI and nuScenes datasets demonstrate significant improvements over previous unsupervised monocular end-to-end two-frame-based methods.
- Score: 33.293024344960706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Odometry (VO) is fundamental to autonomous navigation, robotics, and augmented reality, with unsupervised approaches eliminating the need for expensive ground-truth labels. However, these methods struggle when dynamic objects violate the static scene assumption, leading to erroneous pose estimations. We tackle this problem with uncertainty modeling, a commonly used technique that creates robust masks to filter out dynamic objects and occlusions without requiring explicit motion segmentation. Traditional uncertainty modeling considers only single-frame information, overlooking the uncertainties across consecutive frames. Our key insight is that uncertainty must be propagated and combined across temporal frames to effectively identify unreliable regions, particularly in dynamic scenes. To address this challenge, we introduce Combined Projected Uncertainty VO (CoProU-VO), a novel end-to-end approach that combines target frame uncertainty with projected reference frame uncertainty using a principled probabilistic formulation. Built upon vision transformer backbones, our model simultaneously learns depth, uncertainty estimation, and camera poses. Experiments on the KITTI and nuScenes datasets demonstrate significant improvements over previous unsupervised monocular end-to-end two-frame-based methods, with strong performance in challenging highway scenes where other approaches often fail. Additionally, comprehensive ablation studies validate the effectiveness of cross-frame uncertainty propagation.
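The abstract's core idea, combining the target frame's uncertainty with the reference frame's uncertainty warped into the target view, can be sketched as a variance-weighted photometric loss. This is an illustrative reading, not the paper's actual code: the function and variable names are ours, and we assume the two uncertainties are combined as independent Gaussian variances (so they add), which is one common "principled probabilistic formulation".

```python
import numpy as np

def combined_uncertainty_loss(target, warped_ref, sigma_t, sigma_r_proj):
    """Hypothetical per-pixel loss: the photometric residual between the
    target frame and the reference frame warped into the target view is
    down-weighted by a combined uncertainty. Assuming independent Gaussian
    noise, the variances add: sigma^2 = sigma_t^2 + sigma_r_proj^2."""
    sigma_sq = sigma_t ** 2 + sigma_r_proj ** 2
    residual = np.abs(target - warped_ref)
    # Negative-log-likelihood form: large combined uncertainty (e.g. on
    # dynamic objects) suppresses the photometric residual, while the log
    # term keeps the network from inflating uncertainty everywhere.
    return residual / sigma_sq + np.log(sigma_sq)
```

On pixels covering moving objects, either frame's predicted uncertainty can be large; because the variances add, a high value in just one frame is enough to mask the pixel, which is the intuition behind propagating uncertainty across frames rather than using the target frame alone.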
Related papers
- EyeSeg: An Uncertainty-Aware Eye Segmentation Framework for AR/VR [58.33693755009173]
EyeSeg is an uncertainty-aware eye segmentation framework for augmented reality (AR) and virtual reality (VR). We show that EyeSeg achieves segmentation improvements in MIoU, E1, F1, and ACC, surpassing previous approaches.
arXiv Detail & Related papers (2025-07-13T14:33:10Z) - UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification [26.770271366177603]
We propose a robust approach named Uncertainty-Guided Graph model for multi-modal object ReID (UGG-ReID). UGG-ReID is designed to mitigate noise interference and facilitate effective multi-modal fusion. Experimental results show that the proposed method achieves excellent performance on all datasets.
arXiv Detail & Related papers (2025-07-07T03:41:08Z) - Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching [35.01013208265617]
This work addresses image restoration tasks through the lens of inverse problems using unpaired datasets. The proposed method operates under minimal assumptions and relies only on small, unpaired datasets. It is particularly well-suited for real-world scenarios, where the forward model is often unknown or misspecified.
arXiv Detail & Related papers (2025-06-17T15:06:43Z) - Benchmarking the Spatial Robustness of DNNs via Natural and Adversarial Localized Corruptions [49.546479320670464]
This paper introduces specialized metrics for benchmarking the spatial robustness of segmentation models. We propose region-aware multi-attack adversarial analysis, a method that enables a deeper understanding of model robustness. The results reveal that models respond to these two types of threats differently.
arXiv Detail & Related papers (2025-04-02T11:37:39Z) - Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework [54.40508478482667]
We present a comprehensive framework to disentangle, quantify, and mitigate uncertainty in perception and plan generation. We propose methods tailored to the unique properties of perception and decision-making. We show that our uncertainty disentanglement framework reduces variability by up to 40% and enhances task success rates by 5% compared to baselines.
arXiv Detail & Related papers (2024-11-03T17:32:00Z) - Stochasticity in Motion: An Information-Theoretic Approach to Trajectory Prediction [9.365269316773219]
This paper addresses the challenge of uncertainty modeling in trajectory prediction with a holistic approach. Our method, grounded in information theory, provides a theoretically principled way to measure uncertainty. Unlike prior work, our approach is compatible with state-of-the-art motion predictors, allowing for broader applicability.
arXiv Detail & Related papers (2024-10-02T15:02:32Z) - UAHOI: Uncertainty-aware Robust Interaction Learning for HOI Detection [18.25576487115016]
This paper focuses on Human-Object Interaction (HOI) detection.
It addresses the challenge of identifying and understanding the interactions between humans and objects within a given image or video frame.
We propose a novel approach, UAHOI: Uncertainty-aware Robust Human-Object Interaction Learning.
arXiv Detail & Related papers (2024-08-14T10:06:39Z) - ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion [17.448021191744285]
Multi-frame monocular depth estimation relies on the geometric consistency between successive frames under the assumption of a static scene.
The presence of moving objects in dynamic scenes introduces inevitable inconsistencies, causing misaligned multi-frame feature matching and misleading self-supervision during training.
We propose a novel framework called ProDepth, which effectively addresses the mismatch problem caused by dynamic objects using a probabilistic approach.
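One standard way to realize the probabilistic fusion that ProDepth's summary describes is inverse-variance weighting of two depth estimates, equivalent to a product of independent Gaussians. The sketch below is our own illustration under that assumption, not the paper's implementation, and all names are hypothetical:

```python
import numpy as np

def inverse_variance_fusion(d_single, var_single, d_multi, var_multi):
    """Fuse a single-frame and a multi-frame depth estimate per pixel,
    weighting each by its inverse variance (product of Gaussians).
    On dynamic objects the multi-frame variance is large, so the fused
    depth falls back toward the single-frame estimate."""
    w_s = 1.0 / var_single
    w_m = 1.0 / var_multi
    fused = (w_s * d_single + w_m * d_multi) / (w_s + w_m)
    # The fused variance is always smaller than either input variance.
    fused_var = 1.0 / (w_s + w_m)
    return fused, fused_var
```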
arXiv Detail & Related papers (2024-07-12T14:37:49Z) - Lightweight, Uncertainty-Aware Conformalized Visual Odometry [2.429910016019183]
Data-driven visual odometry (VO) is a critical subroutine for autonomous edge robotics.
Emerging edge robotics devices like insect-scale drones and surgical robots lack a computationally efficient framework to estimate VO's predictive uncertainties.
This paper presents a novel, lightweight, and statistically robust framework that leverages conformal inference (CI) to extract VO's uncertainty bands.
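Conformal inference (CI), the technique this entry names, turns errors measured on a held-out calibration set into distribution-free prediction bands. The split-conformal sketch below shows the general recipe under our own simplifying assumptions (scalar pose error, symmetric band); it is not the paper's code:

```python
import numpy as np

def conformal_interval(calib_errors, new_pred, alpha=0.1):
    """Split conformal prediction: given absolute errors of a VO model on a
    calibration set, return a (1 - alpha) prediction band around a new
    scalar estimate. Only a quantile computation is needed at inference,
    which is why CI suits compute-constrained edge devices."""
    n = len(calib_errors)
    # Finite-sample-corrected quantile level ceil((n+1)(1-alpha))/n.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(calib_errors, level)
    return new_pred - q, new_pred + q
```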
arXiv Detail & Related papers (2023-03-03T20:37:55Z) - Robust Single Image Dehazing Based on Consistent and Contrast-Assisted Reconstruction [95.5735805072852]
We propose a novel density-variational learning framework to improve the robustness of the image dehazing model.
Specifically, the dehazing network is optimized under the consistency-regularized framework.
Our method significantly surpasses the state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T08:11:04Z) - Robust and Precise Facial Landmark Detection by Self-Calibrated Pose Attention Network [73.56802915291917]
We propose a semi-supervised framework to achieve more robust and precise facial landmark detection.
A Boundary-Aware Landmark Intensity (BALI) field is proposed to model more effective facial shape constraints.
A Self-Calibrated Pose Attention (SCPA) model is designed to provide a self-learned objective function that enforces intermediate supervision.
arXiv Detail & Related papers (2021-12-23T02:51:08Z) - Consistency Guided Scene Flow Estimation [159.24395181068218]
CGSF is a self-supervised framework for the joint reconstruction of 3D scene structure and motion from stereo video.
We show that the proposed model can reliably predict disparity and scene flow in challenging imagery.
It achieves better generalization than the state-of-the-art, and adapts quickly and robustly to unseen domains.
arXiv Detail & Related papers (2020-06-19T17:28:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.