ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion
- URL: http://arxiv.org/abs/2407.09303v1
- Date: Fri, 12 Jul 2024 14:37:49 GMT
- Title: ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion
- Authors: Sungmin Woo, Wonjoon Lee, Woo Jin Kim, Dogyoon Lee, Sangyoun Lee
- Abstract summary: Multi-frame monocular depth estimation relies on the geometric consistency between successive frames under the assumption of a static scene.
The presence of moving objects in dynamic scenes introduces inevitable inconsistencies, causing misaligned multi-frame feature matching and misleading self-supervision during training.
We propose a novel framework called ProDepth, which effectively addresses the mismatch problem caused by dynamic objects using a probabilistic approach.
- Score: 17.448021191744285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised multi-frame monocular depth estimation relies on the geometric consistency between successive frames under the assumption of a static scene. However, the presence of moving objects in dynamic scenes introduces inevitable inconsistencies, causing misaligned multi-frame feature matching and misleading self-supervision during training. In this paper, we propose a novel framework called ProDepth, which effectively addresses the mismatch problem caused by dynamic objects using a probabilistic approach. We initially deduce the uncertainty associated with static scene assumption by adopting an auxiliary decoder. This decoder analyzes inconsistencies embedded in the cost volume, inferring the probability of areas being dynamic. We then directly rectify the erroneous cost volume for dynamic areas through a Probabilistic Cost Volume Modulation (PCVM) module. Specifically, we derive probability distributions of depth candidates from both single-frame and multi-frame cues, modulating the cost volume by adaptively fusing those distributions based on the inferred uncertainty. Additionally, we present a self-supervision loss reweighting strategy that not only masks out incorrect supervision with high uncertainty but also mitigates the risks in remaining possible dynamic areas in accordance with the probability. Our proposed method excels over state-of-the-art approaches in all metrics on both Cityscapes and KITTI datasets, and demonstrates superior generalization ability on the Waymo Open dataset.
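As a rough sketch of the fusion idea described above (this is not the authors' implementation; the tensor shapes, function names, and the simple convex blend are assumptions), the modulation step can be pictured as turning the multi-frame cost volume and the single-frame cues into per-pixel distributions over depth candidates and mixing them according to the inferred uncertainty, with the same uncertainty map softly down-weighting the photometric self-supervision:

```python
import torch
import torch.nn.functional as F

def modulate_cost_volume(multi_frame_cost, single_frame_logits, uncertainty):
    """Blend multi-frame matching costs with single-frame depth cues (illustrative).

    multi_frame_cost:    (B, D, H, W) matching cost over D depth candidates
    single_frame_logits: (B, D, H, W) single-frame scores over the same candidates
    uncertainty:         (B, 1, H, W) in [0, 1]; close to 1 where the static-scene
                         assumption is likely violated (dynamic areas)
    """
    # Turn both cues into per-pixel probability distributions over depth candidates.
    p_multi = F.softmax(-multi_frame_cost, dim=1)      # low cost -> high probability
    p_single = F.softmax(single_frame_logits, dim=1)

    # Where uncertainty is high, let the single-frame distribution take over;
    # where it is low, keep the multi-frame (geometric) distribution.
    p_fused = (1.0 - uncertainty) * p_multi + uncertainty * p_single
    return p_fused  # still a valid distribution: a convex mix of two distributions

def reweighted_photometric_loss(photo_loss, uncertainty):
    """Soft reweighting: down-weight self-supervision in likely dynamic regions."""
    return ((1.0 - uncertainty) * photo_loss).mean()

# Minimal usage with dummy tensors.
B, D, H, W = 1, 32, 8, 8
p = modulate_cost_volume(torch.rand(B, D, H, W), torch.rand(B, D, H, W),
                         torch.rand(B, 1, H, W))
loss = reweighted_photometric_loss(torch.rand(B, 1, H, W), torch.rand(B, 1, H, W))
```

The point of the sketch is only that a per-pixel uncertainty map lets single-frame cues replace the geometric matching exactly where the static-scene assumption breaks, which is what the PCVM module and the loss reweighting strategy are described as doing.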
Related papers
- D$^3$epth: Self-Supervised Depth Estimation with Dynamic Mask in Dynamic Scenes [23.731667977542454]
D$^3$epth is a novel method for self-supervised depth estimation in dynamic scenes.
It tackles the challenge of dynamic objects from two key perspectives.
It consistently outperforms existing self-supervised monocular depth estimation baselines.
arXiv Detail & Related papers (2024-11-07T16:07:00Z) - Stereo Risk: A Continuous Modeling Approach to Stereo Matching [110.22344879336043]
We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision.
We demonstrate that Stereo Risk enhances stereo-matching performance for deep networks, particularly for disparities with multi-modal probability distributions.
A comprehensive analysis demonstrates our method's theoretical soundness and superior performance over the state-of-the-art methods across various benchmark datasets.
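A toy example (ours, not the paper's formulation) of why continuous, risk-based modeling helps with multi-modal disparity distributions: the common soft-argmax readout returns the distribution mean and lands between the modes, whereas minimising the expected L1 risk returns the median and stays on the dominant mode.

```python
import numpy as np

# Toy per-pixel disparity distribution with two modes (e.g. a foreground/background edge).
disparities = np.arange(0, 64, dtype=np.float64)
p = (0.6 * np.exp(-0.5 * ((disparities - 10.0) / 1.5) ** 2)
     + 0.4 * np.exp(-0.5 * ((disparities - 50.0) / 1.5) ** 2))
p /= p.sum()

# Soft-argmax (expected disparity): averages the two modes.
soft_argmax = float((p * disparities).sum())

# L1-risk-minimising estimate: argmin_d E[|d - D|], i.e. the distribution median.
risk = np.array([np.sum(p * np.abs(d - disparities)) for d in disparities])
l1_estimate = float(disparities[np.argmin(risk)])

print(f"soft-argmax:       {soft_argmax:5.1f}  (between the modes)")
print(f"L1-risk minimiser: {l1_estimate:5.1f}  (stays on the dominant mode)")
```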
arXiv Detail & Related papers (2024-07-03T14:30:47Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
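One standard way to turn likelihood ratios into an anytime-valid confidence sequence (a generic construction in the spirit of this summary, not necessarily the paper's estimator) tracks, for every candidate parameter, a prequential plug-in likelihood-ratio process; under the true parameter that process is a nonnegative martingale with mean 1, so by Ville's inequality it exceeds 1/alpha with probability at most alpha, and the candidates whose ratio stays below 1/alpha form a valid confidence set at every time step. A minimal Bernoulli sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, true_theta = 0.05, 0.3
theta_grid = np.linspace(0.01, 0.99, 99)   # candidate Bernoulli parameters
log_M = np.zeros_like(theta_grid)          # log likelihood-ratio process per candidate
count, total = 0, 0

for t in range(500):
    x = rng.binomial(1, true_theta)
    # Plug-in estimate from data seen *before* x (smoothed away from 0 and 1).
    theta_hat = (count + 1) / (total + 2)
    count, total = count + x, total + 1

    # M_t(theta) = prod_i p_{theta_hat_{i-1}}(x_i) / p_theta(x_i): a nonnegative
    # martingale with mean 1 under theta, so P(any t: M_t(theta) >= 1/alpha) <= alpha.
    log_M += (np.log(theta_hat if x else 1.0 - theta_hat)
              - np.log(np.where(x, theta_grid, 1.0 - theta_grid)))

# Valid simultaneously at every sample size, not just the final one.
confidence_set = theta_grid[log_M < np.log(1.0 / alpha)]
print(f"after {total} samples: theta in "
      f"[{confidence_set.min():.2f}, {confidence_set.max():.2f}]")
```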
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - A Robustness Analysis of Blind Source Separation [91.3755431537592]
Blind source separation (BSS) aims to recover an unobserved signal from its mixture $X=f(S)$ under the condition that the transformation $f$ is invertible but unknown.
We present a general framework for analysing such violations and quantifying their impact on the blind recovery of $S$ from $X$.
We show that the response of a generic BSS solution to general deviations from its defining structural assumptions can be profitably analysed in the form of explicit continuity guarantees.
arXiv Detail & Related papers (2023-03-17T16:30:51Z) - Robust Control for Dynamical Systems With Non-Gaussian Noise via Formal Abstractions [59.605246463200736]
We present a novel controller synthesis method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous control system into a finite-state model that captures noise by probabilistic transitions between discrete states.
We use state-of-the-art verification techniques to provide guarantees on this abstraction, an interval Markov decision process, and compute a controller for which these guarantees carry over to the original control system.
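A back-of-the-envelope sketch of the abstraction step under stated assumptions (toy 1-D dynamics, cells represented by their centre points, and a plain Hoeffding-style interval rather than the paper's scenario-based bounds): sample the noise, whose distribution is never written down, record where each cell's representative point lands, and store probability intervals that an interval-MDP verifier could consume.

```python
import numpy as np

rng = np.random.default_rng(1)

def step(x, u, n):
    """Toy dynamics x' = 0.9*x + u + w with noise w we can only sample, not model."""
    w = rng.laplace(scale=0.2, size=n)
    return 0.9 * x + u + w

edges = np.linspace(-2.0, 2.0, 9)                     # 8 cells partitioning the state space
n_samples, delta = 2000, 1e-3
eps = np.sqrt(np.log(2.0 / delta) / (2 * n_samples))  # Hoeffding slack per estimated frequency

def transition_intervals(cell, u):
    """Interval on the probability of landing in each cell, from one source cell and input."""
    x0 = 0.5 * (edges[cell] + edges[cell + 1])        # represent the cell by its centre point
    x1 = step(np.full(n_samples, x0), u, n_samples)
    freq = np.histogram(x1, bins=edges)[0] / n_samples
    return np.clip(freq - eps, 0.0, 1.0), np.clip(freq + eps, 0.0, 1.0)

lo, hi = transition_intervals(cell=3, u=0.1)
for j, (l, h) in enumerate(zip(lo, hi)):
    print(f"cell 3 -> cell {j}: p in [{l:.3f}, {h:.3f}]")
```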
arXiv Detail & Related papers (2023-01-04T10:40:30Z) - Modeling Multimodal Aleatoric Uncertainty in Segmentation with Mixture of Stochastic Expert [24.216869988183092]
We focus on capturing the data-inherent uncertainty (aka aleatoric uncertainty) in segmentation, typically when ambiguities exist in input images.
We propose a novel mixture of experts (MoSE) model, where each expert network estimates a distinct mode of aleatoric uncertainty.
We develop a Wasserstein-like loss that directly minimizes the distribution distance between the MoSE and ground truth annotations.
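A minimal sketch of the mixture-of-experts idea as summarised here (layer sizes, the gating design and all names are assumptions, and the Wasserstein-like loss is not reproduced): each expert head emits one full segmentation hypothesis and a gating branch assigns it a mixture weight, so the output is a small distribution over plausible segmentations rather than a single map.

```python
import torch
import torch.nn as nn

class MoSESketch(nn.Module):
    """Illustrative mixture-of-experts segmentation head (not the paper's architecture)."""

    def __init__(self, in_ch=64, n_classes=2, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Conv2d(in_ch, n_classes, kernel_size=1) for _ in range(n_experts)]
        )
        self.gate = nn.Sequential(                    # one mixture weight per expert
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, n_experts), nn.Softmax(dim=1),
        )

    def forward(self, feats):
        # Each expert produces one full segmentation hypothesis (one "mode" of the ambiguity).
        hypotheses = torch.stack([e(feats) for e in self.experts], dim=1)  # (B, E, C, H, W)
        weights = self.gate(feats)                                         # (B, E)
        return hypotheses, weights

feats = torch.randn(2, 64, 32, 32)          # dummy backbone features
hyps, w = MoSESketch()(feats)
print(hyps.shape, w.shape)                  # (2, 4, 2, 32, 32) and (2, 4)
```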
arXiv Detail & Related papers (2022-12-14T16:48:21Z) - Towards Robust Unsupervised Disentanglement of Sequential Data -- A Case Study Using Music Audio [17.214062755082065]
Disentangled sequential autoencoders (DSAEs) represent a class of probabilistic graphical models.
We show that the vanilla DSAE is sensitive to the choice of model architecture and to the capacity of the dynamic latent variables.
We propose TS-DSAE, a two-stage training framework that first learns sequence-level prior distributions.
arXiv Detail & Related papers (2022-05-12T04:11:25Z) - CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
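For intuition only, a probabilistic robustness statement of this general flavour can be produced by sampling semantically meaningful transformations, measuring the empirical failure rate, and adding a Chernoff-type (here Hoeffding) slack so that the bound on the true failure probability holds with confidence 1 - delta. This is not CC-Cert's certificate, and the model, transformation, and names below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

def is_robust(x, y, angle):
    """Placeholder: does the (hypothetical) model keep label y after rotating x by angle?"""
    return rng.random() > 0.02   # stand-in for model(rotate(x, angle)).argmax() == y

n, delta = 5000, 1e-3
failures = sum(not is_robust(x=None, y=None, angle=rng.uniform(-30, 30)) for _ in range(n))
p_hat = failures / n

# Hoeffding's inequality (proved via the Chernoff-Cramer method): with probability
# at least 1 - delta over the n samples, the true failure probability p satisfies
#     p <= p_hat + sqrt(ln(1/delta) / (2n)).
bound = p_hat + np.sqrt(np.log(1.0 / delta) / (2 * n))
print(f"empirical failure rate {p_hat:.4f}; with confidence {1 - delta:.3f}, p <= {bound:.4f}")
```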
arXiv Detail & Related papers (2021-09-22T12:46:04Z) - Attribute-Guided Adversarial Training for Robustness to Natural Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize the classifier's exposure to the attribute space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z) - Modal Uncertainty Estimation via Discrete Latent Representation [4.246061945756033]
We introduce a deep learning framework that learns the one-to-many mappings between the inputs and outputs, together with faithful uncertainty measures.
Our framework demonstrates significantly more accurate uncertainty estimation than the current state-of-the-art methods.
arXiv Detail & Related papers (2020-07-25T05:29:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.