Stereo Risk: A Continuous Modeling Approach to Stereo Matching
- URL: http://arxiv.org/abs/2407.03152v1
- Date: Wed, 3 Jul 2024 14:30:47 GMT
- Title: Stereo Risk: A Continuous Modeling Approach to Stereo Matching
- Authors: Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Yao Yao, Luc Van Gool
- Abstract summary: We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision.
We demonstrate that Stereo Risk enhances stereo-matching performance for deep networks, particularly for disparities with multi-modal probability distributions.
A comprehensive analysis demonstrates our method's theoretical soundness and superior performance over the state-of-the-art methods across various benchmark datasets.
- Score: 110.22344879336043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision. As it is well-known that stereo matching boils down to a per-pixel disparity estimation problem, the popular state-of-the-art stereo-matching approaches widely rely on regressing the scene disparity values, yet via discretization of scene disparity values. Such discretization often fails to capture the nuanced, continuous nature of scene depth. Stereo Risk departs from the conventional discretization approach by formulating the scene disparity as an optimal solution to a continuous risk minimization problem, hence the name "stereo risk". We demonstrate that $L^1$ minimization of the proposed continuous risk function enhances stereo-matching performance for deep networks, particularly for disparities with multi-modal probability distributions. Furthermore, to enable the end-to-end network training of the non-differentiable $L^1$ risk optimization, we exploited the implicit function theorem, ensuring a fully differentiable network. A comprehensive analysis demonstrates our method's theoretical soundness and superior performance over the state-of-the-art methods across various benchmark datasets, including KITTI 2012, KITTI 2015, ETH3D, SceneFlow, and Middlebury 2014.
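The abstract's two main ideas, taking the disparity as the minimizer of a continuous $L^1$ risk and differentiating through that minimizer with the implicit function theorem, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a per-pixel set of disparity hypotheses `d` with probabilities `p`, a Laplace kernel of hypothetical bandwidth `sigma` to turn them into a continuous density, and plain bisection as the solver; all function names are illustrative.

```python
import math

def risk_gradient(y, d, p, sigma):
    """G(y) = dF/dy for F(y) = integral |y - x| q(x) dx, where q is a
    Laplace-kernel mixture centred at hypotheses d with weights p.
    For normalised p this equals 2 * CDF(y) - 1."""
    total = 0.0
    for di, pi in zip(d, p):
        z = y - di
        sign = (z > 0) - (z < 0)
        total += pi * sign * (1.0 - math.exp(-abs(z) / sigma))
    return total

def l1_disparity(d, p, sigma=1.0, iters=60):
    """Minimise the L1 risk by bisection on G(y) = 0 (G is increasing in y)."""
    lo, hi = min(d), max(d)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if risk_gradient(mid, d, p, sigma) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def l1_disparity_grad(y, d, p, sigma=1.0):
    """Implicit function theorem: dy/dp_i = -(dG/dp_i) / (dG/dy) at G(y, p) = 0."""
    dG_dy = sum(pi * math.exp(-abs(y - di) / sigma) for di, pi in zip(d, p)) / sigma
    grads = []
    for di in d:
        z = y - di
        sign = (z > 0) - (z < 0)
        dG_dp = sign * (1.0 - math.exp(-abs(z) / sigma))
        grads.append(-dG_dp / dG_dy)
    return grads

# Bimodal example: 60% of the mass at disparity 10, 40% at disparity 50.
d = [float(k) for k in range(64)]
p = [0.0] * 64
p[10], p[50] = 0.6, 0.4

mean_disp = sum(di * pi for di, pi in zip(d, p))  # expectation (L2 risk minimiser)
y_star = l1_disparity(d, p)                       # L1 risk minimiser
grad = l1_disparity_grad(y_star, d, p)            # dy*/dp without unrolling the solver
```

In this toy case the expectation lands at 26, between the two modes, whereas the $L^1$ minimizer stays close to the dominant mode at 10. The gradient comes from the optimality condition alone, so the bisection loop never has to be back-propagated through.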
Related papers
- UniTT-Stereo: Unified Training of Transformer for Enhanced Stereo Matching [18.02254687807291]
UniTT-Stereo is a method to maximize the potential of Transformer-based stereo architectures.
State-of-the-art performance of UniTT-Stereo is validated on various benchmarks such as ETH3D, KITTI 2012, and KITTI 2015 datasets.
arXiv Detail & Related papers (2024-09-04T09:02:01Z) - ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion [17.448021191744285]
Multi-frame monocular depth estimation relies on the geometric consistency between successive frames under the assumption of a static scene.
The presence of moving objects in dynamic scenes introduces inevitable inconsistencies, causing misaligned multi-frame feature matching and misleading self-supervision during training.
We propose a novel framework called ProDepth, which effectively addresses the mismatch problem caused by dynamic objects using a probabilistic approach.
arXiv Detail & Related papers (2024-07-12T14:37:49Z) - Digging into contrastive learning for robust depth estimation with diffusion models [55.62276027922499]
We propose a novel robust depth estimation method called D4RD.
It features a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments.
In experiments, D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions.
arXiv Detail & Related papers (2024-04-15T14:29:47Z) - Modeling Stereo-Confidence Out of the End-to-End Stereo-Matching Network
via Disparity Plane Sweep [31.261772846687297]
The proposed stereo-confidence method is built upon the idea that any shift in a stereo-image pair should be updated in a corresponding amount shift in the disparity map.
By comparing the desirable and predicted disparity profiles, we can quantify the level of matching ambiguity between left and right images for confidence measurement.
arXiv Detail & Related papers (2024-01-22T14:52:08Z) - Left-right Discrepancy for Adversarial Attack on Stereo Networks [8.420135490466851]
We introduce a novel adversarial attack approach that generates perturbation noise specifically designed to maximize the discrepancy between left and right image features.
Experiments demonstrate the superior capability of our method to induce larger prediction errors in stereo neural networks.
arXiv Detail & Related papers (2024-01-14T02:30:38Z) - AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks.
Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo.
Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z) - SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
arXiv Detail & Related papers (2021-04-08T16:15:46Z) - Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation [51.714092199995044]
In many fields, self-supervised learning solutions are rapidly evolving and filling the gap with supervised approaches.
We propose a novel self-supervised paradigm reversing the link between the two.
In order to train deep stereo networks, we distill knowledge through a monocular completion network.
arXiv Detail & Related papers (2020-08-17T07:40:22Z) - Expanding Sparse Guidance for Stereo Matching [24.74333370941674]
We propose a novel sparsity expansion technique to expand the sparse cues concerning RGB images for local feature enhancement.
Our approach significantly boosts the existing state-of-the-art stereo algorithms with extremely sparse cues.
arXiv Detail & Related papers (2020-04-24T06:41:11Z) - AdaStereo: A Simple and Efficient Approach for Adaptive Stereo Matching [50.06646151004375]
A novel domain-adaptive pipeline called AdaStereo aims to align multi-level representations for deep stereo matching networks.
Our AdaStereo models achieve state-of-the-art cross-domain performance on multiple stereo benchmarks, including KITTI, Middlebury, ETH3D, and DrivingStereo.
arXiv Detail & Related papers (2020-04-09T16:15:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.