2T-UNET: A Two-Tower UNet with Depth Clues for Robust Stereo Depth
Estimation
- URL: http://arxiv.org/abs/2210.15374v1
- Date: Thu, 27 Oct 2022 12:34:41 GMT
- Title: 2T-UNET: A Two-Tower UNet with Depth Clues for Robust Stereo Depth
Estimation
- Authors: Rohit Choudhary and Mansi Sharma and Rithvik Anil
- Abstract summary: This paper revisits the depth estimation problem, avoiding the explicit stereo matching step using a simple two-tower convolutional neural network.
The proposed algorithm is named 2T-UNet.
The architecture performs remarkably well on complex natural scenes, highlighting its usefulness for various real-time applications.
- Score: 0.2578242050187029
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Stereo correspondence matching is an essential part of the multi-step stereo
depth estimation process. This paper revisits the depth estimation problem,
avoiding the explicit stereo matching step using a simple two-tower
convolutional neural network. The proposed algorithm is named 2T-UNet.
The idea behind 2T-UNet is to replace cost-volume construction with twin
convolution towers, which are allowed to have different weights. Additionally,
the inputs to the twin encoders in 2T-UNet differ from those of existing
stereo methods. A stereo network generally takes a left and right image pair
as input to determine the scene geometry. In the 2T-UNet model, however, the
right stereo image is one input, while the left stereo image, together with
its monocular depth clue, is the other. Depth clues provide complementary
suggestions that help enhance the quality of the predicted scene geometry.
2T-UNet surpasses state-of-the-art monocular and stereo depth estimation
methods on the challenging SceneFlow dataset, both quantitatively and
qualitatively. The architecture performs remarkably well on complex natural
scenes, highlighting its usefulness for various real-time applications.
Pretrained weights and code will be made readily available.
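As a rough illustration of the input arrangement described above (the image size, channel layout, and array names here are assumptions for the sketch, not details taken from the paper), the two towers could receive tensors assembled like this:

```python
import numpy as np

# Hypothetical image size; the paper does not fix these dimensions.
H, W = 64, 64
left = np.random.rand(3, H, W).astype(np.float32)        # left RGB image
right = np.random.rand(3, H, W).astype(np.float32)       # right RGB image
depth_clue = np.random.rand(1, H, W).astype(np.float32)  # monocular depth estimate for the left view

# Tower A receives the right image alone.
tower_a_input = right
# Tower B receives the left image concatenated with its depth clue,
# so the two encoder towers see different channel counts.
tower_b_input = np.concatenate([left, depth_clue], axis=0)

print(tower_a_input.shape)  # (3, 64, 64)
print(tower_b_input.shape)  # (4, 64, 64)
```

The asymmetric inputs are why the towers cannot simply share weights, consistent with the abstract's note that the twin towers may hold different weights.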
Related papers
- Integrating Disparity Confidence Estimation into Relative Depth Prior-Guided Unsupervised Stereo Matching [55.784713740698365]
Unsupervised stereo matching has garnered significant attention for its independence from costly disparity annotations. A feasible solution lies in transferring 3D geometric knowledge from a relative depth map to the stereo matching networks. This work proposes a novel unsupervised learning framework to address these challenges.
arXiv Detail & Related papers (2025-08-02T09:11:05Z) - Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model [62.37493746544967]
Camera-based setups offer a cost-effective option by using stereo depth estimation to generate dense, high-resolution depth maps.
Existing omnidirectional stereo matching approaches achieve only limited depth accuracy across diverse environments.
We present DFI-OmniStereo, a novel omnidirectional stereo matching method that leverages a large-scale pre-trained foundation model for relative monocular depth estimation.
arXiv Detail & Related papers (2025-03-30T16:24:22Z) - Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion [88.67015254278859]
We introduce the Mono2Stereo dataset, providing high-quality training data and benchmark to support in-depth exploration of stereo conversion.
We conduct an empirical study that yields two primary findings. 1) The differences between the left and right views are subtle, yet existing metrics consider overall pixels, failing to concentrate on regions critical to stereo effects.
We introduce a new evaluation metric, Stereo Intersection-over-Union, which harmonizes disparity and achieves a high correlation with human judgments on stereo effect.
arXiv Detail & Related papers (2025-03-28T09:25:58Z) - Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z) - MEStereo-Du2CNN: A Novel Dual Channel CNN for Learning Robust Depth
Estimates from Multi-exposure Stereo Images for HDR 3D Applications [0.22940141855172028]
We develop a novel deep architecture for multi-exposure stereo depth estimation.
For the stereo depth estimation component of our architecture, a mono-to-stereo transfer learning approach is deployed.
In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods.
arXiv Detail & Related papers (2022-06-21T13:23:22Z) - DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors [60.88824519770208]
Camera-based 3D object detectors are welcome due to their wider deployment and lower price than LiDAR sensors.
We revisit the prior stereo modeling DSGN about the stereo volume constructions for representing both 3D geometry and semantics.
We propose our approach, DSGN++, aiming for improving information flow throughout the 2D-to-3D pipeline.
arXiv Detail & Related papers (2022-04-06T18:43:54Z) - Depth Refinement for Improved Stereo Reconstruction [13.941756438712382]
Current techniques for depth estimation from stereoscopic images still suffer from a built-in drawback.
A simple analysis reveals that the depth error grows quadratically with the object's distance.
We propose a simple but effective method that uses a refinement network for depth estimation.
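The quadratic growth mentioned above follows from the pinhole stereo relation Z = fB/d: a fixed disparity error eps yields a depth error of roughly Z²·eps/(fB). A small numeric check (the f·B value and error magnitude are arbitrary illustrative choices, not figures from the paper):

```python
# Stereo depth: Z = f * B / disparity. A fixed disparity error eps therefore
# causes a depth error of about Z^2 * eps / (f * B) -- quadratic in distance.
f_B = 1000.0   # focal length times baseline (arbitrary illustrative value)
eps = 1.0      # disparity error in pixels (arbitrary)

errors = {}
for Z in (10.0, 20.0, 40.0):
    disp = f_B / Z
    # Exact depth error when the disparity is underestimated by eps.
    errors[Z] = f_B / (disp - eps) - Z
    print(Z, round(errors[Z], 3))

# Doubling the distance roughly quadruples the depth error.
```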
arXiv Detail & Related papers (2021-12-15T12:21:08Z) - IB-MVS: An Iterative Algorithm for Deep Multi-View Stereo based on
Binary Decisions [0.0]
We present a novel deep-learning-based method for Multi-View Stereo.
Our method estimates high resolution and highly precise depth maps iteratively, by traversing the continuous space of feasible depth values at each pixel in a binary decision fashion.
We compare our method with state-of-the-art Multi-View Stereo methods on the DTU, Tanks and Temples and the challenging ETH3D benchmarks and show competitive results.
arXiv Detail & Related papers (2021-11-29T10:04:24Z) - VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two stages framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) local computation of depth maps with a deep MVS technique, and 2) fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z) - H-Net: Unsupervised Attention-based Stereo Depth Estimation Leveraging
Epipolar Geometry [4.968452390132676]
We introduce the H-Net, a deep-learning framework for unsupervised stereo depth estimation.
For the first time, a Siamese autoencoder architecture is used for depth estimation.
Our method outperforms the state-of-the-art unsupervised stereo depth estimation methods.
arXiv Detail & Related papers (2021-04-22T19:16:35Z) - Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences, and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
arXiv Detail & Related papers (2021-04-01T15:31:20Z) - Direct Depth Learning Network for Stereo Matching [79.3665881702387]
A novel Direct Depth Learning Network (DDL-Net) is designed for stereo matching.
DDL-Net consists of two stages: the Coarse Depth Estimation stage and the Adaptive-Grained Depth Refinement stage.
We show that DDL-Net achieves an average improvement of 25% on the SceneFlow dataset and 12% on the DrivingStereo dataset.
arXiv Detail & Related papers (2020-12-10T10:33:57Z) - Bi3D: Stereo Depth Estimation via Binary Classifications [129.1582262491508]
We present Bi3D, a method that estimates depth via a series of binary classifications.
Given a strict time budget, Bi3D can detect objects closer than a given distance in as little as a few milliseconds.
For standard stereo (i.e., continuous depth on the whole range), our method is close to or on par with state-of-the-art, finely tuned stereo methods.
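To make the binary-classification idea concrete, here is a toy sketch (the plane count, spacing, and ideal classifier are assumptions for illustration, not Bi3D's actual network or formulation): with ideal binary decisions "is this pixel farther than plane d[k]?", summing the decisions over uniformly spaced planes recovers depth:

```python
import numpy as np

# Toy illustration of depth from a series of binary decisions; all constants
# here are assumptions, not Bi3D's actual design.
K = 8
d = np.linspace(1.0, 8.0, K)   # candidate depth planes, uniform spacing
delta = d[1] - d[0]

true_depth = 3.5
# Ideal binary classifier output: 1 if the pixel lies farther than plane d[k].
p = (d < true_depth).astype(np.float32)

# Summing the binary decisions integrates over the planes; offsetting by half
# a plane spacing centres the estimate between the bracketing planes.
est = d[0] + p.sum() * delta - delta / 2.0
print(est)  # 3.5 for this toy case
```

This also shows why a strict time budget helps: detecting objects closer than a single threshold needs only one such binary map instead of the full series.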
arXiv Detail & Related papers (2020-05-14T21:33:00Z) - Du$^2$Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels [16.797169907541164]
We present a novel approach based on neural networks for depth estimation that combines stereo from dual cameras with stereo from a dual-pixel sensor.
Our network uses a novel architecture to fuse these two sources of information and can overcome the limitations of pure binocular stereo matching.
arXiv Detail & Related papers (2020-03-31T15:39:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.