Iterative Geometry Encoding Volume for Stereo Matching
- URL: http://arxiv.org/abs/2303.06615v2
- Date: Tue, 14 Mar 2023 08:39:23 GMT
- Title: Iterative Geometry Encoding Volume for Stereo Matching
- Authors: Gangwei Xu, Xianqi Wang, Xiaohuan Ding, Xin Yang
- Abstract summary: IGEV-Stereo builds a combined geometry encoding volume that encodes geometry and context information as well as local matching details.
Our IGEV-Stereo ranks $1^{st}$ on KITTI 2015 and 2012 (Reflective) among all published methods and is the fastest among the top 10 methods.
We also extend our IGEV to multi-view stereo (MVS) to achieve competitive accuracy on the DTU benchmark.
- Score: 4.610675756857714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent All-Pairs Field Transforms (RAFT) has shown great potential in matching tasks. However, all-pairs correlations lack non-local geometry knowledge and have difficulty tackling local ambiguities in ill-posed regions. In this paper, we propose Iterative Geometry Encoding Volume (IGEV-Stereo), a new deep network architecture for stereo matching. The proposed IGEV-Stereo builds a combined geometry encoding volume that encodes geometry and context information as well as local matching details, and iteratively indexes it to update the disparity map. To speed up convergence, we exploit the Geometry Encoding Volume (GEV) to regress an accurate starting point for the ConvGRU iterations. Our IGEV-Stereo ranks $1^{st}$ on KITTI 2015 and 2012 (Reflective) among all published methods and is the fastest among the top 10 methods. In addition, IGEV-Stereo has strong cross-dataset generalization as well as high inference efficiency. We also extend our IGEV to multi-view stereo (MVS), i.e. IGEV-MVS, which achieves competitive accuracy on the DTU benchmark. Code is available at https://github.com/gangweiX/IGEV.
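To make the iterative scheme concrete, below is a minimal sketch of the update loop the abstract describes: the GEV regresses an initial disparity, and each ConvGRU iteration indexes the volume around the current estimate to produce a residual update. The module names, lookup radius, and channel sizes are illustrative assumptions, not the released implementation.
```python
# Minimal sketch (not the authors' code) of GEV-initialized ConvGRU updates.
import torch
import torch.nn as nn


class ConvGRU(nn.Module):
    """A 2D convolutional GRU cell operating on feature maps."""

    def __init__(self, hidden_dim, input_dim):
        super().__init__()
        self.convz = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)
        self.convr = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)
        self.convq = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)

    def forward(self, h, x):
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))
        r = torch.sigmoid(self.convr(hx))
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q


def lookup_volume(volume, disp, radius=4):
    """Gather geometry-volume values at offsets around the current disparity.

    volume: (B, D, H, W) geometry encoding volume, disp: (B, 1, H, W).
    Returns (B, 2*radius+1, H, W) of sampled values (nearest-disparity gather).
    """
    B, D, H, W = volume.shape
    offsets = torch.arange(-radius, radius + 1, device=disp.device).view(1, -1, 1, 1)
    idx = (disp + offsets).clamp(0, D - 1).round().long()
    return torch.gather(volume, 1, idx)


class IterativeUpdater(nn.Module):
    def __init__(self, hidden_dim=128, radius=4):
        super().__init__()
        self.radius = radius
        self.gru = ConvGRU(hidden_dim, input_dim=2 * radius + 1 + 1)
        self.disp_head = nn.Conv2d(hidden_dim, 1, 3, padding=1)

    def forward(self, volume, disp_init, hidden, iters=8):
        disp = disp_init  # accurate starting point regressed from the GEV
        for _ in range(iters):
            geo_feat = lookup_volume(volume, disp, self.radius)
            x = torch.cat([geo_feat, disp], dim=1)
            hidden = self.gru(hidden, x)
            disp = disp + self.disp_head(hidden)  # residual disparity update
        return disp
```
A full model would additionally initialize the hidden state from context features and upsample the final disparity to full resolution.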
Related papers
- CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes [53.107474952492396]
CityGaussianV2 is a novel approach for large-scale scene reconstruction.
We implement a decomposed-gradient-based densification and depth regression technique to eliminate blurry artifacts and accelerate convergence.
Our method strikes a promising balance among visual quality, geometric accuracy, and storage and training costs.
arXiv Detail & Related papers (2024-11-01T17:59:31Z)
- IGEV++: Iterative Multi-range Geometry Encoding Volumes for Stereo Matching [7.859381791267791]
We propose a new deep network architecture, called IGEV++, for stereo matching.
The proposed IGEV++ builds Multi-range Geometry Encoding Volumes (MGEV) that encode coarse-grained geometry information for ill-posed regions.
We introduce an adaptive patch matching module that efficiently computes matching costs for large disparity ranges and/or ill-posed regions.
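A rough sketch of the multi-range idea follows, under the assumption that each range pairs a disparity limit with a feature resolution; how IGEV++ actually aggregates and fuses MGEV, and its adaptive patch matching module, are not reproduced here.
```python
# Illustrative sketch (not the IGEV++ code) of cost volumes over multiple
# disparity ranges: a fine-grained volume over a small range at full feature
# resolution and coarser volumes over larger ranges at lower resolution.
import torch
import torch.nn.functional as F


def correlation_volume(f_left, f_right, max_disp):
    """Plain correlation cost volume: (B, max_disp, H, W)."""
    B, C, H, W = f_left.shape
    volume = f_left.new_zeros(B, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (f_left * f_right).mean(dim=1)
        else:
            volume[:, d, :, d:] = (f_left[:, :, :, d:] * f_right[:, :, :, :-d]).mean(dim=1)
    return volume


def multi_range_volumes(f_left, f_right, ranges=(48, 96, 192), scales=(1, 2, 4)):
    """Build one volume per (disparity range, downsampling factor) pair.

    The specific ranges and scales here are assumptions for illustration.
    """
    volumes = []
    for max_disp, s in zip(ranges, scales):
        fl = F.avg_pool2d(f_left, s) if s > 1 else f_left
        fr = F.avg_pool2d(f_right, s) if s > 1 else f_right
        volumes.append(correlation_volume(fl, fr, max_disp // s))
    return volumes
```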
arXiv Detail & Related papers (2024-09-01T07:02:36Z)
- GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers [53.80009458891537]
Cross-view video geo-localization aims to derive GPS trajectories from street-view videos by aligning them with aerial-view images.
Current CVGL methods rely on camera and odometry data, which are typically unavailable in real-world scenarios.
We propose GAReT, a fully transformer-based method for CVGL that does not require camera and odometry data.
arXiv Detail & Related papers (2024-08-05T21:29:33Z)
- Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement [8.592248643229675]
Occupancy prediction plays a pivotal role in autonomous driving (AD).
Existing methods often incur high computational costs, which contradicts the real-time demands of AD.
We propose a Geometric-Semantic Dual-Branch Network (GSDBN) with a hybrid BEV-Voxel representation.
arXiv Detail & Related papers (2024-07-18T04:46:13Z)
- U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization [81.76044207714637]
Relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails.
Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance.
This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features.
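The height-layer idea can be sketched as follows, assuming per-layer BEV features are already available; the projection step, the number of layers, and the flattening rule are assumptions rather than U-BEV's actual design.
```python
# Rough sketch of reasoning over multiple height layers before flattening to a
# single BEV map. Per-layer BEV features are assumed given; flattening uses a
# learned per-layer, per-cell weight (an assumption, not the paper's method).
import torch
import torch.nn as nn


class HeightAwareFlatten(nn.Module):
    def __init__(self, channels, num_layers=4):
        super().__init__()
        # Per-layer processing before flattening.
        self.layer_convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_layers)]
        )
        # Predict a softmax weight per height layer and BEV cell.
        self.weight_head = nn.Conv2d(num_layers * channels, num_layers, 1)

    def forward(self, layer_feats):
        # layer_feats: list of num_layers tensors, each (B, C, H_bev, W_bev)
        processed = [conv(f) for conv, f in zip(self.layer_convs, layer_feats)]
        stacked = torch.stack(processed, dim=1)                   # (B, L, C, H, W)
        weights = self.weight_head(torch.cat(processed, dim=1))   # (B, L, H, W)
        weights = weights.softmax(dim=1).unsqueeze(2)             # (B, L, 1, H, W)
        return (weights * stacked).sum(dim=1)                     # (B, C, H, W)
```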
arXiv Detail & Related papers (2023-10-20T18:57:38Z)
- CGI-Stereo: Accurate and Real-Time Stereo Matching via Context and Geometry Interaction [8.484952030063114]
CGI-Stereo is a novel neural network architecture that can concurrently achieve real-time performance, state-of-the-art accuracy, and strong generalization ability.
The core of CGI-Stereo is a Context and Geometry Fusion block which adaptively fuses context and geometry information.
The proposed CGF can be easily embedded into many existing stereo matching networks.
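A hedged sketch of what such a fusion step could look like: context features gate and complement the aggregated geometry (cost-volume) features. The actual CGF design is not specified in the summary, so the gating scheme below is an assumption.
```python
# Sketch of a context-geometry fusion block (assumed design, not CGI-Stereo's).
import torch
import torch.nn as nn


class ContextGeometryFusion(nn.Module):
    def __init__(self, geo_channels, ctx_channels):
        super().__init__()
        # Context predicts a per-channel gate and an additive residual.
        self.to_gate = nn.Sequential(
            nn.Conv2d(ctx_channels, geo_channels, 1),
            nn.Sigmoid(),
        )
        self.to_residual = nn.Conv2d(ctx_channels, geo_channels, 1)

    def forward(self, geo, ctx):
        # geo: (B, Cg, H, W) aggregated cost-volume (geometry) features
        # ctx: (B, Cc, H, W) context features at the same resolution
        gate = self.to_gate(ctx)
        return geo * gate + self.to_residual(ctx)
```
Because the block only consumes two feature maps of matching resolution, it can be dropped into existing aggregation networks, which is the embeddability the summary points to.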
arXiv Detail & Related papers (2023-01-07T06:28:04Z)
- RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo [20.470182157606818]
"Learning-to-optimize" paradigm iteratively indexes a plane-sweeping cost volume and regresses the depth map via a convolutional Gated Recurrent Unit (GRU)
We conduct extensive experiments on real-world MVS datasets and show that our method achieves state-of-the-art performance in terms of both within-dataset evaluation and cross-dataset generalization.
arXiv Detail & Related papers (2022-05-28T03:32:56Z)
- IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo [71.84742490020611]
IterMVS is a new data-driven method for high-resolution multi-view stereo.
We propose a novel GRU-based estimator that encodes pixel-wise probability distributions of depth in its hidden state.
We verify the efficiency and effectiveness of our method on DTU, Tanks&Temples and ETH3D.
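The probability-encoding idea can be illustrated with a small head that decodes the hidden state into a per-pixel categorical distribution over depth hypotheses and takes its expectation; the bin count, inverse-depth spacing, and head architecture below are assumptions, not the paper's exact estimator.
```python
# Sketch of decoding a GRU hidden state into pixel-wise depth probabilities.
import torch
import torch.nn as nn


class DepthDistributionHead(nn.Module):
    def __init__(self, hidden_dim, num_bins=64, d_min=0.5, d_max=100.0):
        super().__init__()
        self.prob_head = nn.Conv2d(hidden_dim, num_bins, 3, padding=1)
        # Depth hypotheses sampled uniformly in inverse depth (an assumption).
        inv = torch.linspace(1.0 / d_max, 1.0 / d_min, num_bins)
        self.register_buffer("hypotheses", 1.0 / inv)  # (num_bins,)

    def forward(self, hidden):
        # hidden: (B, hidden_dim, H, W) GRU hidden state
        probs = self.prob_head(hidden).softmax(dim=1)   # (B, num_bins, H, W)
        depth = (probs * self.hypotheses.view(1, -1, 1, 1)).sum(dim=1, keepdim=True)
        return depth, probs  # expected depth and the full per-pixel distribution
```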
arXiv Detail & Related papers (2021-12-09T18:58:02Z)
- PVStereo: Pyramid Voting Module for End-to-End Self-Supervised Stereo Matching [14.603116313499648]
We propose a robust and effective self-supervised stereo matching approach, consisting of a pyramid voting module (PVM) and a novel DCNN architecture, referred to as OptStereo.
Specifically, our OptStereo first builds multi-scale cost volumes, and then adopts a recurrent unit to iteratively update disparity estimations at high resolution.
We publish the HKUST-Drive dataset, a large-scale synthetic stereo dataset, collected under different illumination and weather conditions for research purposes.
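The pyramid voting idea can be sketched as a consistency vote across scales that yields sparse pseudo-labels for self-supervision; this reading of the PVM, and the thresholds below, are assumptions rather than the paper's exact rule.
```python
# Hedged sketch of pyramid-voting style pseudo-label generation: disparity maps
# predicted at several scales are brought to full resolution, and a pixel keeps
# a label only where enough scales agree within a tolerance.
import torch
import torch.nn.functional as F


def pyramid_vote(disparities, scales, tol=1.0, min_votes=2):
    """disparities: list of (B, 1, H_i, W_i) maps at different pyramid scales.
    scales: upsampling factor per map (also rescales the disparity values).
    Returns (pseudo_label, valid_mask) at full resolution."""
    full = [
        F.interpolate(d, scale_factor=s, mode="bilinear", align_corners=False) * s
        for d, s in zip(disparities, scales)
    ]
    stacked = torch.cat(full, dim=1)                      # (B, S, H, W)
    median = stacked.median(dim=1, keepdim=True).values   # robust consensus
    agree = (stacked - median).abs() < tol                # (B, S, H, W)
    valid = agree.sum(dim=1, keepdim=True) >= min_votes
    return median, valid
```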
arXiv Detail & Related papers (2021-03-12T05:27:14Z)
- Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking [54.58791377183574]
Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module that extracts dense feature maps at the original size with multi-scale context information, and 2) a HU-LSTM (Hybrid U-LSTM) that regularizes the 3D matching volume into the predicted depth map.
Our method achieves performance competitive with the state-of-the-art method while dramatically reducing memory consumption, requiring only 19.4% of R-MVSNet's memory.
arXiv Detail & Related papers (2020-07-21T14:59:59Z)
- AdaStereo: A Simple and Efficient Approach for Adaptive Stereo Matching [50.06646151004375]
A novel domain-adaptive pipeline called AdaStereo aims to align multi-level representations for deep stereo matching networks.
Our AdaStereo models achieve state-of-the-art cross-domain performance on multiple stereo benchmarks, including KITTI, Middlebury, ETH3D, and DrivingStereo.
arXiv Detail & Related papers (2020-04-09T16:15:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.