Related papers: IGEV++: Iterative Multi-range Geometry Encoding Volumes for Stereo Matching

URL: http://arxiv.org/abs/2409.00638v1
Date: Sun, 1 Sep 2024 07:02:36 GMT
Title: IGEV++: Iterative Multi-range Geometry Encoding Volumes for Stereo Matching
Authors: Gangwei Xu, Xianqi Wang, Zhaoxing Zhang, Junda Cheng, Chunyuan Liao, Xin Yang
Abstract summary: We propose a new deep network architecture, called IGEV++, for stereo matching. The proposed IGEV++ builds Multi-range Geometry Encoding Volumes (MGEV) that encode coarse-grained geometry information for ill-posed regions. We introduce an adaptive patch matching module that efficiently computes matching costs for large disparity ranges and/or ill-posed regions.
Score: 7.859381791267791
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Stereo matching is a core component in many computer vision and robotics systems. Despite significant advances over the last decade, handling matching ambiguities in ill-posed regions and large disparities remains an open challenge. In this paper, we propose a new deep network architecture, called IGEV++, for stereo matching. The proposed IGEV++ builds Multi-range Geometry Encoding Volumes (MGEV) that encode coarse-grained geometry information for ill-posed regions and large disparities and fine-grained geometry information for details and small disparities. To construct MGEV, we introduce an adaptive patch matching module that efficiently and effectively computes matching costs for large disparity ranges and/or ill-posed regions. We further propose a selective geometry feature fusion module to adaptively fuse multi-range and multi-granularity geometry features in MGEV. We then index the fused geometry features and input them to ConvGRUs to iteratively update the disparity map. MGEV allows efficient handling of large disparities and ill-posed regions, such as occlusions and textureless regions, and enjoys rapid convergence during iterations. Our IGEV++ achieves the best performance on the Scene Flow test set across all disparity ranges, up to 768px. Our IGEV++ also achieves state-of-the-art accuracy on the Middlebury, ETH3D, KITTI 2012, and KITTI 2015 benchmarks. Specifically, IGEV++ achieves a 3.23% 2-pixel outlier rate (Bad 2.0) on the large disparity benchmark, Middlebury, representing error reductions of 31.9% and 54.8% compared to RAFT-Stereo and GMStereo, respectively. We also present a real-time version of IGEV++ that achieves the best performance among all published real-time methods on the KITTI benchmarks. The code is publicly available at https://github.com/gangweiX/IGEV-plusplus
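
The Bad 2.0 figure and the quoted error reductions can be related with a little arithmetic. The sketch below is illustrative only and is not taken from the IGEV++ code: the function names `bad_rate` and `implied_baseline` are hypothetical, and the official benchmark evaluation may mask or weight pixels differently. It computes a 2-pixel outlier rate over valid pixels and back-solves the baseline rate implied by a relative reduction, assuming "error reduction" means (baseline - ours) / baseline.

```python
def bad_rate(pred, gt, threshold=2.0):
    """Percentage of valid pixels whose absolute disparity error exceeds `threshold`.

    `pred` and `gt` are flat sequences of disparities; a ground-truth value of
    None marks a pixel without ground truth (an assumption made for this sketch).
    """
    errors = [abs(p - g) for p, g in zip(pred, gt) if g is not None]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    outliers = sum(1 for e in errors if e > threshold)
    return 100.0 * outliers / len(errors)


def implied_baseline(rate, reduction_pct):
    """Baseline error rate implied by `rate` after a relative reduction.

    Assumes reduction_pct = 100 * (baseline - rate) / baseline.
    """
    return rate / (1.0 - reduction_pct / 100.0)


# Toy example: three valid pixels with errors 0.5, 2.5, and 3.1 px.
toy_rate = bad_rate([10.0, 12.5, 30.0, 7.0], [10.5, 10.0, 33.1, None])

# Baselines implied by the numbers quoted in the abstract.
raft_stereo_implied = implied_baseline(3.23, 31.9)
gmstereo_implied = implied_baseline(3.23, 54.8)
```

Under that assumption, the quoted 3.23% and the two reductions imply baseline Bad 2.0 rates of roughly 4.74% and 7.15%. These are derived values, not figures stated in the abstract.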

Related papers

SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer [62.11796778482088]
We present a novel model-agnostic sparse vision transformer, dubbed SparseFormer, to bridge the gap of object detection between close-up and HRW shots. The proposed SparseFormer selectively uses attentive tokens to scrutinize the sparsely distributed windows that may contain objects. Experiments on two HRW benchmarks, PANDA and DOTA-v1.0, demonstrate that the proposed SparseFormer significantly improves detection accuracy (up to 5.8%) and speed (up to 3x) over the state-of-the-art approaches.
arXiv Detail & Related papers (2025-02-11T03:21:25Z)
PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings Reconstruction via Semantic-Aware Grouping [6.160345720038265]
We introduce a parallel Gaussian splatting method, termed PG-SAG, which fully exploits semantic cues for both partitioning and kernel optimization. Experiments on various urban datasets demonstrate the superior performance of our PG-SAG on building surface reconstruction.
arXiv Detail & Related papers (2025-01-03T07:40:16Z)
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes [53.107474952492396]
CityGaussianV2 is a novel approach for large-scale scene reconstruction. We implement a decomposed-gradient-based densification and depth regression technique to eliminate blurry artifacts and accelerate convergence. Our method strikes a promising balance between visual quality, geometric accuracy, as well as storage and training costs.
arXiv Detail & Related papers (2024-11-01T17:59:31Z)
DMESA: Densely Matching Everything by Segmenting Anything [16.16319526547664]
We propose MESA and DMESA as novel feature matching methods. MESA establishes implicit-semantic area matching prior to point matching, based on advanced image understanding of SAM. With less repetitive computation, DMESA showcases a speed improvement of nearly five times compared to MESA.
arXiv Detail & Related papers (2024-08-01T04:39:36Z)
Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement [8.592248643229675]
Occupancy prediction plays a pivotal role in autonomous driving (AD). Existing methods often incur high computational costs, which contradicts the real-time demands of AD. We propose a Geometric-Semantic Dual-Branch Network (GSDBN) with a hybrid BEV-Voxel representation.
arXiv Detail & Related papers (2024-07-18T04:46:13Z)
Grounding Image Matching in 3D with MASt3R [8.14650201701567]
We propose to cast matching as a 3D task with DUSt3R, a powerful 3D reconstruction framework based on Transformers. We propose to augment the DUSt3R network with a new head that outputs dense local features, trained with an additional matching loss. Our approach, coined MASt3R, significantly outperforms the state of the art on multiple matching tasks.
arXiv Detail & Related papers (2024-06-14T06:46:30Z)
SAGS: Structure-Aware 3D Gaussian Splatting [53.6730827668389]
We propose a structure-aware Gaussian Splatting method (SAGS) that implicitly encodes the geometry of the scene. SAGS achieves state-of-the-art rendering performance and reduced storage requirements on benchmark novel-view synthesis datasets.
arXiv Detail & Related papers (2024-04-29T23:26:30Z)
SATR: Zero-Shot Semantic Segmentation of 3D Shapes [74.08209893396271]
We explore the task of zero-shot semantic segmentation of 3D shapes by using large-scale off-the-shelf 2D image recognition models. We develop the Segmentation Assignment with Topological Reweighting (SATR) algorithm and evaluate it on ShapeNetPart and our proposed FAUST benchmarks. SATR achieves state-of-the-art performance and outperforms a baseline algorithm by 1.3% and 4% average mIoU.
arXiv Detail & Related papers (2023-04-11T00:43:16Z)
Iterative Geometry Encoding Volume for Stereo Matching [4.610675756857714]
IGEV-Stereo builds a combined geometry encoding volume that encodes geometry and context information as well as local matching details. Our IGEV-Stereo ranks 1st on KITTI 2015 and 2012 (Reflective) among all published methods and is the fastest among the top 10 methods. We also extend our IGEV to multi-view stereo (MVS) to achieve competitive accuracy on the DTU benchmark.
arXiv Detail & Related papers (2023-03-12T09:11:14Z)
Parallel Structure from Motion for UAV Images via Weighted Connected Dominating Set [5.17395782758526]
This paper proposes an algorithm to extract the global model for cluster merging and designs a parallel SfM solution to achieve efficient and accurate UAV image orientation. The experimental results demonstrate that the proposed parallel SfM can achieve a 17.4-fold efficiency improvement and comparable orientation accuracy.
arXiv Detail & Related papers (2022-06-23T06:53:06Z)
VoGE: A Differentiable Volume Renderer using Gaussian Ellipsoids for Analysis-by-Synthesis [62.47221232706105]
We propose VoGE, which utilizes Gaussian reconstruction kernels as volumetric primitives. To efficiently render via VoGE, we propose an approximate closed-form solution for the volume density aggregation and a coarse-to-fine rendering strategy. VoGE outperforms SoTA when applied to various vision tasks, e.g., object pose estimation, shape/texture fitting, and reasoning.
arXiv Detail & Related papers (2022-05-30T19:52:11Z)
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [105.96557764248846]
We introduce BEVFusion, a generic multi-task multi-sensor fusion framework. It unifies multi-modal features in the shared bird's-eye view representation space. It achieves 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower cost.
arXiv Detail & Related papers (2022-05-26T17:59:35Z)
Boundary-Aware Geometric Encoding for Semantic Segmentation of Point Clouds [45.270215729464056]
Boundary information plays a significant role in 2D image segmentation, while usually being ignored in 3D point cloud segmentation. We propose a Boundary Prediction Module (BPM) to predict boundary points. Based on the predicted boundary, a boundary-aware Geometric Encoding Module (GEM) is designed to encode geometric information and aggregate features with discrimination in a neighborhood.
arXiv Detail & Related papers (2021-01-07T05:38:19Z)
Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation [91.12575065731883]
We propose Complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding box regression and Non-Maximum Suppression (NMS). The training of deep models using CIoU loss results in consistent AP and AR improvements in comparison to widely adopted $\ell_n$-norm loss and IoU-based loss. Cluster-NMS is very efficient due to its pure GPU implementation, and geometric factors can be incorporated to improve both AP and AR.
arXiv Detail & Related papers (2020-05-07T16:00:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.