CogStereo: Neural Stereo Matching with Implicit Spatial Cognition Embedding
- URL: http://arxiv.org/abs/2510.22119v1
- Date: Sat, 25 Oct 2025 02:09:04 GMT
- Title: CogStereo: Neural Stereo Matching with Implicit Spatial Cognition Embedding
- Authors: Lihuang Fang, Xiao Hu, Yuchen Zou, Hong Zhang
- Abstract summary: We introduce CogStereo, a novel framework that addresses challenging regions without relying on dataset-specific priors. CogStereo embeds implicit spatial cognition into the refinement process by using monocular depth features as priors. CogStereo employs a dual-conditional refinement mechanism that combines pixel-wise uncertainty with cognition-guided features for consistent global correction of mismatches.
- Score: 5.663297699303346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep stereo matching has advanced significantly on benchmark datasets through fine-tuning but falls short of the zero-shot generalization seen in foundation models in other vision tasks. We introduce CogStereo, a novel framework that addresses challenging regions, such as occlusions or weak textures, without relying on dataset-specific priors. CogStereo embeds implicit spatial cognition into the refinement process by using monocular depth features as priors, capturing holistic scene understanding beyond local correspondences. This approach ensures structurally coherent disparity estimation, even in areas where geometry alone is inadequate. CogStereo employs a dual-conditional refinement mechanism that combines pixel-wise uncertainty with cognition-guided features for consistent global correction of mismatches. Extensive experiments on Scene Flow, KITTI, Middlebury, ETH3D, EuRoc, and real-world scenes demonstrate that CogStereo not only achieves state-of-the-art results but also excels in cross-domain generalization, shifting stereo vision towards a cognition-driven approach.
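The abstract's dual-conditional refinement idea can be illustrated with a minimal sketch: blend a stereo disparity map with a monocular-prior disparity, deferring to the prior where matching uncertainty is high. The function name, the linear blending rule, and the `alpha` parameter are illustrative assumptions, not the paper's actual mechanism.

```python
import numpy as np

def dual_conditional_refine(disparity, uncertainty, mono_prior, alpha=1.0):
    """Illustrative sketch (not CogStereo's implementation): combine a
    stereo disparity map with a monocular depth prior, weighting the
    prior more heavily at pixels where matching uncertainty is high.
    `alpha` scales how aggressively uncertain pixels defer to the prior."""
    w = np.clip(alpha * uncertainty, 0.0, 1.0)  # per-pixel trust in the prior
    return (1.0 - w) * disparity + w * mono_prior
```

With this rule, a confidently matched pixel (uncertainty 0) keeps its stereo disparity, while a fully uncertain pixel (uncertainty 1) takes the monocular prior outright; intermediate pixels interpolate between the two.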
Related papers
- Semantic-Deviation-Anchored Multi-Branch Fusion for Unsupervised Anomaly Detection and Localization in Unstructured Conveyor-Belt Coal Scenes [6.184948083111668]
CoalAD is a benchmark for unsupervised foreign-object anomaly detection with pixel-level localization in coal-stream scenes. We propose a complementary-cue collaborative perception framework that extracts and fuses complementary anomaly evidence from three perspectives. Experiments on CoalAD demonstrate that our method outperforms widely used baselines across the evaluated image-level and pixel-level metrics.
arXiv Detail & Related papers (2026-02-07T20:36:24Z) - Seamlessly Natural: Image Stitching with Natural Appearance Preservation [0.6089774484591287]
SENA prioritizes structural fidelity in challenging real-world scenes characterized by parallax and depth variation. SENA addresses fundamental limitations through three key contributions. Experiments conducted on challenging datasets demonstrate that SENA achieves alignment accuracy comparable to leading homography-based methods.
arXiv Detail & Related papers (2026-01-03T18:40:35Z) - SonarSweep: Fusing Sonar and Vision for Robust 3D Reconstruction via Plane Sweeping [6.826863809223021]
Single-modality approaches to 3D reconstruction fail due to poor visibility and geometric constraints. Prior fusion techniques rely on flawed geometric assumptions, leading to significant artifacts and an inability to model complex scenes. In this paper, we introduce SonarSweep, a novel, end-to-end deep learning framework that overcomes these limitations.
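Plane sweeping, the classical technique SonarSweep builds on, can be sketched for the simple rectified-stereo case: for each disparity hypothesis, shift one image and record the per-pixel matching cost, then pick the hypothesis with the lowest cost. This toy absolute-difference version is an assumption for illustration only, not the paper's sonar-vision formulation.

```python
import numpy as np

def plane_sweep_disparity(left, right, max_disp):
    """Toy plane-sweep sketch for rectified grayscale stereo (not the
    SonarSweep implementation): build a cost volume by shifting the
    right image across disparity hypotheses, then take the per-pixel
    winner-take-all disparity."""
    h, w = left.shape
    cost = np.full((max_disp, h, w), np.inf)  # inf marks invalid (out-of-frame) pixels
    for d in range(max_disp):
        shifted = np.zeros_like(right)
        shifted[:, d:] = right[:, : w - d] if d > 0 else right
        # absolute intensity difference as the matching cost
        cost[d, :, d:] = np.abs(left - shifted)[:, d:]
    return cost.argmin(axis=0)  # winner-take-all over hypotheses
```

Real plane-sweep pipelines sweep fronto-parallel depth planes and warp with full camera geometry; the 1-D shift above is the rectified special case.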
arXiv Detail & Related papers (2025-11-01T04:12:27Z) - Deep Equilibrium Convolutional Sparse Coding for Hyperspectral Image Denoising [16.405355853358202]
Hyperspectral images (HSIs) play a crucial role in remote sensing but are often degraded by complex noise patterns. Ensuring the physical properties of the denoised HSIs is vital for robust HSI denoising, giving rise to deep unfolding-based methods. We propose a Deep Equilibrium Convolutional Sparse Coding (DECSC) framework that unifies local spatial-spectral correlations, nonlocal spatial self-similarities, and global spatial consistency.
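The deep equilibrium view behind DECSC treats sparse coding as solving for a fixed point of an iterative update. A minimal illustration is plain ISTA for the LASSO, iterated until the iterate stops changing; the function names and the dense-dictionary setup are assumptions for illustration, not the convolutional DECSC network.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista_fixed_point(D, y, lam=0.1, tol=1e-8, max_iter=5000):
    """Illustrative equilibrium view of sparse coding (not DECSC): iterate
    the ISTA update z <- soft(z - step * D^T (D z - y), step * lam) until
    it reaches a fixed point. A DEQ model solves for this fixed point
    directly instead of unrolling a fixed number of layers."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1/L step size for convergence
    z = np.zeros(D.shape[1])
    for _ in range(max_iter):
        z_new = soft_threshold(z - step * D.T @ (D @ z - y), step * lam)
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z
```

The appeal of the equilibrium formulation is that memory cost is independent of the number of iterations, since gradients are obtained via implicit differentiation at the fixed point rather than by backpropagating through every unrolled step.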
arXiv Detail & Related papers (2025-08-21T13:35:11Z) - Zero-P-to-3: Zero-Shot Partial-View Images to 3D Object [55.93553895520324]
We propose a novel training-free approach that integrates local dense observations and multi-source priors for reconstruction. Our method introduces a fusion-based strategy to effectively align these priors in DDIM sampling, thereby generating multi-view consistent images to supervise invisible views.
arXiv Detail & Related papers (2025-05-29T03:51:37Z) - Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach [69.01456182499486]
BR-Gen is a large-scale dataset of 150,000 locally forged images with diverse scene-aware annotations. NFA-ViT is a Noise-guided Forgery Amplification Vision Transformer that enhances the detection of localized forgeries.
arXiv Detail & Related papers (2025-04-16T09:57:23Z) - Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model [62.37493746544967]
Camera-based setups offer a cost-effective option by using stereo depth estimation to generate dense, high-resolution depth maps. Existing omnidirectional stereo matching approaches achieve only limited depth accuracy across diverse environments. We present DFI-OmniStereo, a novel omnidirectional stereo matching method that leverages a large-scale pre-trained foundation model for relative monocular depth estimation.
arXiv Detail & Related papers (2025-03-30T16:24:22Z) - FoundationStereo: Zero-Shot Stereo Matching [50.79202911274819]
FoundationStereo is a foundation model for stereo depth estimation. We first construct a large-scale (1M stereo pairs) synthetic training dataset. We then design a number of network architecture components to enhance scalability.
arXiv Detail & Related papers (2025-01-17T01:01:44Z) - DEFOM-Stereo: Depth Foundation Model Based Stereo Matching [12.22373236061929]
DEFOM-Stereo is built to facilitate robust stereo matching with monocular depth cues. It is verified to have much stronger zero-shot generalization compared with SOTA methods. Our model simultaneously outperforms previous models on the individual benchmarks.
arXiv Detail & Related papers (2025-01-16T10:59:29Z) - ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent camouflaged object detection (COD) methods attempt to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images, zooming in and out on camouflaged objects.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z) - View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z) - Towards Higher-order Topological Consistency for Unsupervised Network Alignment [41.763907024585926]
We propose a fully unsupervised network alignment framework named HTC.
The proposed higher-order topological consistency is formulated based on edge orbits.
The encoder is trained to be multi-orbit-aware and is then refined to identify more trusted anchor links.
arXiv Detail & Related papers (2022-08-26T07:09:13Z) - ChiTransformer: Towards Reliable Stereo from Cues [10.756828396434033]
Current stereo matching techniques are challenged by a restricted search space, occluded regions, and sheer size.
We present an optic-chiasm-inspired self-supervised binocular depth estimation method.
The ChiTransformer architecture improves over state-of-the-art self-supervised stereo approaches by 11%.
arXiv Detail & Related papers (2022-03-09T07:19:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.