Related papers: CCNeXt: An Effective Self-Supervised Stereo Depth Estimation Approach

CCNeXt: An Effective Self-Supervised Stereo Depth Estimation Approach

URL: http://arxiv.org/abs/2509.22627v1
Date: Fri, 26 Sep 2025 17:51:28 GMT
Title: CCNeXt: An Effective Self-Supervised Stereo Depth Estimation Approach
Authors: Alexandre Lopes, Roberto Souza, Helio Pedrini,
Abstract summary: We propose a novel self-supervised convolutional approach for depth estimation.<n>The proposed CCNeXt architecture employs a modern CNN feature extractor with a novel windowed epipolar cross-attention module in the encoder.<n>Our experiments demonstrate that CCNeXt achieves competitive metrics while being 10.18$times$ faster than the current best model.
Score: 44.23836177312291
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Depth Estimation plays a crucial role in recent applications in robotics, autonomous vehicles, and augmented reality. These scenarios commonly operate under constraints imposed by computational power. Stereo image pairs offer an effective solution for depth estimation since it only needs to estimate the disparity of pixels in image pairs to determine the depth in a known rectified system. Due to the difficulty in acquiring reliable ground-truth depth data across diverse scenarios, self-supervised techniques emerge as a solution, particularly when large unlabeled datasets are available. We propose a novel self-supervised convolutional approach that outperforms existing state-of-the-art Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) while balancing computational cost. The proposed CCNeXt architecture employs a modern CNN feature extractor with a novel windowed epipolar cross-attention module in the encoder, complemented by a comprehensive redesign of the depth estimation decoder. Our experiments demonstrate that CCNeXt achieves competitive metrics on the KITTI Eigen Split test data while being 10.18$\times$ faster than the current best model and achieves state-of-the-art results in all metrics in the KITTI Eigen Split Improved Ground Truth and Driving Stereo datasets when compared to recently proposed techniques. To ensure complete reproducibility, our project is accessible at \href{https://github.com/alelopes/CCNext}{\texttt{https://github.com/alelopes/CCNext}}.

Related papers

Knowledge-Informed Neural Network for Complex-Valued SAR Image Recognition [51.03674130115878]
We introduce the Knowledge-Informed Neural Network (KINN), a lightweight framework built upon a novel "compression-aggregation-compression" architecture.<n>KINN establishes a state-of-the-art in parameter-efficient recognition, offering exceptional generalization in data-scarce and out-of-distribution scenarios.
arXiv Detail & Related papers (2025-10-23T07:12:26Z)
ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving [62.9051914830949]
We present ROVR, a large-scale, diverse, and cost-efficient depth dataset designed to capture the complexity of real-world driving.<n>A lightweight acquisition pipeline ensures scalable collection, while sparse but statistically sufficient ground truth supports robust training.<n> Benchmarking with state-of-the-art monocular depth models reveals severe cross-dataset generalization failures.
arXiv Detail & Related papers (2025-08-19T16:13:49Z)
FoundationStereo: Zero-Shot Stereo Matching [50.79202911274819]
FoundationStereo is a foundation model for stereo depth estimation.<n>We first construct a large-scale (1M stereo pairs) synthetic training dataset.<n>We then design a number of network architecture components to enhance scalability.
arXiv Detail & Related papers (2025-01-17T01:01:44Z)
URCDC-Depth: Uncertainty Rectified Cross-Distillation with CutFlip for Monocular Depth Estimation [24.03121823263355]
We introduce an uncertainty rectified cross-distillation between Transformer and convolutional neural network (CNN) to learn a unified depth estimator. Specifically, we use the depth estimates from the Transformer branch and the CNN branch as pseudo labels to teach each other. We propose a surprisingly simple yet highly effective data augmentation technique CutFlip, which enforces the model to exploit more valuable clues apart from the vertical image position for depth inference.
arXiv Detail & Related papers (2023-02-16T08:53:08Z)
Multi-Frame Self-Supervised Depth with Transformers [33.00363651105475]
We propose a novel transformer architecture for cost volume generation. We use depth-discretized epipolar sampling to select matching candidates. We refine predictions through a series of self- and cross-attention layers.
arXiv Detail & Related papers (2022-04-15T19:04:57Z)
DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation. We propose to leverage the Transformer to model this global context with an effective attention mechanism. Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z)
PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net. Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $delta1$ metric on the KITTI dataset.
arXiv Detail & Related papers (2021-03-12T15:54:46Z)
PVStereo: Pyramid Voting Module for End-to-End Self-Supervised Stereo Matching [14.603116313499648]
We propose a robust and effective self-supervised stereo matching approach, consisting of a pyramid voting module (PVM) and a novel DCNN architecture, referred to as OptStereo. Specifically, our OptStereo first builds multi-scale cost volumes, and then adopts a recurrent unit to iteratively update disparity estimations at high resolution. We publish the HKUST-Drive dataset, a large-scale synthetic stereo dataset, collected under different illumination and weather conditions for research purposes.
arXiv Detail & Related papers (2021-03-12T05:27:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.