DenVisCoM: Dense Vision Correspondence Mamba for Efficient and Real-time Optical Flow and Stereo Estimation
- URL: http://arxiv.org/abs/2602.01724v1
- Date: Mon, 02 Feb 2026 07:03:07 GMT
- Title: DenVisCoM: Dense Vision Correspondence Mamba for Efficient and Real-time Optical Flow and Stereo Estimation
- Authors: Tushar Anand, Maheswar Bora, Antitza Dantcheva, Abhijit Das
- Abstract summary: We propose a novel Mamba block, DenVisCoM, for accurate and real-time estimation of optical flow and disparity. We extensively analyze the benchmark trade-off of accuracy and real-time processing on a large number of datasets. Our experimental results and related analysis suggest that our proposed model can accurately estimate optical flow and disparity in real time.
- Score: 9.539865774109343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we propose a novel Mamba block, DenVisCoM, as well as a novel hybrid architecture specifically tailored for accurate and real-time estimation of optical flow and disparity. Given that such multi-view geometry and motion tasks are fundamentally related, we propose a unified architecture to tackle them jointly. Specifically, the proposed hybrid architecture is based on DenVisCoM and a Transformer-based attention block that efficiently addresses real-time inference, memory footprint, and accuracy at the same time for joint estimation of motion and 3D dense perception tasks. We extensively analyze the benchmark trade-off of accuracy and real-time processing on a large number of datasets. Our experimental results and related analysis suggest that our proposed model can accurately estimate optical flow and disparity in real time. All models and associated code are available at https://github.com/vimstereo/DenVisCoM.
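The link between the two tasks noted in the abstract can be made concrete: for a rectified stereo pair, disparity is a purely horizontal displacement, so a disparity map is a special case of an optical flow field. The following is a minimal NumPy sketch of that relation, not the DenVisCoM architecture. The function names `disparity_to_flow` and `match_rows`, the left-image-as-reference sign convention, and the winner-take-all dot-product matching are all assumptions made for illustration.

```python
import numpy as np


def disparity_to_flow(disparity):
    """Express a rectified-stereo disparity map as a 2-channel flow field.

    Assumed convention: the left image is the reference, and a left-image
    pixel at column x appears at column x - d in the right image, so the
    equivalent flow vector is (-d, 0).
    """
    disparity = np.asarray(disparity, dtype=np.float32)
    flow = np.zeros(disparity.shape + (2,), dtype=np.float32)
    flow[..., 0] = -disparity  # horizontal displacement
    # flow[..., 1] stays zero: no vertical motion in a rectified pair
    return flow


def match_rows(feat_left, feat_right, max_disp):
    """Toy winner-take-all matching along image rows.

    feat_left, feat_right: (H, W, C) feature maps.
    Returns an integer disparity map of shape (H, W).
    """
    H, W, _ = feat_left.shape
    scores = np.full((H, W, max_disp + 1), -np.inf, dtype=np.float32)
    for d in range(max_disp + 1):
        # Dot-product similarity between left pixel x and right pixel x - d
        scores[:, d:, d] = np.sum(
            feat_left[:, d:] * feat_right[:, : W - d], axis=-1
        )
    return np.argmax(scores, axis=-1)
```

Optical flow matching would search a 2D neighbourhood instead of a single row, which is why a shared correspondence backbone can serve both tasks.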
Related papers
- CoWTracker: Tracking by Warping instead of Correlation [53.834673070954494]
We propose a dense point tracker that eschews cost volumes in favor of warping. Inspired by recent advances in optical flow, our approach iteratively refines track estimates by warping features from the target frame to the query frame based on the current estimate. Our model is simple and achieves state-of-the-art performance on standard dense point tracking benchmarks, including TAP-Vid-DAVIS, TAP-Vid-Kinetics, and Robo-TAP.
arXiv Detail & Related papers (2026-02-04T18:58:59Z)
- Video Depth Propagation [54.523028170425256]
Existing methods rely on simple frame-by-frame monocular models, leading to temporal inconsistencies and inaccuracies. We propose VeloDepth, which effectively leverages an online video pipeline and performs deep feature propagation. Our design structurally enforces temporal consistency, resulting in stable depth predictions across consecutive frames with improved efficiency.
arXiv Detail & Related papers (2025-12-11T15:08:37Z)
- DensePercept-NCSSD: Vision Mamba towards Real-time Dense Visual Perception with Non-Causal State Space Duality [2.036129241213064]
We propose an accurate and real-time optical flow and disparity estimation model by fusing pairwise input images. Our proposed model reduces inference times while maintaining high accuracy and low GPU usage.
arXiv Detail & Related papers (2025-11-16T16:17:00Z)
- ViM-Disparity: Bridging the Gap of Speed, Accuracy and Memory for Disparity Map Generation [1.1166701898428382]
We propose a Visual Mamba (ViM) based architecture to dissolve the existing trade-off between real-time inference, accuracy, and low computation overhead in disparity map generation (DMG). We propose a performance measure that can jointly evaluate the inference speed, computation overhead, and accuracy of a DMG model.
arXiv Detail & Related papers (2024-12-21T19:41:10Z)
- Transformer-Based Multi-Object Smoothing with Decoupled Data Association and Smoothing [20.99082981430798]
Multi-object tracking (MOT) is the task of estimating the state trajectories of an unknown and time-varying number of objects over a certain time window.
Deep learning based algorithms are a possible avenue for tackling this issue but have not been applied extensively in settings where accurate multi-object models are available.
We propose a novel DL architecture specifically tailored for this setting that decouples the data association task from the smoothing task.
arXiv Detail & Related papers (2023-12-22T20:24:39Z)
- DH-PTAM: A Deep Hybrid Stereo Events-Frames Parallel Tracking And Mapping System [1.443696537295348]
This paper presents a robust approach for a visual parallel tracking and mapping (PTAM) system that excels in challenging environments.
Our proposed method combines the strengths of heterogeneous multi-modal visual sensors, in a unified reference frame.
Our implementation's research-based Python API is publicly available on GitHub.
arXiv Detail & Related papers (2023-06-02T19:52:13Z)
- Unifying Flow, Stereo and Depth Estimation [121.54066319299261]
We present a unified formulation and model for three motion and 3D perception tasks.
We formulate all three tasks as a unified dense correspondence matching problem.
Our model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks.
arXiv Detail & Related papers (2022-11-10T18:59:54Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- Robust Ego and Object 6-DoF Motion Estimation and Tracking [5.162070820801102]
This paper proposes a robust solution to achieve accurate estimation and consistent track-ability for dynamic multi-body visual odometry.
A compact and effective framework is proposed leveraging recent advances in semantic instance-level segmentation and accurate optical flow estimation.
A novel formulation that jointly optimizes SE(3) motion and optical flow is introduced, improving the quality of the tracked points and the motion estimation accuracy.
arXiv Detail & Related papers (2020-07-28T05:12:56Z)
- Towards Streaming Perception [70.68520310095155]
We present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception.
The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant.
We focus on the illustrative tasks of object detection and instance segmentation in urban video streams, and contribute a novel dataset with high-quality and temporally-dense annotations.
arXiv Detail & Related papers (2020-05-21T01:51:35Z)