DensePercept-NCSSD: Vision Mamba towards Real-time Dense Visual Perception with Non-Causal State Space Duality
- URL: http://arxiv.org/abs/2511.12671v1
- Date: Sun, 16 Nov 2025 16:17:00 GMT
- Title: DensePercept-NCSSD: Vision Mamba towards Real-time Dense Visual Perception with Non-Causal State Space Duality
- Authors: Tushar Anand, Advik Sinha, Abhijit Das
- Abstract summary: We propose an accurate and real-time optical flow and disparity estimation model by fusing pairwise input images. Our proposed model reduces inference times while maintaining high accuracy and low GPU usage.
- Score: 2.036129241213064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we propose an accurate and real-time optical flow and disparity estimation model by fusing pairwise input images in the proposed non-causal selective state space for dense perception tasks. We propose a non-causal Mamba block-based model that is fast and efficient and aptly manages the constraints present in real-time applications. Our proposed model reduces inference times while maintaining high accuracy and low GPU usage for optical flow and disparity map generation. The results, analysis, and validation in real-life scenarios show that our proposed model can be used for unified real-time and accurate 3D dense perception estimation tasks. The code, along with the models, can be found at https://github.com/vimstereo/DensePerceptNCSSD
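The abstract describes a non-causal selective state space: unlike a standard (causal) Mamba scan, each output position may depend on the whole sequence, not only on past tokens. The paper's exact block is not specified here, so the following is only a minimal sketch of the general idea, assuming the common bidirectional-scan construction: run a causal linear recurrence forward and backward, then fuse the two passes (subtracting the doubly counted local term). All function names (`selective_scan`, `non_causal_scan`) and the toy parameters are hypothetical, not taken from the paper's code.

```python
import numpy as np

def selective_scan(x, a, b):
    """Causal linear recurrence: h[t] = a[t] * h[t-1] + b[t] * x[t]."""
    h = np.zeros_like(x)
    prev = np.zeros(x.shape[1:])
    for t in range(x.shape[0]):
        prev = a[t] * prev + b[t] * x[t]
        h[t] = prev
    return h

def non_causal_scan(x, a, b):
    """Non-causal variant: fuse a forward and a backward causal scan so
    every output position sees the full sequence in both directions."""
    fwd = selective_scan(x, a, b)
    bwd = selective_scan(x[::-1], a[::-1], b[::-1])[::-1]
    # b[t] * x[t] appears in both passes, so subtract it once.
    return fwd + bwd - b * x

# Toy usage: a length-5 sequence with 3 channels.
T, C = 5, 3
rng = np.random.default_rng(0)
x = rng.standard_normal((T, C))
a = np.full((T, C), 0.9)   # state decay per step
b = np.ones((T, C))        # input gate
y = non_causal_scan(x, a, b)
print(y.shape)  # (5, 3)
```

With zero decay (`a = 0`) both passes reduce to the local term and the fused output collapses to `b * x`, which is a quick sanity check that the double-counting correction is right.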
Related papers
- DenVisCoM: Dense Vision Correspondence Mamba for Efficient and Real-time Optical Flow and Stereo Estimation [9.539865774109343]
We propose DenVisCoM, a novel Mamba block for accurate and real-time estimation of optical flow and disparity. We extensively analyze the trade-off between accuracy and real-time processing on a large number of benchmark datasets. Our experimental results and related analysis suggest that our proposed model can accurately estimate optical flow and disparity in real time.
arXiv Detail & Related papers (2026-02-02T07:03:07Z) - Video Depth Propagation [54.523028170425256]
Existing methods rely on simple frame-by-frame monocular models, leading to temporal inconsistencies and inaccuracies. We propose VeloDepth, which effectively leverages an online video pipeline and performs deep feature propagation. Our design structurally enforces temporal consistency, resulting in stable depth predictions across consecutive frames with improved efficiency.
arXiv Detail & Related papers (2025-12-11T15:08:37Z) - Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline [64.42938561167402]
We propose an online 3D reconstruction method using 3D Gaussian-based SLAM, combined with a feed-forward recurrent prediction module. This approach replaces slow test-time optimization with fast network inference, significantly improving tracking speed. Our method achieves performance on par with the state-of-the-art SplaTAM, while reducing tracking time by more than 90%.
arXiv Detail & Related papers (2025-08-06T16:16:58Z) - Online Traffic Density Estimation using Physics-Informed Neural Networks [5.888531936968298]
In this paper, we introduce a methodology for online approximation of the traffic density using measurements from probe vehicles. The proposed method continuously estimates the real-time traffic density in space and performs model identification with each new set of measurements.
arXiv Detail & Related papers (2025-04-04T14:41:22Z) - VADMamba: Exploring State Space Models for Fast Video Anomaly Detection [4.874215132369157]
The VQ-Mamba Unet (VQ-MaU) framework incorporates a Vector Quantization (VQ) layer and a Mamba-based Non-negative Visual State Space (NVSS) block. Results validate the efficacy of the proposed VADMamba across three benchmark datasets.
arXiv Detail & Related papers (2025-03-27T05:38:12Z) - MambaFlow: A Novel and Flow-guided State Space Model for Scene Flow Estimation [5.369567679302849]
We propose MambaFlow, a novel scene flow estimation network with a Mamba-based decoder. Experiments on the Argoverse 2 benchmark demonstrate that MambaFlow achieves state-of-the-art performance with real-time inference speed among existing works.
arXiv Detail & Related papers (2025-02-24T07:05:49Z) - Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think [53.2706196341054]
We show that the perceived inefficiency was caused by a flaw in the inference pipeline that has so far gone unnoticed. We perform end-to-end fine-tuning on top of the single-step model with task-specific losses and get a deterministic model that outperforms all other diffusion-based depth and normal estimation models.
arXiv Detail & Related papers (2024-09-17T16:58:52Z) - Rethinking Voxelization and Classification for 3D Object Detection [68.8204255655161]
The main challenge in 3D object detection from LiDAR point clouds is achieving real-time performance without affecting the reliability of the network.
We present a solution to improve network inference speed and precision at the same time by implementing a fast dynamic voxelizer.
In addition, we propose a lightweight detection sub-head model for classifying predicted objects and filtering out false detections.
arXiv Detail & Related papers (2023-01-10T16:22:04Z) - PRISM: Probabilistic Real-Time Inference in Spatial World Models [52.878769723544615]
PRISM is a method for real-time filtering in a probabilistic generative model of agent motion and visual perception.
The proposed solution runs at 10Hz real-time and is similarly accurate to state-of-the-art SLAM in small to medium-sized indoor environments.
arXiv Detail & Related papers (2022-12-06T13:59:06Z) - Accurate and Real-time Pseudo Lidar Detection: Is Stereo Neural Network Really Necessary? [6.8067583993953775]
We develop a system with a less powerful stereo matching predictor and adopt the proposed refinement schemes to improve the accuracy.
The presented system achieves competitive accuracy to the state-of-the-art approaches with only 23 ms computing, showing it is a suitable candidate for deploying to real car-hold applications.
arXiv Detail & Related papers (2022-06-28T09:53:00Z) - A Generative Learning Approach for Spatio-temporal Modeling in Connected Vehicular Network [55.852401381113786]
This paper proposes LaMI (Latency Model Inpainting), a novel framework for modeling the spatio-temporal wireless access latency of connected vehicles.
LaMI adopts the idea from image inpainting and synthesizing and can reconstruct the missing latency samples by a two-step procedure.
In particular, it first discovers the spatial correlation between samples collected in various regions using a patching-based approach, and then feeds the original and highly correlated samples into a Variational Autoencoder (VAE).
arXiv Detail & Related papers (2020-03-16T03:43:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.