ViM-Disparity: Bridging the Gap of Speed, Accuracy and Memory for Disparity Map Generation
- URL: http://arxiv.org/abs/2412.16745v2
- Date: Fri, 10 Jan 2025 14:40:49 GMT
- Title: ViM-Disparity: Bridging the Gap of Speed, Accuracy and Memory for Disparity Map Generation
- Authors: Maheswar Bora, Tushar Anand, Saurabh Atreya, Aritra Mukherjee, Abhijit Das
- Abstract summary: We propose a Visual Mamba (ViM) based architecture to dissolve the existing trade-off between real-time performance, accuracy, and low computation overhead in disparity map generation (DMG). We also propose a performance measure that jointly evaluates the inference speed, computation overhead, and accuracy of a DMG model.
- Score: 1.1166701898428382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we propose a Visual Mamba (ViM) based architecture to dissolve the existing trade-off between real-time performance, accuracy, and low computation overhead in disparity map generation (DMG). Moreover, we propose a performance measure that jointly evaluates the inference speed, computation overhead, and accuracy of a DMG model. The code implementation and corresponding models are available at: https://github.com/MBora/ViM-Disparity.
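The abstract proposes a measure that jointly scores inference speed, computation overhead, and accuracy. The paper's actual formula is not given here, so the sketch below is a hypothetical illustration of such a joint score: accuracy error (EPE), latency, and GPU memory are penalized multiplicatively, with weights setting their relative importance.

```python
def joint_dmg_score(epe, latency_ms, gpu_mem_mb,
                    w_acc=1.0, w_speed=1.0, w_mem=1.0):
    """Hypothetical joint score for a disparity map generation (DMG) model.

    Combines end-point error (EPE, lower is better), inference latency,
    and GPU memory into a single higher-is-better score. This is an
    illustrative formula, not the measure proposed in the paper.
    """
    if min(epe, latency_ms, gpu_mem_mb) <= 0:
        raise ValueError("all inputs must be positive")
    # Penalize each cost multiplicatively; weights set relative importance.
    return 1.0 / (epe ** w_acc * latency_ms ** w_speed * gpu_mem_mb ** w_mem)

# A faster, lighter model with equal accuracy scores higher.
slow = joint_dmg_score(epe=0.5, latency_ms=80.0, gpu_mem_mb=4000.0)
fast = joint_dmg_score(epe=0.5, latency_ms=20.0, gpu_mem_mb=1500.0)
assert fast > slow
```

Any single-number metric of this shape depends strongly on the chosen weights, so comparisons are only meaningful when the weights are fixed across all models being ranked.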
Related papers
- DenVisCoM: Dense Vision Correspondence Mamba for Efficient and Real-time Optical Flow and Stereo Estimation [9.539865774109343]
We propose DenVisCoM, a novel Mamba block for accurate, real-time estimation of optical flow and disparity. We extensively analyze the trade-off between accuracy and real-time processing on a large number of benchmark datasets. Our experimental results and related analysis suggest that the proposed model can accurately estimate optical flow and disparity in real time.
arXiv Detail & Related papers (2026-02-02T07:03:07Z) - DensePercept-NCSSD: Vision Mamba towards Real-time Dense Visual Perception with Non-Causal State Space Duality [2.036129241213064]
We propose an accurate, real-time optical flow and disparity estimation model that fuses pairwise input images. Our proposed model reduces inference time while maintaining high accuracy and low GPU usage.
arXiv Detail & Related papers (2025-11-16T16:17:00Z) - M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models [72.75501495786297]
We introduce a novel hybrid linear RNN reasoning model, M1, built on the Mamba architecture.
Experimental results show that M1 not only outperforms previous linear RNN models but also matches the performance of state-of-the-art DeepSeek R1 distilled reasoning models.
arXiv Detail & Related papers (2025-04-14T17:38:25Z) - Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks.
By utilizing data subsets during the evaluation process, we address the bottleneck in the reward feedback phase, accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z) - VADMamba: Exploring State Space Models for Fast Video Anomaly Detection [4.874215132369157]
The VQ-Mamba Unet (VQ-MaU) framework incorporates a Vector Quantization (VQ) layer and a Mamba-based Non-negative Visual State Space (NVSS) block.
Results validate the efficacy of the proposed VADMamba across three benchmark datasets.
arXiv Detail & Related papers (2025-03-27T05:38:12Z) - MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking [51.28485682954006]
We propose a pure Mamba-based framework (MambaVT) to fully exploit spatio-temporal contextual modeling for robust visible-thermal tracking.
Specifically, we devise the long-range cross-frame integration component to globally adapt to target appearance variations.
Experiments show the significant potential of vision Mamba for RGB-T tracking, with MambaVT achieving state-of-the-art performance on four mainstream benchmarks.
arXiv Detail & Related papers (2024-08-15T02:29:00Z) - LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba [54.85262314960038]
Local Attentional Mamba blocks capture both global contexts and local details with linear complexity.
Our model exhibits exceptional scalability and surpasses the performance of DiT across various model scales on ImageNet at 256x256 resolution.
Compared to state-of-the-art diffusion models on ImageNet 256x256 and 512x512, our largest model presents notable advantages, such as a reduction of up to 62% GFLOPs.
arXiv Detail & Related papers (2024-08-05T16:39:39Z) - Progressive Query Refinement Framework for Bird's-Eye-View Semantic Segmentation from Surrounding Images [3.495246564946556]
We introduce the Multi-Resolution (MR) concept into Bird's-Eye-View (BEV) semantic segmentation for autonomous driving.
We propose a visual feature interaction network that promotes interactions between features across images and across feature levels.
We evaluate our model on a large-scale real-world dataset.
arXiv Detail & Related papers (2024-07-24T05:00:31Z) - MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel, low-compute algorithm: Model Merging with Amortized Pareto Fronts (MAP).
MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved.
We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z) - Deep Learning Methods for Adjusting Global MFD Speed Estimations to Local Link Configurations [4.2185937778110825]
This study introduces a Local Correction Factor (LCF) that integrates MFD-derived network mean speed with network configurations to accurately estimate the individual speed of a link.
We use a novel deep learning framework to capture both spatial configurations and temporal dynamics of the network.
Our model enhances the precision of link-level traffic speed estimations while preserving the computational benefits of aggregate models.
arXiv Detail & Related papers (2024-05-23T07:37:33Z) - Replication Study and Benchmarking of Real-Time Object Detection Models [0.0]
We compare a variety of object detection models' accuracy and inference speed on multiple graphics cards.
We propose a unified training and evaluation pipeline, based on MMDetection's features, to better compare models.
Results exhibit a strong trade-off between accuracy and speed, with anchor-free models prevailing.
arXiv Detail & Related papers (2024-05-11T04:47:50Z) - MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection [72.46396769642787]
We develop a nested structure, Mamba-in-Mamba (MiM-ISTD), for efficient infrared small target detection.
MiM-ISTD is $8\times$ faster than the SOTA method and reduces GPU memory usage by 62.2% when testing on $2048\times2048$ images.
arXiv Detail & Related papers (2024-03-04T15:57:29Z) - A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first explore the computational redundancy part of the network.
We then prune the redundancy blocks of the model and maintain the network performance.
Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z) - Incremental Multimodal Surface Mapping via Self-Organizing Gaussian Mixture Models [1.0878040851638]
This letter describes an incremental multimodal surface mapping methodology, which represents the environment as a continuous probabilistic model.
The strategy employed in this work utilizes Gaussian mixture models (GMMs) to represent the environment.
To bridge this gap, this letter introduces a spatial hash map for rapid GMM submap extraction combined with an approach to determine relevant and redundant data in a point cloud.
arXiv Detail & Related papers (2023-09-19T19:49:03Z) - MapPrior: Bird's-Eye View Map Layout Estimation with Generative Models [24.681557413829317]
MapPrior is a novel BEV perception framework that combines a traditional BEV perception model with a learned generative model for semantic map layouts.
At the time of submission, MapPrior outperforms the strongest competing method, with significantly improved MMD and ECE scores in camera- and LiDAR-based BEV perception.
arXiv Detail & Related papers (2023-08-24T17:58:30Z) - An Online Semantic Mapping System for Extending and Enhancing Visual SLAM [2.538209532048867]
We present a real-time semantic mapping approach for mobile vision systems with a 2D to 3D object detection pipeline and rapid data association for generated landmarks.
Our system reaches real-time capabilities with an average iteration duration of 65ms and is able to improve the pose estimation of a state-of-the-art SLAM by up to 68% on a public dataset.
arXiv Detail & Related papers (2022-03-08T09:14:37Z) - FastSal: a Computationally Efficient Network for Visual Saliency Prediction [7.742198347952173]
We show that MobileNetV2 makes an excellent backbone for a visual saliency model and can be effective even without a complex decoder.
We also show that knowledge transfer from a more computationally expensive model like DeepGaze II can be achieved via pseudo-labelling an unlabelled dataset.
arXiv Detail & Related papers (2020-08-25T16:32:33Z) - TAM: Temporal Adaptive Module for Video Recognition [60.83208364110288]
The temporal adaptive module (TAM) generates video-specific temporal kernels based on its own feature map.
Experiments on Kinetics-400 and Something-Something datasets demonstrate that our TAM outperforms other temporal modeling methods consistently.
arXiv Detail & Related papers (2020-05-14T08:22:45Z)
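Many of the entries above (ViM-Disparity, DenVisCoM, M1, VADMamba, MambaVT, LaMamba-Diff, MiM-ISTD) build on the Mamba family of state space models, whose appeal is linear-time sequence processing with constant-size state. The sketch below is a minimal 1-D illustration of that recurrence, assuming the standard discretized SSM form; real Mamba additionally makes the parameters input-dependent (selective) and operates on high-dimensional states.

```python
def ssm_scan(x, a=0.9, b=0.1, c=1.0):
    """Minimal 1-D state space model scan (illustrative, not Mamba itself).

    h_t = a * h_{t-1} + b * x_t;  y_t = c * h_t.
    Runs in O(T) time with O(1) state, the property Mamba-style models
    exploit for long sequences instead of quadratic-cost attention.
    """
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt      # state update
        ys.append(c * h)        # readout
    return ys

# For a constant input of 1.0, the output converges toward b / (1 - a) = 1.0.
out = ssm_scan([1.0] * 50)
```

The per-step cost is independent of sequence length, which is why these models trade well against attention-based architectures on the speed and memory axes emphasized throughout this list.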
This list is automatically generated from the titles and abstracts of the papers on this site.