MambaFlow: A Mamba-Centric Architecture for End-to-End Optical Flow Estimation
- URL: http://arxiv.org/abs/2503.07046v4
- Date: Mon, 18 Aug 2025 03:53:39 GMT
- Title: MambaFlow: A Mamba-Centric Architecture for End-to-End Optical Flow Estimation
- Authors: Juntian Du, Zhihu Zhou, Runzhe Zhang, Yuan Sun, Pinyi Chen, Keji Mao,
- Abstract summary: We introduce MambaFlow, a novel framework designed to leverage the high accuracy and efficiency of the Mamba architecture for capturing locally correlated features. Compared to SEA-RAFT, MambaFlow attains higher accuracy on the Sintel benchmark, demonstrating stronger potential for real-world deployment on resource-constrained devices.
- Score: 1.5828557827183316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the Mamba architecture has demonstrated significant successes in various computer vision tasks, such as classification and segmentation. However, its application to optical flow estimation remains unexplored. In this paper, we introduce MambaFlow, a novel framework designed to leverage the high accuracy and efficiency of the Mamba architecture for capturing locally correlated features while preserving global information in end-to-end optical flow estimation. To our knowledge, MambaFlow is the first architecture centered around the Mamba design tailored specifically for optical flow estimation. It comprises two key components: (1) PolyMamba, which enhances feature representation through a dual-Mamba architecture, incorporating a Self-Mamba module for intra-token modeling and a Cross-Mamba module for inter-modality interaction, enabling both deep contextualization and effective feature fusion; and (2) PulseMamba, which leverages an Attention Guidance Aggregator (AGA) to adaptively integrate features with dynamically learned weights in contrast to naive concatenation, and then employs the intrinsic recurrent mechanism of Mamba to perform autoregressive flow decoding, facilitating efficient flow information dissemination. Extensive experiments demonstrate that MambaFlow achieves remarkable results comparable to mainstream methods on benchmark datasets. Compared to SEA-RAFT, MambaFlow attains higher accuracy on the Sintel benchmark, demonstrating stronger potential for real-world deployment on resource-constrained devices. The source code will be made publicly available upon acceptance of the paper.
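The abstract describes PulseMamba in two steps: fuse features with dynamically learned weights (the AGA) instead of concatenating them, then decode flow with Mamba's recurrence. Since the abstract says the code will only be released upon acceptance, the following is a minimal pure-Python sketch of those two generic ideas, not the authors' method. Assumptions: each feature source is scored by a dot product with a hypothetical learned vector `score_weights` and the scores are softmax-normalized into fusion weights; the decoder is a simplified single-channel Mamba-style selective scan with a diagonal state matrix `a_diag`, an input-dependent step size, input-dependent `B_t` and `C_t`, and the common approximation $\bar{B} \approx \Delta B$. All function and parameter names are illustrative.

```python
import math

# Minimal sketch under stated assumptions; NOT MambaFlow's implementation.
# Part 1: attention-guided fusion of K feature sources (hypothetical AGA-like step).
# Part 2: a simplified single-channel Mamba-style selective scan used as a decoder.

def softmax(scores):
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_guided_fuse(sources, score_weights):
    """Weighted sum of K equal-length feature vectors.

    Each source is scored by a dot product with the hypothetical learned vector
    `score_weights`; the scores are softmax-normalized into fusion weights.
    Naive concatenation would instead return one vector of length K * D.
    """
    scores = [sum(w * f for w, f in zip(score_weights, src)) for src in sources]
    alphas = softmax(scores)
    dim = len(sources[0])
    return [sum(a * src[d] for a, src in zip(alphas, sources)) for d in range(dim)]

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def selective_scan(xs, a_diag, w_delta, b_delta, w_b, w_c):
    """Causal scan over a scalar sequence `xs` with an N-dimensional hidden state.

    a_diag: diagonal of A (negative entries keep the state stable).
    The step size delta_t, B_t and C_t all depend on the current input x_t,
    which is what makes the scan "selective".
    """
    h = [0.0] * len(a_diag)
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)   # input-dependent step size
        b_t = [wb * x for wb in w_b]              # input-dependent B_t
        c_t = [wc * x for wc in w_c]              # input-dependent C_t
        # h_t = exp(delta * A) * h_{t-1} + delta * B_t * x_t  (B-bar ~ delta * B)
        h = [math.exp(delta * a) * hi + delta * bi * x
             for a, hi, bi in zip(a_diag, h, b_t)]
        ys.append(sum(ci * hi for ci, hi in zip(c_t, h)))   # y_t = C_t . h_t
    return ys

# Usage: fuse two toy sources, then treat the fused vector as a short sequence.
fused = attention_guided_fuse([[1.0, 0.0], [0.0, 1.0]], score_weights=[2.0, 0.0])
ys = selective_scan(fused, a_diag=[-1.0, -0.5], w_delta=0.5, b_delta=0.0,
                    w_b=[1.0, 0.5], w_c=[0.3, 0.7])
```

Because each step only decays the previous state by $\exp(\Delta_t A)$ and adds a term computed from the current input, $y_t$ depends only on inputs up to step $t$, and the cost grows linearly with sequence length; this causal recurrence is presumably what makes Mamba suitable for the autoregressive flow decoding the abstract mentions.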
Related papers
- DYNAMAX: Dynamic computing for Transformers and Mamba based architectures [2.5739385355356714]
Early exits (EEs) offer a promising approach to reducing computational costs and latency by dynamically terminating inference once a satisfactory prediction confidence on a data sample is achieved.
This work introduces DYNAMAX, the first framework to exploit the unique properties of Mamba architectures for early exit mechanisms.
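The summary describes early exits in general terms: stop inference once a prediction is confident enough. A minimal sketch of that generic idea (not DYNAMAX's Mamba-specific design, which the summary does not detail) attaches a classifier head after each layer and stops at the first head whose top softmax probability reaches a threshold. `layers`, `heads`, and `threshold` are hypothetical names used only for illustration.

```python
import math

# Generic confidence-based early exit; NOT DYNAMAX's actual mechanism.
# `layers` and `heads` are hypothetical callables: each layer maps a hidden
# state to a new one, and each head maps a hidden state to class logits.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(x, layers, heads, threshold):
    """Return (predicted_class, depth_used); assumes at least one layer."""
    h = x
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        h = layer(h)
        probs = softmax(head(h))
        confidence = max(probs)
        if confidence >= threshold:
            break              # confident enough: skip the remaining layers
    return probs.index(confidence), depth

# Usage: each toy layer doubles the hidden state, so confidence grows with depth.
layers = [lambda v: [2.0 * t for t in v]] * 3
heads = [lambda v: v] * 3      # identity head: hidden state used as logits
print(early_exit_predict([0.5, 0.0], layers, heads, threshold=0.7))   # (0, 1)
```

Raising the threshold trades latency for confidence: with `threshold=0.9` the same input runs all three toy layers before exiting.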
(2025-04-29T16:38:15Z)
- TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba [88.31117598044725]
We explore cross-architecture training, termed TransMamba, to transfer knowledge already present in existing Transformer models to the alternative Mamba architecture.
Our approach employs a two-stage strategy to expedite training new Mamba models, ensuring effectiveness across uni-modal and cross-modal tasks.
For cross-modal learning, we propose a cross-Mamba module that integrates language awareness into Mamba's visual features, enhancing the cross-modal interaction capabilities of the Mamba architecture.
(2025-02-21T01:22:01Z)
- MambaGlue: Fast and Robust Local Feature Matching With Mamba [9.397265252815115]
We propose a novel Mamba-based local feature matching approach, called MambaGlue.
Mamba is an emerging state-of-the-art architecture rapidly gaining recognition for its superior speed in both training and inference.
Our MambaGlue achieves a balance between robustness and efficiency in real-world applications.
(2025-02-01T15:43:03Z)
- FlowMamba: Learning Point Cloud Scene Flow with Global Motion Propagation [14.293476753863272]
We propose a novel global-aware scene flow estimation network with global motion propagation, named FlowMamba.
FlowMamba is the first method to achieve millimeter-level prediction accuracy on the FlyingThings3D and KITTI datasets.
(2024-12-23T08:03:59Z)
- Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision.
In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for speech enhancement (SE) tasks.
(2024-12-21T13:43:51Z)
- MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection [13.678314551293113]
MamKPD is the first efficient yet effective Mamba-based pose estimation framework for 2D keypoint detection.
By using Mamba for global modeling across all patches, MamKPD effectively extracts each instance's pose information.
Our MamKPD-L achieves 77.3% AP on the COCO dataset at 1492 FPS on an NVIDIA RTX 4090 GPU.
(2024-12-02T12:03:32Z)
- MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs.
We propose the MobileMamba framework, which balances efficiency and performance.
MobileMamba achieves up to 83.6% Top-1 accuracy, surpassing existing state-of-the-art methods.
(2024-11-24T18:01:05Z)
- MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba [0.5530212768657544]
Mamba, a State Space Model (SSM)-based model, has attracted attention as a potential alternative to Transformers.
We investigate the effectiveness of existing parameter-efficient fine-tuning (PEFT) methods for Transformers when applied to Mamba.
We propose new Mamba-specific PEFT methods that leverage the distinctive structure of Mamba.
(2024-11-06T11:57:55Z)
- Mamba for Scalable and Efficient Personalized Recommendations [0.135975510645475]
We present a novel hybrid model, FT-Mamba, that replaces Transformer layers with Mamba layers within the FT-Transformer architecture.
We evaluate FT-Mamba against a traditional Transformer-based model within a Two-Tower architecture on three datasets.
(2024-09-11T14:26:14Z)
- ReMamba: Equip Mamba with Effective Long-Sequence Modeling [50.530839868893786]
We propose ReMamba, which enhances Mamba's ability to comprehend long contexts.
ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process.
(2024-08-28T02:47:27Z)
- SIGMA: Selective Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction.
We introduce a new framework named Selective Gated Mamba (SIGMA) for sequential recommendation.
Our results indicate that SIGMA outperforms current models on five real-world datasets.
(2024-08-21T09:12:59Z)
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, MambaVision, specifically tailored for vision applications.
We show that equipping the Mamba architecture with self-attention blocks in the final layers greatly improves its capacity to capture long-range spatial dependencies.
For classification on the ImageNet-1K dataset, MambaVision variants achieve state-of-the-art (SOTA) performance in terms of both Top-1 accuracy and throughput.
(2024-07-10T23:02:45Z)
- MambaDepth: Enhancing Long-range Dependency for Self-Supervised Fine-Structured Monocular Depth Estimation [0.0]
MambaDepth is a versatile network tailored for self-supervised depth estimation.
MambaDepth combines the U-Net's effectiveness in self-supervised depth estimation with the advanced capabilities of Mamba.
MambaDepth demonstrates superior generalization to other datasets such as Make3D and Cityscapes.
(2024-06-06T22:08:48Z)
- MambaUIE&SR: Unraveling the Ocean's Secrets with Only 2.8 GFLOPs [1.7648680700685022]
Underwater Image Enhancement (UIE) techniques aim to address the problem of underwater image degradation due to light absorption and scattering.
In recent years, both Convolutional Neural Network (CNN)-based and Transformer-based methods have been widely explored.
MambaUIE efficiently synthesizes global and local information while keeping a very small number of parameters and maintaining high accuracy.
(2024-04-22T05:12:11Z)
- MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection [72.46396769642787]
We develop a nested structure, Mamba-in-Mamba (MiM-ISTD), for efficient infrared small target detection.
MiM-ISTD is $8\times$ faster than the SOTA method and reduces GPU memory usage by 62.2% when testing on $2048 \times 2048$ images.
(2024-03-04T15:57:29Z)
- GMFlow: Learning Optical Flow via Global Matching [124.57850500778277]
We propose a GMFlow framework for learning optical flow estimation.
It consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation.
Our new framework outperforms 32-iteration RAFT on the challenging Sintel benchmark.
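The GMFlow summary names "a correlation and softmax layer for global feature matching". A minimal pure-Python sketch of that step, assuming the usual formulation of softmax-based global matching: correlate each frame-1 pixel feature with every frame-2 pixel feature, softmax-normalize the similarities into a matching distribution, take the expected matched coordinate, and subtract the original coordinate to get flow. The $1/\sqrt{D}$ scaling, the flattened-list layout, and the name `global_matching_flow` are assumptions; feature extraction, the Transformer enhancement, and the self-attention propagation step are omitted.

```python
import math

def global_matching_flow(feat1, feat2, coords):
    """Softmax global matching (sketch of the idea summarized for GMFlow).

    feat1, feat2: per-pixel feature vectors (length D) of frames 1 and 2,
    flattened in the same pixel order as `coords`, a list of (x, y) positions.
    Returns one (dx, dy) flow vector per pixel of frame 1.
    """
    scale = 1.0 / math.sqrt(len(feat1[0]))        # assumed dot-product scaling
    flows = []
    for f1, (x1, y1) in zip(feat1, coords):
        # Correlation of this pixel with every pixel of frame 2.
        scores = [scale * sum(a * b for a, b in zip(f1, f2)) for f2 in feat2]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]   # softmax numerators
        z = sum(weights)
        # Expected matched position under the softmax distribution.
        mx = sum(w * x for w, (x, _) in zip(weights, coords)) / z
        my = sum(w * y for w, (_, y) in zip(weights, coords)) / z
        flows.append((mx - x1, my - y1))
    return flows

# Usage: a two-pixel toy image whose pixels swap places between frames.
coords = [(0.0, 0.0), (1.0, 0.0)]
feat1 = [[10.0, 0.0], [0.0, 10.0]]
feat2 = [[0.0, 10.0], [10.0, 0.0]]
flow = global_matching_flow(feat1, feat2, coords)   # approximately [(1, 0), (-1, 0)]
```

Because every pixel is compared with every other pixel, this step costs $O((HW)^2 D)$, which is the price of matching globally instead of within a local search window.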
(2021-11-26T18:59:56Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computational costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
(2021-03-08T03:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.