EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction
- URL: http://arxiv.org/abs/2503.19721v3
- Date: Tue, 01 Apr 2025 02:49:17 GMT
- Title: EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction
- Authors: Chengjie Ge, Xueyang Fu, Peng He, Kunyu Wang, Chengzhi Cao, Zheng-Jun Zha,
- Abstract summary: EventMamba is a specialized model designed for event-based video reconstruction tasks.<n>We show that EventMamba markedly improves speed while delivering superior visual quality compared to Transformer-based methods.
- Score: 66.84997711357101
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Leveraging its robust linear global modeling capability, Mamba has notably excelled in computer vision. Despite its success, existing Mamba-based vision models have overlooked the nuances of event-driven tasks, especially in video reconstruction. Event-based video reconstruction (EBVR) demands spatial translation invariance and close attention to local event relationships in the spatio-temporal domain. Unfortunately, conventional Mamba algorithms apply static window partitions and standard reshape scanning methods, leading to significant losses in local connectivity. To overcome these limitations, we introduce EventMamba--a specialized model designed for EBVR tasks. EventMamba innovates by incorporating random window offset (RWO) in the spatial domain, moving away from the restrictive fixed partitioning. Additionally, it features a new consistent traversal serialization approach in the spatio-temporal domain, which maintains the proximity of adjacent events both spatially and temporally. These enhancements enable EventMamba to retain Mamba's robust modeling capabilities while significantly preserving the spatio-temporal locality of event data. Comprehensive testing on multiple datasets shows that EventMamba markedly enhances video reconstruction, drastically improving computation speed while delivering superior visual quality compared to Transformer-based methods.
Related papers
- MambaFlow: A Novel and Flow-guided State Space Model for Scene Flow Estimation [5.369567679302849]
We propose Mamba, a novel scene flow estimation network with a mamba-based decoder.<n>MambaFlow achieves state-of-the-art performance with real-time inference speed among existing works.<n>Experiments on the Argoverse 2 benchmark demonstrate that MambaFlow achieves state-of-the-art performance with real-time inference speed.
arXiv Detail & Related papers (2025-02-24T07:05:49Z) - STNMamba: Mamba-based Spatial-Temporal Normality Learning for Video Anomaly Detection [48.997518615379995]
Video anomaly detection (VAD) has been extensively researched due to its potential for intelligent video systems.<n>Most existing methods based on CNNs and transformers still suffer from substantial computational burdens.<n>We propose a lightweight and effective Mamba-based network named STNMamba to enhance the learning of spatial-temporal normality.
arXiv Detail & Related papers (2024-12-28T08:49:23Z) - Mamba-CL: Optimizing Selective State Space Model in Null Space for Continual Learning [54.19222454702032]
Continual Learning aims to equip AI models with the ability to learn a sequence of tasks over time, without forgetting previously learned knowledge.
State Space Models (SSMs) have achieved notable success in computer vision.
We introduce Mamba-CL, a framework that continuously fine-tunes the core SSMs of the large-scale Mamba foundation model.
arXiv Detail & Related papers (2024-11-23T06:36:16Z) - MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking [51.28485682954006]
We propose a pure Mamba-based framework (MambaVT) to fully exploit intrinsic-temporal contextual modeling for robust visible-thermal tracking.
Specifically, we devise the long-range cross-frame integration component to globally adapt to target appearance variations.
Experiments show the significant potential of vision Mamba for RGB-T tracking, with MambaVT achieving state-of-the-art performance on four mainstream benchmarks.
arXiv Detail & Related papers (2024-08-15T02:29:00Z) - Cross-Scan Mamba with Masked Training for Robust Spectral Imaging [51.557804095896174]
We propose the Cross-Scanning Mamba, named CS-Mamba, that employs a Spatial-Spectral SSM for global-local balanced context encoding.
Experiment results show that our CS-Mamba achieves state-of-the-art performance and the masked training method can better reconstruct smooth features to improve the visual quality.
arXiv Detail & Related papers (2024-08-01T15:14:10Z) - DiM-Gesture: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2 framework [2.187990941788468]
generative model crafted to create highly personalized 3D full-body gestures solely from raw speech audio.
Model integrates a Mamba-based fuzzy feature extractor with a non-autoregressive Adaptive Layer Normalization (AdaLN) Mamba-2 diffusion architecture.
arXiv Detail & Related papers (2024-08-01T08:22:47Z) - Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba [11.400397931501338]
Event cameras efficiently detect changes in ambient light with low latency and high dynamic range while consuming minimal power.
Most current approach to processing event data often involves converting it into frame-based representations.
Point Cloud is a popular representation for 3D processing and is better suited to match the sparse and asynchronous nature of the event camera.
We propose EventMamba, an efficient and effective Point Cloud framework that achieves competitive results even compared to the state-of-the-art (SOTA) frame-based method.
arXiv Detail & Related papers (2024-05-09T21:47:46Z) - MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking [50.26836546224782]
Event-based eye tracking has shown great promise with the high temporal resolution and low redundancy.
The diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization.
This paper proposes a bidirectional long-term sequence modeling and time-varying state selection mechanism to fully utilize contextual temporal information.
arXiv Detail & Related papers (2024-04-18T11:09:25Z) - MambaIR: A Simple Baseline for Image Restoration with State-Space Model [46.827053426281715]
We introduce MambaIR, which introduces both local enhancement and channel attention to improve the vanilla Mamba.
Our method outperforms SwinIR by up to 0.45dB on image SR, using similar computational cost but with a global receptive field.
arXiv Detail & Related papers (2024-02-23T23:15:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.