EDmamba: Rethinking Efficient Event Denoising with Spatiotemporal Decoupled SSMs
- URL: http://arxiv.org/abs/2505.05391v3
- Date: Sun, 14 Sep 2025 05:17:05 GMT
- Title: EDmamba: Rethinking Efficient Event Denoising with Spatiotemporal Decoupled SSMs
- Authors: Ciyu Ruan, Zihang Gong, Ruishan Guo, Jingao Xu, Xinlei Chen
- Abstract summary: Event cameras provide micro-second latency and broad dynamic range, yet their raw streams are marred by spatial artifacts and temporally inconsistent background activity. We introduce EDmamba, a compact event-denoising framework that embraces the key insight that spatial and temporal noise arise from different physical mechanisms. This decoupled design distills the network to only 88.9K parameters and 2.27GFLOPs, enabling real-time throughput of 100K events in 68ms on a single GPU, 36x faster than recent Transformer baselines.
- Score: 23.63023704154084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras provide micro-second latency and broad dynamic range, yet their raw streams are marred by spatial artifacts (e.g., hot pixels) and temporally inconsistent background activity. Existing methods jointly process the entire 4D event volume (x, y, p, t), forcing heavy spatio-temporal attention that inflates parameters, FLOPs, and latency. We introduce EDmamba, a compact event-denoising framework that embraces the key insight that spatial and temporal noise arise from different physical mechanisms and can therefore be suppressed independently. A polarity- and geometry-aware encoder first extracts coarse cues, which are then routed to two lightweight state-space branches: a Spatial-SSM that learns location-conditioned filters to silence persistent artifacts, and a Temporal-SSM that models causal signal dynamics to eliminate bursty background events. This decoupled design distills the network to only 88.9K parameters and 2.27GFLOPs, enabling real-time throughput of 100K events in 68ms on a single GPU, 36x faster than recent Transformer baselines. Despite its economy, EDmamba establishes new state-of-the-art accuracy on four public benchmarks, outscoring the strongest prior model by 2.1 percentage points.
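The efficiency figures quoted in the abstract can be put on a common scale with a little arithmetic. The sketch below converts "100K events in 68ms" into events per second and derives the latency a 36x slower baseline would need. It assumes the 36x speedup refers to the same 100K-event workload, which the abstract does not state; the function and constant names are illustrative only.

```python
# Figures quoted in the abstract.
EVENTS = 100_000      # events processed per batch
LATENCY_S = 0.068     # 68 ms on a single GPU
SPEEDUP = 36          # "36x faster than recent Transformer baselines"


def events_per_second(n_events: int, seconds: float) -> float:
    """Throughput implied by processing n_events in the given time."""
    return n_events / seconds


def implied_baseline_latency(latency_s: float, speedup: float) -> float:
    """Latency of a baseline that is `speedup` times slower.

    Assumption: the speedup is measured on the same workload.
    """
    return latency_s * speedup


throughput = events_per_second(EVENTS, LATENCY_S)
baseline = implied_baseline_latency(LATENCY_S, SPEEDUP)

print(f"EDmamba throughput: {throughput / 1e6:.2f} M events/s")
print(f"Implied baseline latency for 100K events: {baseline:.2f} s")
```

Under that assumption, the quoted numbers correspond to roughly 1.47 million events per second, against about 2.45 seconds per 100K events for the baseline.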
Related papers
- FoSS: Modeling Long Range Dependencies and Multimodal Uncertainty in Trajectory Prediction via Fourier State Space Integration [21.39395366378851]
We present FoSS, a dual-branch framework that unifies frequency-domain reasoning with linear-time sequence modeling.
Experiments on Argoverse 1 and Argoverse 2 benchmarks demonstrate that FoSS achieves state-of-the-art accuracy while reducing computation by 22.5% and parameters by over 40%.
arXiv Detail & Related papers (2026-03-01T21:38:59Z) - BabyMamba-HAR: Lightweight Selective State Space Models for Efficient Human Activity Recognition on Resource Constrained Devices [0.0]
Human activity recognition (HAR) on wearable and mobile devices is constrained by memory footprint and computational budget.
Selective state space models (SSMs) offer linear time sequence processing with input dependent gating.
BabyMamba-HAR is introduced, a framework comprising two novel lightweight Mamba inspired architectures for resource constrained HAR.
arXiv Detail & Related papers (2026-02-10T15:16:32Z) - PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching [51.98089287914147]
Inspired by the two-stage decision-making process in humans, we propose a Pick-and-Play Memory (PPM) construction module for dynamic stereo matching, dubbed PPMStereo.
arXiv Detail & Related papers (2025-10-23T03:52:39Z) - HAD: Hierarchical Asymmetric Distillation to Bridge Spatio-Temporal Gaps in Event-Based Object Tracking [80.07224739976911]
RGB cameras excel at capturing rich texture with high resolution, whereas event cameras offer exceptional temporal resolution and high dynamic range.
arXiv Detail & Related papers (2025-10-22T13:15:13Z) - EGTM: Event-guided Efficient Turbulence Mitigation [19.09752432962073]
Turbulence mitigation (TM) aims to remove the distortions and blurs introduced by atmospheric turbulence into frame cameras.
We present a novel EGTM framework that extracts pixel-level reliable turbulence-free guidance from noisy turbulent events for temporal lucky fusion.
We build the first turbulence data acquisition system to contribute the first real-world event-driven TM dataset.
arXiv Detail & Related papers (2025-09-04T01:49:13Z) - Inference-Time Gaze Refinement for Micro-Expression Recognition: Enhancing Event-Based Eye Tracking with Motion-Aware Post-Processing [2.5465367830324905]
Event-based eye tracking holds significant promise for fine-grained cognitive state inference.
We introduce a model-agnostic, inference-time refinement framework to enhance the output of existing event-based gaze estimation models.
arXiv Detail & Related papers (2025-06-14T14:48:11Z) - PRE-Mamba: A 4D State Space Model for Ultra-High-Frequent Event Camera Deraining [47.81253972389206]
Event cameras excel in high temporal resolution and dynamic range but suffer from dense noise in rainy conditions.
We propose PRE-Mamba, a novel point-based framework for event camera deraining.
arXiv Detail & Related papers (2025-05-08T14:52:45Z) - Simultaneous Motion And Noise Estimation with Event Cameras [18.2247510082534]
Event cameras are emerging vision sensors whose noise is challenging to characterize.
Existing denoising methods for event cameras are often designed in isolation.
We propose, to the best of our knowledge, the first method that simultaneously estimates motion in its various forms and noise.
arXiv Detail & Related papers (2025-04-05T02:47:40Z) - FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability.
We propose Self-supervised Transfer (PST) and a Frequency-Decoupled Fusion module (FreDF).
PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models.
FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
arXiv Detail & Related papers (2025-03-25T15:04:53Z) - EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera [17.61884467264023]
We propose a novel network architecture specifically designed for event data processing.
We establish the first large-scale dataset for egocentric gesture recognition using event cameras.
Our method achieves 62.7% accuracy tested on unseen subjects with only 7M parameters, 3.1% higher than state-of-the-art approaches.
arXiv Detail & Related papers (2025-03-16T09:08:02Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step.
To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration.
Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - STNMamba: Mamba-based Spatial-Temporal Normality Learning for Video Anomaly Detection [48.997518615379995]
Video anomaly detection (VAD) has been extensively researched due to its potential for intelligent video systems.
Most existing methods based on CNNs and transformers still suffer from substantial computational burdens.
We propose a lightweight and effective Mamba-based network named STNMamba to enhance the learning of spatial-temporal normality.
arXiv Detail & Related papers (2024-12-28T08:49:23Z) - Leveraging Consistent Spatio-Temporal Correspondence for Robust Visual Odometry [7.517597541959445]
We introduce Spatio-Temporal Visual Odometry (STVO), a novel deep network architecture to enhance the accuracy and consistency of multi-frame flow matching.
Our STVO achieves state-of-the-art performance on the ETH3D benchmark and improves on the previous best methods by 38.9% on the KITTI Odometry benchmark.
arXiv Detail & Related papers (2024-12-22T08:47:13Z) - Event-Based Tracking Any Point with Motion-Augmented Temporal Consistency [58.719310295870024]
This paper presents an event-based framework for tracking any point.
It tackles the challenges posed by spatial sparsity and motion sensitivity in events.
It achieves 150% faster processing with competitive model parameters.
arXiv Detail & Related papers (2024-12-02T09:13:29Z) - Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory [87.62730694973696]
STEEL is the first provably sample-efficient algorithm for learning the controllable dynamics of an Exogenous Block Markov Decision Process from a single trajectory.
We prove that STEEL is correct and sample-efficient, and demonstrate STEEL on two toy problems.
arXiv Detail & Related papers (2024-10-03T21:57:21Z) - LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising [19.51468512911655]
Event cameras have significant advantages in capturing dynamic scene information but are prone to noise interference.
We construct a new paired real-world event denoising dataset (LED), including 3K sequences with 18K seconds of high-resolution (1200*680) event streams.
We propose a novel and effective denoising framework (DED) that uses homogeneous dual events to generate the ground truth, better separating noise from the raw events.
arXiv Detail & Related papers (2024-05-30T06:02:35Z) - Fast Window-Based Event Denoising with Spatiotemporal Correlation Enhancement [85.66867277156089]
We propose window-based event denoising, which simultaneously deals with a stack of events.
In the spatial domain, we use maximum a posteriori (MAP) estimation to discriminate real-world events from noise.
Our algorithm can remove event noise effectively and efficiently and improve the performance of downstream tasks.
arXiv Detail & Related papers (2024-02-14T15:56:42Z) - The Missing U for Efficient Diffusion Models [3.712196074875643]
Diffusion Probabilistic Models yield record-breaking performance in tasks such as image synthesis, video generation, and molecule design.
Despite their capabilities, their efficiency, especially in the reverse process, remains a challenge due to slow convergence rates and high computational costs.
We introduce an approach that leverages continuous dynamical systems to design a novel denoising network for diffusion models.
arXiv Detail & Related papers (2023-10-31T00:12:14Z) - Realistic Noise Synthesis with Diffusion Models [44.404059914652194]
Deep denoising models require extensive real-world training data, which is challenging to acquire.
We propose a novel Realistic Noise Synthesis Diffusor (RNSD) method using diffusion models to address these challenges.
arXiv Detail & Related papers (2023-05-23T12:56:01Z) - Advancing Unsupervised Low-light Image Enhancement: Noise Estimation, Illumination Interpolation, and Self-Regulation [55.07472635587852]
Low-Light Image Enhancement (LLIE) techniques have made notable advancements in preserving image details and enhancing contrast.
These approaches encounter persistent challenges in efficiently mitigating dynamic noise and accommodating diverse low-light scenarios.
We first propose a method for estimating the noise level in low light images in a quick and accurate way.
We then devise a Learnable Illumination Interpolator (LII) to satisfy general constraints between illumination and input.
arXiv Detail & Related papers (2023-05-17T13:56:48Z) - ProgressiveMotionSeg: Mutually Reinforced Framework for Event-Based Motion Segmentation [101.19290845597918]
This paper presents a Motion Estimation (ME) module and an Event Denoising (ED) module jointly optimized in a mutually reinforced manner.
Taking temporal correlation as guidance, the ED module calculates the confidence that each event belongs to real activity events, and transmits it to the ME module to update the energy function of motion segmentation for noise suppression.
arXiv Detail & Related papers (2022-03-22T13:40:26Z) - Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)