PASS: Path-selective State Space Model for Event-based Recognition
- URL: http://arxiv.org/abs/2409.16953v2
- Date: Sun, 21 Sep 2025 13:23:12 GMT
- Title: PASS: Path-selective State Space Model for Event-based Recognition
- Authors: Jiazhou Zhou, Kanghao Chen, Lei Zhang, Lin Wang
- Abstract summary: Event cameras are bio-inspired sensors with advantages such as high temporal resolution. We present our PASS framework, exhibiting superior capacity for spatiotemporal event modeling. Our key insight is to learn adaptively encoded event features via state space models.
- Score: 12.651829415097758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event cameras are bio-inspired sensors that capture intensity changes asynchronously with distinct advantages, such as high temporal resolution. Existing methods for event-based object/action recognition predominantly sample and convert event representations at a fixed temporal interval (or frequency). However, they are constrained to processing a limited number of event lengths and show poor frequency generalization, thus not fully leveraging the event's high temporal resolution. In this paper, we present our PASS framework, exhibiting superior capacity for spatiotemporal event modeling towards a larger number of event lengths and generalization across varying inference temporal frequencies. Our key insight is to learn adaptively encoded event features via state space models (SSMs), whose linear complexity and generalization on input frequency make them ideal for processing high temporal resolution events. Specifically, we propose a Path-selective Event Aggregation and Scan (PEAS) module to encode events into features with fixed dimensions by adaptively scanning and selecting aggregated event representations. On top of it, we introduce a novel Multi-faceted Selection Guiding (MSG) loss to minimize the randomness and redundancy of the encoded features during the PEAS selection process. Our method outperforms prior methods on five public datasets and shows strong generalization across varying inference frequencies with a smaller accuracy drop (ours -8.62% vs. -20.69% for the baseline). Overall, PASS exhibits strong long spatiotemporal modeling for a broader distribution of event lengths (1-10^9), precise temporal perception, and generalization to real-world scenarios.
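The abstract's central claim rests on the discretized SSM recurrence, whose input-dependent step size is what makes the model robust to varying inference frequencies. The sketch below is a generic Mamba-style selective scan (input-dependent B, C, and step size), not the authors' PEAS module; all array names, shapes, and projection matrices are illustrative assumptions:

```python
import numpy as np

def selective_ssm_scan(x, A, B_proj, C_proj, dt_proj):
    """Minimal selective SSM scan over a feature sequence.

    x:       (T, D) input sequence (e.g. aggregated event features)
    A:       (D, N) negative state-transition matrix (one N-dim state per channel)
    B_proj,
    C_proj:  (D, N) projections making B and C input-dependent ("selective")
    dt_proj: (D, D) projection producing a per-channel step size
    """
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))          # hidden state
    ys = np.empty((T, D))
    for t in range(T):
        dt = np.log1p(np.exp(x[t] @ dt_proj))   # softplus step size, shape (D,)
        B = x[t] @ B_proj                       # input-dependent B, shape (N,)
        C = x[t] @ C_proj                       # input-dependent C, shape (N,)
        Abar = np.exp(dt[:, None] * A)          # zero-order-hold discretization
        h = Abar * h + (dt[:, None] * B[None, :]) * x[t][:, None]
        ys[t] = h @ C                           # readout, shape (D,)
    return ys
```

Because the step size dt is computed from the input rather than fixed, the same weights can in principle be applied to sequences sampled at different temporal frequencies, which is the SSM property the abstract attributes to PASS.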
Related papers
- Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking [51.31378940976401]
Existing RGB-Event tracking approaches fail to fully exploit the unique advantages of event cameras. We propose a novel tracking framework that performs early fusion in the frequency domain, enabling effective aggregation of high-frequency information from the event modality. Experiments on three widely used RGB-Event tracking benchmark datasets, including FE108, FELT, and COESOT, demonstrate the high performance and efficiency of our method.
arXiv Detail & Related papers (2026-01-03T01:10:17Z) - EventSTU: Event-Guided Efficient Spatio-Temporal Understanding for Video Large Language Models [56.16721798968254]
We propose an event-guided, training-free framework for efficient understanding, named EventSTU. In the temporal domain, we design a coarse-to-fine sampling algorithm that exploits the change-triggered property of event cameras to eliminate redundant frames. In the spatial domain, we design an adaptive token pruning algorithm that leverages the saliency of events as a zero-cost prior to guide spatial reduction.
arXiv Detail & Related papers (2025-11-24T09:30:02Z) - Event-based Facial Keypoint Alignment via Cross-Modal Fusion Attention and Self-Supervised Multi-Event Representation Learning [16.170645576584487]
Event cameras offer unique advantages for facial keypoint alignment under challenging conditions. We propose a novel framework based on cross-modal fusion attention (CMFA) and self-supervised multi-event representation learning (SSMER) for event-based facial keypoint alignment.
arXiv Detail & Related papers (2025-09-29T16:00:50Z) - LET-US: Long Event-Text Understanding of Scenes [23.376693904132786]
Event cameras output event streams as sparse, asynchronous data with microsecond-level temporal resolution. We introduce LET-US, a framework for long event-stream text comprehension. We use an adaptive compression mechanism to reduce the volume of input events while preserving critical visual details.
arXiv Detail & Related papers (2025-08-10T16:02:41Z) - Hybrid Spiking Vision Transformer for Object Detection with Event Cameras [19.967565219584056]
Spiking Neural Networks (SNNs) have emerged as a promising approach, offering low energy consumption and rich dynamics. This study proposes a novel Hybrid Spiking Vision Transformer (HsVT) model to enhance the performance of event-based object detection. Experimental results demonstrate that HsVT achieves significant performance improvements in event detection with fewer parameters.
arXiv Detail & Related papers (2025-05-12T16:19:20Z) - Event Signal Filtering via Probability Flux Estimation [58.31652473933809]
Events offer a novel paradigm for capturing scene dynamics via asynchronous sensing, but their inherent randomness often leads to degraded signal quality.
Event signal filtering is thus essential for enhancing fidelity by reducing this internal randomness and ensuring consistent outputs across diverse acquisition conditions.
This paper introduces a generative, online filtering framework called Event Density Flow Filter (EDFilter).
Experiments validate EDFilter's performance across tasks like event filtering, super-resolution, and direct event-based blob tracking.
arXiv Detail & Related papers (2025-04-10T07:03:08Z) - Inter-event Interval Microscopy for Event Cameras [52.05337480169517]
Event cameras, an innovative bio-inspired sensor, differ from traditional cameras by sensing changes in intensity rather than directly perceiving intensity.
We achieve event-to-intensity conversion using a static event camera for both static and dynamic scenes in fluorescence microscopy.
We have collected IEIMat dataset under various scenes including high dynamic range and high-speed scenarios.
arXiv Detail & Related papers (2025-04-07T11:05:13Z) - FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies [45.82637829492951]
Event cameras offer unparalleled advantages for real-time perception in dynamic environments. Existing event detectors are limited by fixed-frequency paradigms. We propose FlexEvent, a novel framework that enables detection at varying frequencies.
arXiv Detail & Related papers (2024-12-09T17:57:14Z) - HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera [22.208120663778043]
Continuous space-time super-resolution (C-STVSR) aims to simultaneously enhance resolution and frame rate at an arbitrary scale.
We propose a novel C-STVSR framework, called HR-INR, which captures both holistic dependencies and regional motions based on implicit neural representation (INR).
We then propose a novel INR-based decoder with temporal embeddings to capture long-term dependencies with a larger temporal perception field.
arXiv Detail & Related papers (2024-05-22T06:51:32Z) - Scalable Event-by-event Processing of Neuromorphic Sensory Signals With Deep State-Space Models [2.551844666707809]
Event-based sensors are well suited for real-time processing.
Current methods either collapse events into frames or cannot scale up when processing the event data directly event-by-event.
arXiv Detail & Related papers (2024-04-29T08:50:27Z) - MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking [50.26836546224782]
Event-based eye tracking has shown great promise thanks to its high temporal resolution and low redundancy.
The diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization.
This paper proposes a bidirectional long-term sequence modeling and time-varying state selection mechanism to fully utilize contextual temporal information.
arXiv Detail & Related papers (2024-04-18T11:09:25Z) - TSLANet: Rethinking Transformers for Time Series Representation Learning [19.795353886621715]
Time series data is characterized by its intrinsic long and short-range dependencies.
We introduce a novel Time Series Lightweight Network (TSLANet) as a universal convolutional model for diverse time series tasks.
Our experiments demonstrate that TSLANet outperforms state-of-the-art models in various tasks spanning classification, forecasting, and anomaly detection.
arXiv Detail & Related papers (2024-04-12T13:41:29Z) - XTSFormer: Cross-Temporal-Scale Transformer for Irregular Time Event Prediction [9.240950990926796]
Event prediction aims to forecast the time and type of a future event based on a historical event sequence.
Despite its significance, several challenges exist, including the irregularity of time intervals between consecutive events, the existence of cycles, periodicity, and multi-scale event interactions.
arXiv Detail & Related papers (2024-02-03T20:33:39Z) - Implicit Event-RGBD Neural SLAM [54.74363487009845]
Implicit neural SLAM has achieved remarkable progress recently.
Existing methods face significant challenges in non-ideal scenarios.
We propose EN-SLAM, the first event-RGBD implicit neural SLAM framework.
arXiv Detail & Related papers (2023-11-18T08:48:58Z) - EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields [80.94515892378053]
EvDNeRF is a pipeline for generating event data and training an event-based dynamic NeRF.
NeRFs offer geometric-based learnable rendering, but prior work with events has only considered reconstruction of static scenes.
We show that by training on varied batch sizes of events, we can improve test-time predictions of events at fine time resolutions.
arXiv Detail & Related papers (2023-10-03T21:08:41Z) - V2CE: Video to Continuous Events Simulator [1.1009908861287052]
We present a novel method for video-to-events stream conversion from multiple perspectives, considering the specific characteristics of the Dynamic Vision Sensor (DVS).
A series of carefully designed timestamp losses helps enhance the quality of generated event voxels significantly.
We also propose a novel local dynamic-aware inference strategy to accurately recover event timestamps from event voxels in a continuous fashion.
arXiv Detail & Related papers (2023-09-16T06:06:53Z) - Event-based Stereo Visual Odometry with Native Temporal Resolution via Continuous-time Gaussian Process Regression [3.4447129363520332]
Event-based cameras capture individual visual changes in a scene at unique times.
It is often addressed in visual odometry pipelines by approximating temporally close measurements as occurring at one common time.
This paper presents a complete stereo VO pipeline that estimates directly with individual event-measurement times without requiring any grouping or approximation in the estimation state.
arXiv Detail & Related papers (2023-06-01T22:57:32Z) - Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events [63.984927609545856]
Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict the per-pixel dynamic between arbitrary time intervals.
We show that the proposed method achieves state-of-the-art and shows remarkable performance for event-based RS2GS inversion in real-world scenarios.
arXiv Detail & Related papers (2023-04-14T05:30:02Z) - Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z) - Generative Time Series Forecasting with Diffusion, Denoise, and
Disentanglement [51.55157852647306]
Time series forecasting has been a widely explored task of great importance in many applications.
It is common that real-world time series data are recorded in a short time period, which results in a big gap between the deep model and the limited and noisy time series.
We propose to address the time series forecasting problem with generative modeling and propose a bidirectional variational auto-encoder equipped with diffusion, denoise, and disentanglement.
arXiv Detail & Related papers (2023-01-08T12:20:46Z) - Event Transformer [43.193463048148374]
The event camera's low power consumption and ability to capture brightness changes at microsecond resolution make it attractive for various computer vision tasks.
Existing event representation methods typically convert events into frames, voxel grids, or spikes for deep neural networks (DNNs).
This work introduces a novel token-based event representation, where each event is considered a fundamental processing unit termed an event-token.
arXiv Detail & Related papers (2022-04-11T15:05:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.