Path-adaptive Spatio-Temporal State Space Model for Event-based Recognition with Arbitrary Duration
- URL: http://arxiv.org/abs/2409.16953v1
- Date: Wed, 25 Sep 2024 14:08:37 GMT
- Title: Path-adaptive Spatio-Temporal State Space Model for Event-based Recognition with Arbitrary Duration
- Authors: Jiazhou Zhou, Kanghao Chen, Lei Zhang, Lin Wang
- Abstract summary: Event cameras are bio-inspired sensors that asynchronously capture intensity changes and output event streams.
We present a novel framework, dubbed PAST-SSM, exhibiting superior capacity in recognizing events with arbitrary duration.
We also build a minute-level event-based recognition dataset, named ArDVS100, with arbitrary duration for the benefit of the community.
- Score: 9.547947845734992
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event cameras are bio-inspired sensors that asynchronously capture intensity changes and output event streams with distinct advantages, such as high temporal resolution. To exploit event cameras for object/action recognition, existing methods predominantly sample and aggregate events over second-level durations at every fixed temporal interval (or frequency). However, they often face difficulties in capturing the spatiotemporal relationships of longer, e.g., minute-level, events and in generalizing across varying temporal frequencies. To fill the gap, we present a novel framework, dubbed PAST-SSM, exhibiting superior capacity in recognizing events with arbitrary duration (e.g., 0.1s to 4.5min) and generalizing to varying inference frequencies. Our key insight is to learn the spatiotemporal relationships from the encoded event features via the state space model (SSM) -- whose linear complexity makes it ideal for modeling high-temporal-resolution events with longer sequences. To achieve this goal, we first propose a Path-Adaptive Event Aggregation and Scan (PEAS) module to encode events of varying duration into features with fixed dimensions by adaptively scanning and selecting aggregated event frames. On top of PEAS, we introduce a novel Multi-faceted Selection Guiding (MSG) loss to minimize the randomness and redundancy of the encoded features, which subtly enhances model generalization across different inference frequencies. Lastly, the SSM is employed to better learn the spatiotemporal properties of the encoded features. Moreover, we build a minute-level event-based recognition dataset with arbitrary durations, named ArDVS100, for the benefit of the community. Extensive experiments show that our method outperforms prior art by +3.45%, +0.38%, and +8.31% on the DVS Action, SeAct, and HARDVS datasets, respectively.
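The abstract describes a three-stage pipeline: aggregate events into frames, adaptively select a fixed-size subset, then scan it with an SSM. Below is a minimal sketch of that flow under stated assumptions; the count-based selection score (a stand-in for the learned, MSG-guided path-adaptive selection of PEAS), the toy embedding, and the untrained linear recurrence (a stand-in for a trained SSM backbone) are all illustrative, not the authors' implementation.

```python
import numpy as np

def events_to_frames(events, num_frames, h=128, w=128):
    """Aggregate an (N, 4) event array [t, x, y, polarity] into num_frames
    two-channel count frames, so a stream of any duration gets a fixed size."""
    t = events[:, 0]
    bins = ((t - t.min()) / (t.max() - t.min() + 1e-9) * num_frames).astype(int)
    bins = np.clip(bins, 0, num_frames - 1)
    frames = np.zeros((num_frames, 2, h, w), dtype=np.float32)
    np.add.at(frames, (bins, events[:, 3].astype(int),
                       events[:, 2].astype(int), events[:, 1].astype(int)), 1.0)
    return frames

def peas_like_select(frames, k):
    """Stand-in for the PEAS module: keep the k highest-scoring frames in
    temporal order. The paper learns this selection (guided by the MSG loss);
    here the score is simply the event count per frame."""
    scores = frames.reshape(len(frames), -1).sum(axis=1)
    keep = np.sort(np.argsort(scores)[-k:])
    return frames[keep]

def ssm_scan(tokens, A, B, C):
    """Linear-complexity SSM recurrence: h_k = A h_{k-1} + B u_k, y_k = C h_k."""
    h = np.zeros(A.shape[0])
    ys = []
    for u in tokens:
        h = A @ h + B @ u
        ys.append(C @ h)
    return np.stack(ys)

# Toy end-to-end run on 5000 random events spanning an arbitrary (90 s) clip.
rng = np.random.default_rng(0)
ev = np.column_stack([np.sort(rng.uniform(0.0, 90.0, 5000)),   # t
                      rng.integers(0, 128, 5000),              # x
                      rng.integers(0, 128, 5000),              # y
                      rng.integers(0, 2, 5000)])               # polarity
kept = peas_like_select(events_to_frames(ev, num_frames=64), k=16)
tokens = kept.reshape(16, -1) @ (1e-3 * rng.standard_normal((2 * 128 * 128, 32)))
d = 24                                                         # state size
y = ssm_scan(tokens, 0.9 * np.eye(d),
             1e-2 * rng.standard_normal((d, 32)),
             rng.standard_normal((10, d)))                     # (16, 10) outputs
```

The point of the fixed `num_frames`/`k` pair is that a 0.1-second clip and a multi-minute recording both leave this front end as a sequence of exactly k tokens, which is what lets a linear-time SSM consume them uniformly.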
Related papers
- HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera [22.208120663778043]
Continuous space-time super-resolution (C-STVSR) aims to simultaneously enhance resolution and frame rate at an arbitrary scale.
We propose a novel C-STVSR framework, called HR-INR, which captures both holistic dependencies and regional motions based on implicit neural representation (INR).
We then propose a novel INR-based decoder with temporal embeddings to capture long-term dependencies with a larger temporal perception field.
arXiv Detail & Related papers (2024-05-22T06:51:32Z)
- Scalable Event-by-event Processing of Neuromorphic Sensory Signals With Deep State-Space Models [2.551844666707809]
Event-based sensors are well suited for real-time processing.
Current methods either collapse events into frames or cannot scale up when processing the event data directly, event by event (a minimal per-event state-space update is sketched after this list).
arXiv Detail & Related papers (2024-04-29T08:50:27Z)
- MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking [50.26836546224782]
Event-based eye tracking has shown great promise thanks to its high temporal resolution and low redundancy.
The diversity and abruptness of eye movement patterns, including blinking, fixation, saccades, and smooth pursuit, pose significant challenges for eye localization.
This paper proposes a bidirectional long-term sequence modeling and time-varying state selection mechanism to fully utilize contextual temporal information.
arXiv Detail & Related papers (2024-04-18T11:09:25Z)
- XTSFormer: Cross-Temporal-Scale Transformer for Irregular Time Event Prediction [9.240950990926796]
Event prediction aims to forecast the time and type of a future event based on a historical event sequence.
Despite its significance, several challenges exist, including the irregularity of time intervals between consecutive events, the existence of cycles, periodicity, and multi-scale event interactions.
arXiv Detail & Related papers (2024-02-03T20:33:39Z)
- Implicit Event-RGBD Neural SLAM [54.74363487009845]
Implicit neural SLAM has achieved remarkable progress recently.
Existing methods face significant challenges in non-ideal scenarios.
We propose EN-SLAM, the first event-RGBD implicit neural SLAM framework.
arXiv Detail & Related papers (2023-11-18T08:48:58Z)
- EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields [80.94515892378053]
EvDNeRF is a pipeline for generating event data and training an event-based dynamic NeRF.
NeRFs offer geometric-based learnable rendering, but prior work with events has only considered reconstruction of static scenes.
We show that by training on varied batch sizes of events, we can improve test-time predictions of events at fine time resolutions.
arXiv Detail & Related papers (2023-10-03T21:08:41Z)
- V2CE: Video to Continuous Events Simulator [1.1009908861287052]
We present a novel method for video-to-event stream conversion from multiple perspectives, considering the specific characteristics of the Dynamic Vision Sensor (DVS); a toy threshold-based converter is sketched after this list.
A series of carefully designed timestamp losses helps enhance the quality of generated event voxels significantly.
We also propose a novel local dynamic-aware inference strategy to accurately recover event timestamps from event voxels in a continuous fashion.
arXiv Detail & Related papers (2023-09-16T06:06:53Z)
- Event-based Stereo Visual Odometry with Native Temporal Resolution via Continuous-time Gaussian Process Regression [3.4447129363520332]
Event-based cameras capture individual visual changes in a scene at unique times.
This asynchronous, irregular timing is often addressed in visual odometry pipelines by approximating temporally close measurements as occurring at one common time.
This paper presents a complete stereo VO pipeline that estimates directly with individual event-measurement times without requiring any grouping or approximation in the estimation state.
arXiv Detail & Related papers (2023-06-01T22:57:32Z)
- Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events [63.984927609545856]
An Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict the per-pixel dynamics between arbitrary time intervals.
We show that the proposed method achieves state-of-the-art and shows remarkable performance for event-based RS2GS inversion in real-world scenarios.
arXiv Detail & Related papers (2023-04-14T05:30:02Z)
- Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture the brightness changes of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarities, producing a set of pillars as a 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z)
- Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement [51.55157852647306]
Time series forecasting has been a widely explored task of great importance in many applications.
Real-world time series data are often recorded over a short period, which results in a large gap between the capacity of deep models and the limited, noisy time series available.
We propose to address the time series forecasting problem with generative modeling and propose a bidirectional variational auto-encoder equipped with diffusion, denoise, and disentanglement.
arXiv Detail & Related papers (2023-01-08T12:20:46Z)
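For the "Scalable Event-by-event Processing of Neuromorphic Sensory Signals With Deep State-Space Models" entry, the per-event update referenced above can be sketched as follows. This is a generic continuous-time linear SSM driven by impulse inputs, with a forward-Euler step over each irregular inter-event gap; the discretization, shapes, and parameters are assumptions for illustration, not that paper's method.

```python
import numpy as np

def event_by_event_ssm(events, A, B, C):
    """Run a continuous-time linear SSM  x'(t) = A x(t) + B u(t)  over a
    stream of (timestamp, feature_vector) events with irregular spacing,
    touching each event exactly once (cost linear in the event count)."""
    x = np.zeros(A.shape[0])
    t_prev, ys = None, []
    for t, u in events:
        if t_prev is not None:
            # Evolve the state freely across the gap since the last event
            # (one Euler step; an exact solver would use expm((t - t_prev) * A)).
            x = x + (t - t_prev) * (A @ x)
        x = x + B @ u              # inject the event as an impulse input
        ys.append(C @ x)
        t_prev = t
    return np.stack(ys)

# Toy usage: 1000 one-hot events arriving at irregular (exponential) gaps.
rng = np.random.default_rng(1)
times = np.cumsum(rng.exponential(0.01, 1000))
feats = np.eye(3)[rng.integers(0, 3, 1000)]
out = event_by_event_ssm(zip(times, feats),
                         -0.5 * np.eye(8),                  # stable leaky state
                         0.1 * rng.standard_normal((8, 3)),
                         rng.standard_normal((4, 8)))       # (1000, 4) outputs
```

Because the state carries across gaps of any length, nothing has to be collapsed into fixed-rate frames, which is the scaling point that entry makes.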
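For the "V2CE: Video to Continuous Events Simulator" entry, the toy converter referenced above illustrates the classic threshold model that video-to-event simulators start from: a pixel fires an event each time its log intensity drifts by a contrast threshold since it last fired. Everything here (threshold value, per-pixel reference memory, frame-time stamping) is a textbook assumption; V2CE's actual pipeline is learned and recovers continuous timestamps from event voxels rather than stamping events at frame times.

```python
import numpy as np

def video_to_events(frames, timestamps, theta=0.2):
    """Emit (t, x, y, polarity) events wherever a pixel's log intensity has
    drifted by >= theta since the last event it fired.
    frames: (T, H, W) uint8 video; timestamps: (T,) seconds."""
    ref = np.log(frames[0].astype(np.float64) + 1.0)   # per-pixel memory
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        diff = np.log(frame.astype(np.float64) + 1.0) - ref
        ys, xs = np.nonzero(np.abs(diff) >= theta)
        for y, x in zip(ys, xs):
            n = int(abs(diff[y, x]) // theta)          # threshold crossings
            pol = 1 if diff[y, x] > 0 else 0
            # Coarse stamping: all n events get the frame time. A careful
            # simulator spreads them over the inter-frame interval instead.
            events.extend([(t, x, y, pol)] * n)
            ref[y, x] += np.sign(diff[y, x]) * n * theta
    return np.asarray(events)

# Toy usage: a bright square sliding right over a dark background.
T, H, W = 8, 64, 64
video = np.zeros((T, H, W), dtype=np.uint8)
for i in range(T):
    video[i, 20:40, 5 + 4 * i : 25 + 4 * i] = 200
ev = video_to_events(video, np.linspace(0.0, 0.07, T))  # (N, 4) event array
```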
This list is automatically generated from the titles and abstracts of the papers on this site.