SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition
- URL: http://arxiv.org/abs/2410.16746v1
- Date: Tue, 22 Oct 2024 07:00:43 GMT
- Title: SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition
- Authors: Jiaqi Chen, Yan Yang, Shizhuo Deng, Da Teng, Liyuan Pan
- Abstract summary: Human action recognition (HAR) plays a key role in various applications such as video analysis, surveillance, autonomous driving, robotics, and healthcare.
Most HAR algorithms are developed from RGB images, which capture detailed visual information.
Event cameras offer a promising solution by capturing scene brightness changes sparsely at the pixel level, without capturing full images.
- Score: 13.426390494116776
- Abstract: Human action recognition (HAR) plays a key role in various applications such as video analysis, surveillance, autonomous driving, robotics, and healthcare. Most HAR algorithms are developed from RGB images, which capture detailed visual information. However, these algorithms raise concerns in privacy-sensitive environments due to the recording of identifiable features. Event cameras offer a promising solution by capturing scene brightness changes sparsely at the pixel level, without capturing full images. Moreover, event cameras have high dynamic ranges that can effectively handle scenarios with complex lighting conditions, such as low light or high contrast environments. However, using event cameras introduces challenges in modeling the spatially sparse and high temporal resolution event data for HAR. To address these issues, we propose the SpikMamba framework, which combines the energy efficiency of spiking neural networks and the long sequence modeling capability of Mamba to efficiently capture global features from spatially sparse and high temporal resolution event data. Additionally, to improve the locality of modeling, a spiking window-based linear attention mechanism is used. Extensive experiments show that SpikMamba achieves remarkable recognition performance, surpassing the previous state-of-the-art by 1.45%, 7.22%, 0.15%, and 3.92% on the PAF, HARDVS, DVS128, and E-FAction datasets, respectively. The code is available at https://github.com/Typistchen/SpikMamba.
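The abstract names a spiking window-based linear attention mechanism but does not specify it here. As a rough illustration only, the general idea of linear attention over binarized (spike-like) query/key features can be sketched in NumPy; all names, shapes, and the thresholding scheme below are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def spike(x, threshold=0.0):
    # Heaviside-style binarization: turn activations into {0, 1} "spikes".
    return (x > threshold).astype(x.dtype)

def linear_attention(q, k, v):
    # Linear attention computes (q @ (k^T v)) / (q @ sum(k)),
    # costing O(N d^2) instead of softmax attention's O(N^2 d).
    # q, k: (N, d) feature maps; v: (N, d) values.
    k_sum = k.sum(axis=0)                 # (d,)
    kv = k.T @ v                          # (d, d)
    denom = q @ k_sum + 1e-6              # (N,) normalizer, eps avoids /0
    return (q @ kv) / denom[:, None]      # (N, d)

rng = np.random.default_rng(0)
n, d = 16, 8
q = spike(rng.standard_normal((n, d)))
k = spike(rng.standard_normal((n, d)))
v = rng.standard_normal((n, d))
out = linear_attention(q, k, v)
print(out.shape)  # (16, 8)
```

Because queries and keys are binary, the feature products reduce to sparse accumulations, which is the usual motivation for pairing spiking activations with linear attention for energy efficiency.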
Related papers
- EvenNICER-SLAM: Event-based Neural Implicit Encoding SLAM [69.83383687049994]
We propose EvenNICER-SLAM, a novel approach to dense visual simultaneous localization and mapping.
EvenNICER-SLAM incorporates event cameras that respond to intensity changes instead of absolute brightness.
Our results suggest the potential for event cameras to improve the robustness of dense SLAM systems against fast camera motion in real-world scenarios.
arXiv Detail & Related papers (2024-10-04T13:52:01Z)
- Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms [29.577583619354314]
We propose a large-scale, high-definition ($1280 \times 800$) human action recognition dataset based on the CeleX-V event camera.
To build a more comprehensive benchmark dataset, we report over 20 mainstream HAR models for future works to compare.
arXiv Detail & Related papers (2024-08-19T07:52:20Z)
- Uncertainty-aware Bridge based Mobile-Former Network for Event-based Pattern Recognition [6.876867023375713]
Event cameras offer advantages such as high dynamic range, no motion blur, and low energy consumption.
We propose a lightweight uncertainty-aware information propagation based Mobile-Former network for efficient pattern recognition.
arXiv Detail & Related papers (2024-01-20T05:26:28Z)
- Implicit Event-RGBD Neural SLAM [54.74363487009845]
Implicit neural SLAM has achieved remarkable progress recently.
Existing methods face significant challenges in non-ideal scenarios.
We propose EN-SLAM, the first event-RGBD implicit neural SLAM framework.
arXiv Detail & Related papers (2023-11-18T08:48:58Z)
- Deformable Neural Radiance Fields using RGB and Event Cameras [65.40527279809474]
We develop a novel method to model the deformable neural radiance fields using RGB and event cameras.
The proposed method uses the asynchronous stream of events and sparse RGB frames.
Experiments conducted on both realistically rendered graphics and real-world datasets demonstrate a significant benefit of the proposed method.
arXiv Detail & Related papers (2023-09-15T14:19:36Z)
- EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adopt the VTN for the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
- Event-based Simultaneous Localization and Mapping: A Comprehensive Survey [52.73728442921428]
Review of event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks.
Paper categorizes event-based vSLAM methods into four main categories: feature-based, direct, motion-compensation, and deep learning methods.
arXiv Detail & Related papers (2023-04-19T16:21:14Z)
- Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
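The x-y-t grid construction described above (events binned into a tensor with separate positive- and negative-polarity channels) can be sketched as follows; the function name, event layout, and bin count are illustrative assumptions, not the paper's exact pillar representation:

```python
import numpy as np

def events_to_voxel(events, h, w, bins):
    # events: (N, 4) array of (x, y, t, polarity in {-1, +1}).
    # Returns a (2, bins, h, w) tensor: one channel per polarity,
    # with events binned along t into `bins` temporal slices.
    grid = np.zeros((2, bins, h, w), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = (events[:, 3] > 0).astype(int)   # 0 = negative, 1 = positive
    # Normalize timestamps into [0, bins) slice indices.
    t0, t1 = t.min(), t.max()
    b = np.clip(((t - t0) / max(t1 - t0, 1e-9) * bins).astype(int),
                0, bins - 1)
    # Unbuffered scatter-add: events at the same cell accumulate.
    np.add.at(grid, (p, b, y, x), 1.0)
    return grid

# Toy stream: four events on a 4x4 sensor.
ev = np.array([[0, 0, 0.00, +1],
               [1, 2, 0.40, -1],
               [3, 3, 0.80, +1],
               [3, 3, 0.99, +1]])
vox = events_to_voxel(ev, h=4, w=4, bins=2)
print(vox.shape, vox.sum())  # (2, 2, 4, 4) 4.0
```

Such a dense tensor (a stack of "pillars" over the x-y plane) is what downstream convolutional or recurrent modules, like the adaptive convLSTMs mentioned above, can then consume.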
arXiv Detail & Related papers (2023-03-17T12:12:41Z)
- HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors [40.949347728083474]
Mainstream human activity recognition (HAR) algorithms are developed based on RGB cameras, which suffer from illumination changes, fast motion, privacy concerns, and large energy consumption.
Meanwhile, biologically inspired event cameras have attracted great interest due to their unique features, such as high dynamic range, dense temporal but sparse spatial resolution, low latency, and low power.
As event cameras are a newly emerging sensor, no realistic large-scale dataset for HAR yet exists.
We propose a large-scale benchmark dataset, termed HARDVS, which contains 300 categories and more than 100K event sequences.
arXiv Detail & Related papers (2022-11-17T16:48:50Z)
- Learning to Detect Objects with a 1 Megapixel Event Camera [14.949946376335305]
Event cameras encode visual information with high temporal precision, low data-rate, and high-dynamic range.
Due to the novelty of the field, the performance of event-based systems on many vision tasks still lags behind conventional frame-based solutions.
arXiv Detail & Related papers (2020-09-28T16:03:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.