Related papers: Temporal-Guided Spiking Neural Networks for Event-Based Human Action Recognition

URL: http://arxiv.org/abs/2503.17132v2
Date: Thu, 27 Mar 2025 11:35:37 GMT
Title: Temporal-Guided Spiking Neural Networks for Event-Based Human Action Recognition
Authors: Siyuan Yang, Shilin Lu, Shizheng Wang, Meng Hwa Er, Zengwei Zheng, Alex C. Kot
Abstract summary: This paper explores the promising interplay between spiking neural networks (SNNs) and event-based cameras for privacy-preserving human action recognition (HAR). We introduce two novel frameworks to address SNNs' limited ability to process long-term temporal information: temporal segment-based SNN (TS-SNN) and 3D convolutional SNN (3D-SNN). To promote further research in event-based HAR, we create a dataset, FallingDetection-CeleX, collected using the high-resolution CeleX-V event camera.
Score: 31.528007074074043
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper explores the promising interplay between spiking neural networks (SNNs) and event-based cameras for privacy-preserving human action recognition (HAR). The unique feature of event cameras in capturing only the outlines of motion, combined with SNNs' proficiency in processing spatiotemporal data through spikes, establishes a highly synergistic compatibility for event-based HAR. Previous studies, however, have been limited by SNNs' ability to process long-term temporal information, essential for precise HAR. In this paper, we introduce two novel frameworks to address this: temporal segment-based SNN (TS-SNN) and 3D convolutional SNN (3D-SNN). The TS-SNN extracts long-term temporal information by dividing actions into shorter segments, while the 3D-SNN replaces 2D spatial elements with 3D components to facilitate the transmission of temporal information. To promote further research in event-based HAR, we create a dataset, FallingDetection-CeleX, collected using the high-resolution CeleX-V event camera (1280 × 800), comprising 7 distinct actions. Extensive experimental results show that our proposed frameworks surpass state-of-the-art SNN methods on our newly collected dataset and three other neuromorphic datasets, showcasing their effectiveness in handling long-range temporal information for event-based HAR.
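
Illustrative sketch (not the authors' implementation): the abstract says only that TS-SNN divides actions into shorter segments and does not specify how. The Python below shows one simple way such a split could look, assuming equal-duration segments over integer microsecond timestamps. The `Event` layout, the function name `split_into_segments`, and the equal-duration rule are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical event layout: (timestamp_us, x, y, polarity).
Event = Tuple[int, int, int, int]


def split_into_segments(events: Sequence[Event], num_segments: int) -> List[List[Event]]:
    """Partition events into `num_segments` equal-duration, time-ordered segments.

    This is an assumed segmentation rule, not the one used in the paper.
    """
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")
    segments: List[List[Event]] = [[] for _ in range(num_segments)]
    if not events:
        return segments

    t_start = min(e[0] for e in events)
    duration = max(e[0] for e in events) - t_start

    for event in sorted(events, key=lambda e: e[0]):
        if duration == 0:
            # All events share one timestamp, so they all go in the first segment.
            index = 0
        else:
            # Integer arithmetic keeps segment boundaries exact; the final
            # event is clamped into the last segment.
            index = min((event[0] - t_start) * num_segments // duration,
                        num_segments - 1)
        segments[index].append(event)
    return segments


# Six toy events spanning 3000 microseconds, split into three segments.
events = [(0, 1, 1, 1), (400, 2, 1, 0), (1100, 3, 2, 1),
          (2000, 4, 4, 1), (2900, 5, 5, 0), (3000, 6, 6, 1)]
segments = split_into_segments(events, num_segments=3)
segment_sizes = [len(s) for s in segments]
print(segment_sizes)
```

In a pipeline of this kind, each segment would typically be converted to a frame-like tensor and passed to the network separately. That step is outside what the abstract describes and is omitted here.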

Related papers

Enhanced Temporal Processing in Spiking Neural Networks for Static Object Detection Using 3D Convolutions [0.0]
Spiking Neural Networks (SNNs) are a class of network models capable of processing temporal information. This paper focuses on enhancing the SNNs' unique ability to process temporal information. To improve the SNN handling of temporal information, this paper proposes replacing traditional 2D convolutions with 3D convolutions.
arXiv Detail & Related papers (2024-12-23T15:32:26Z)
Enhancing SNN-based Spatio-Temporal Learning: A Benchmark Dataset and Cross-Modality Attention Model [30.66645039322337]
High-quality benchmark datasets are of great importance to the advances of Spiking Neural Networks (SNNs). Yet, the SNN-based cross-modal fusion remains underexplored. In this work, we present a neuromorphic dataset that can better exploit the inherent spatio-temporal properties of SNNs.
arXiv Detail & Related papers (2024-10-21T06:59:04Z)
Towards Low-latency Event-based Visual Recognition with Hybrid Step-wise Distillation Spiking Neural Networks [50.32980443749865]
Spiking neural networks (SNNs) have garnered significant attention for their low power consumption and high biological interpretability. Current SNNs struggle to balance accuracy and latency in neuromorphic datasets. We propose a Hybrid Step-wise Distillation (HSD) method, tailored for neuromorphic datasets.
arXiv Detail & Related papers (2024-09-19T06:52:34Z)
SFOD: Spiking Fusion Object Detector [10.888008544975662]
Spiking Fusion Object Detector (SFOD) is a simple and efficient approach to SNN-based object detection. We design a Spiking Fusion Module, achieving the first-time fusion of feature maps from different scales in SNNs applied to event cameras. We establish state-of-the-art classification results based on SNNs, achieving 93.7% accuracy on the NCAR dataset.
arXiv Detail & Related papers (2024-03-22T13:24:50Z)
Efficient and Effective Time-Series Forecasting with Spiking Neural Networks [47.371024581669516]
Spiking neural networks (SNNs) provide a unique pathway for capturing the intricacies of temporal data. Applying SNNs to time-series forecasting is challenging due to difficulties in effective temporal alignment, complexities in encoding processes, and the absence of standardized guidelines for model selection. We propose a framework for SNNs in time-series forecasting tasks, leveraging the efficiency of spiking neurons in processing temporal information.
arXiv Detail & Related papers (2024-02-02T16:23:50Z)
Event-based Human Pose Tracking by Spiking Spatiotemporal Transformer [20.188995900488717]
We present a dedicated end-to-end sparse deep approach for event-based pose tracking. This is the first time that 3D human pose tracking is obtained from events only. Our approach also achieves a significant reduction of 80% in FLOPS.
arXiv Detail & Related papers (2023-03-16T22:56:12Z)
Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query. Most previous works focus on learning frame-level features of each whole frame in the entire video, and directly match them with the textual information. We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z)
Hybrid SNN-ANN: Energy-Efficient Classification and Object Detection for Event-Based Vision [64.71260357476602]
Event-based vision sensors encode local pixel-wise brightness changes in streams of events rather than image frames. Recent progress in object recognition from event-based sensors has come from conversions of deep neural networks. We propose a hybrid architecture for end-to-end training of deep neural networks for event-based pattern recognition and object detection.
arXiv Detail & Related papers (2021-12-06T23:45:58Z)
SpikeMS: Deep Spiking Neural Network for Motion Segmentation [7.491944503744111]
SpikeMS is the first deep encoder-decoder SNN architecture for the real-world large-scale problem of motion segmentation. We show that SpikeMS is capable of incremental predictions, or predictions from smaller amounts of test data than it is trained on.
arXiv Detail & Related papers (2021-05-13T21:34:55Z)
4D Spatio-Temporal Convolutional Networks for Object Position Estimation in OCT Volumes [69.62333053044712]
3D convolutional neural networks (CNNs) have shown promising performance for pose estimation of a marker object using single OCT images. We extend 3D CNNs to 4D spatio-temporal CNNs to evaluate the impact of additional temporal information for marker object tracking.
arXiv Detail & Related papers (2020-07-02T12:02:20Z)
Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes. The proposed method achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with the inference speed of 51.0 fps and 39.3 fps.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
Event-Based Angular Velocity Regression with Spiking Networks [51.145071093099396]
Spiking Neural Networks (SNNs) process information conveyed as temporal spikes rather than numeric values. We propose, for the first time, a temporal regression problem of numerical values given events from an event camera. We show that we can successfully train an SNN to perform angular velocity regression.
arXiv Detail & Related papers (2020-03-05T17:37:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.