SPACT18: Spiking Human Action Recognition Benchmark Dataset with Complementary RGB and Thermal Modalities
- URL: http://arxiv.org/abs/2507.16151v1
- Date: Tue, 22 Jul 2025 01:59:14 GMT
- Title: SPACT18: Spiking Human Action Recognition Benchmark Dataset with Complementary RGB and Thermal Modalities
- Authors: Yasser Ashraf, Ahmed Sharshar, Velibor Bojkovic, Bin Gu
- Abstract summary: Spike cameras, bio-inspired vision sensors, asynchronously fire spikes by accumulating light intensities at each pixel, offering exceptional temporal resolution. This work contributes a dataset that will drive research in energy-efficient, ultra-low-power video understanding, specifically for action recognition using spike-based data.
- Score: 14.157338282165037
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spike cameras, bio-inspired vision sensors, asynchronously fire spikes by accumulating light intensities at each pixel, offering ultra-high energy efficiency and exceptional temporal resolution. Unlike event cameras, which record changes in light intensity to capture motion, spike cameras provide even finer spatiotemporal resolution and a more precise representation of continuous changes. In this paper, we introduce the first video action recognition (VAR) dataset using a spike camera, alongside synchronized RGB and thermal modalities, to enable comprehensive benchmarking for Spiking Neural Networks (SNNs). By preserving the inherent sparsity and temporal precision of spiking data, our three datasets offer a unique platform for exploring multimodal video understanding and serve as a valuable resource for directly comparing spiking, thermal, and RGB modalities. This work contributes a novel dataset that will drive research in energy-efficient, ultra-low-power video understanding, specifically for action recognition tasks using spike-based data.
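To make the sensing model above concrete, here is a minimal sketch of the integrate-and-fire principle that spike cameras use, as a simple discretized simulation; `spike_encode`, its threshold parameter, and the frame format are illustrative assumptions, not the dataset's actual capture pipeline:

```python
import numpy as np

def spike_encode(frames, threshold=1.0):
    """Integrate-and-fire encoding in the spirit of spike cameras: each
    pixel accumulates incoming light intensity and emits a binary spike
    whenever its accumulator crosses a firing threshold, after which the
    accumulator resets by subtracting the threshold.

    frames: (T, H, W) array of per-timestep light intensities in [0, 1].
    Returns a (T, H, W) binary spike tensor.
    """
    accumulator = np.zeros(frames.shape[1:], dtype=np.float64)
    spikes = np.zeros_like(frames, dtype=np.uint8)
    for t, frame in enumerate(frames):
        accumulator += frame                 # light integrates over time
        fired = accumulator >= threshold     # pixels that reach the threshold
        spikes[t] = fired
        accumulator[fired] -= threshold      # soft reset keeps the residue
    return spikes

# Brighter pixels fire more often: constant intensity 0.5 with threshold 1.0
# yields a spike roughly every 2 timesteps.
demo = spike_encode(np.full((8, 2, 2), 0.5))
print(demo[:, 0, 0])  # -> [0 1 0 1 0 1 0 1]
```

The soft reset (subtracting the threshold rather than zeroing) preserves sub-threshold residue, which is what gives spike streams their fine representation of continuous intensity.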
Related papers
- A Novel Tuning Method for Real-time Multiple-Object Tracking Utilizing Thermal Sensor with Complexity Motion Pattern [7.6016974897939535]
Multi-Object Tracking in thermal images is essential for surveillance systems.
The paper introduces a novel tuning method for pedestrian tracking, specifically designed to handle the complex motion patterns in thermal imagery.
arXiv Detail & Related papers (2025-07-03T08:03:35Z)
- Human Activity Recognition using RGB-Event based Sensors: A Multi-modal Heat Conduction Model and A Benchmark Dataset [65.76480665062363]
Human Activity Recognition has primarily relied on traditional RGB cameras to achieve high-performance activity recognition.
Challenges in real-world scenarios, such as insufficient lighting and rapid movements, inevitably degrade the performance of RGB cameras.
In this work, we rethink human activity recognition by combining RGB and event cameras.
arXiv Detail & Related papers (2025-04-08T09:14:24Z)
- Inter-event Interval Microscopy for Event Cameras [52.05337480169517]
Event cameras, innovative bio-inspired sensors, differ from traditional cameras by sensing changes in intensity rather than directly perceiving intensity.
We achieve event-to-intensity conversion using a static event camera for both static and dynamic scenes in fluorescence microscopy (a minimal interval-to-intensity sketch appears after this list).
We have collected the IEIMat dataset under various scenes, including high dynamic range and high-speed scenarios.
arXiv Detail & Related papers (2025-04-07T11:05:13Z)
- Multi-modal Multi-platform Person Re-Identification: Benchmark and Method [58.59888754340054]
MP-ReID is a novel dataset designed specifically for multi-modality and multi-platform ReID.
This benchmark compiles data from 1,930 identities across diverse modalities, including RGB, infrared, and thermal imaging.
We introduce Uni-Prompt ReID, a framework with specifically designed prompts, tailored for cross-modality and cross-platform scenarios.
arXiv Detail & Related papers (2025-03-21T12:27:49Z)
- EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera [17.61884467264023]
We propose a novel network architecture specifically designed for event data processing.
We establish the first large-scale dataset for egocentric gesture recognition using event cameras.
Our method achieves 62.7% accuracy tested on unseen subjects with only 7M parameters, 3.1% higher than state-of-the-art approaches.
arXiv Detail & Related papers (2025-03-16T09:08:02Z)
- SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition [13.426390494116776]
Human action recognition (HAR) plays a key role in various applications such as video analysis, surveillance, autonomous driving, robotics, and healthcare.
Most HAR algorithms are developed from RGB images, which capture detailed visual information.
Event cameras offer a promising solution by capturing scene brightness changes sparsely at the pixel level, without capturing full images.
arXiv Detail & Related papers (2024-10-22T07:00:43Z)
- A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation [3.355813093377501]
Event cameras encode temporal changes in light intensity as asynchronous binary spikes.
Their unconventional spiking output and the scarcity of labelled datasets pose significant challenges to traditional image-based depth estimation methods.
We propose a novel energy-efficient Spike-Driven Transformer Network (SDT) for depth estimation, leveraging the unique properties of spiking data.
arXiv Detail & Related papers (2024-04-26T11:32:53Z)
- SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z)
- Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera [8.673063170884591]
EOLO is a novel object detection framework that achieves robust and efficient all-day detection by fusing both RGB and event modalities.
Our EOLO framework is built on a lightweight spiking neural network (SNN) to efficiently leverage the asynchronous property of events.
arXiv Detail & Related papers (2023-09-17T15:14:01Z)
- Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as a 3D tensor representation (see the voxelization sketch after this list).
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z)
- E$^2$(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition [21.199869051111367]
Event cameras capture pixel-level intensity changes in the form of "events".
N-EPIC-Kitchens is the first event-based camera extension of the large-scale EPIC-Kitchens dataset.
We show that event data provides a comparable performance to RGB and optical flow, yet without any additional flow computation at deploy time.
arXiv Detail & Related papers (2021-12-07T09:43:08Z)
- Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction [51.072733683919246]
We introduce Recurrent Asynchronous Multimodal (RAM) networks to handle asynchronous and irregular data from multiple sensors.
Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction (a minimal sketch of this update pattern follows the list).
We show an improvement over state-of-the-art methods by up to 30% in terms of mean depth absolute error.
arXiv Detail & Related papers (2021-02-18T13:24:35Z)
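For the Inter-event Interval Microscopy entry above, the idea implied by the title is that, under an event camera's fixed contrast threshold, a brighter pixel triggers events more frequently, so intensity can be estimated from the reciprocal of the time between consecutive events. A minimal sketch under that assumption; the function name and calibration constant are illustrative, not from the paper:

```python
import numpy as np

def intensity_from_intervals(event_times, calibration=1.0, eps=1e-9):
    """Estimate per-pixel intensity from inter-event intervals.

    Assumption (not from the paper): with a fixed contrast threshold,
    intensity is roughly proportional to the reciprocal of the mean
    inter-event interval at each pixel.

    event_times: list of per-pixel 1D arrays of event timestamps (seconds).
    Returns one intensity estimate per pixel (NaN if fewer than 2 events).
    """
    estimates = []
    for ts in event_times:
        if len(ts) < 2:
            estimates.append(float("nan"))  # no interval can be formed
            continue
        intervals = np.diff(np.sort(ts))
        estimates.append(calibration / (intervals.mean() + eps))
    return np.array(estimates)

# A pixel firing every 2 ms reads as twice as bright as one firing every 4 ms.
print(intensity_from_intervals([np.arange(0, 0.01, 0.002),
                                np.arange(0, 0.01, 0.004)]))
```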
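The Dual Memory Aggregation entry describes slicing an event stream into an x-y-t grid, separately per polarity, to form pillar tensors. A minimal voxelization sketch, assuming a simple (x, y, t, polarity) event array; the function name and bin counts are illustrative:

```python
import numpy as np

def events_to_pillars(events, height, width, num_bins):
    """Voxelize an event stream into an x-y-t grid, one channel per polarity.

    events: (N, 4) array of (x, y, t, p) rows with p in {-1, +1}.
    Returns a (2, num_bins, height, width) tensor of per-cell event counts;
    each (y, x) column through the t axis is one "pillar".
    """
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = (events[:, 3] > 0).astype(int)  # map polarity to {0, 1}

    # Normalize timestamps into [0, num_bins) temporal bins.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    t_bin = np.clip((t_norm * num_bins).astype(int), 0, num_bins - 1)

    grid = np.zeros((2, num_bins, height, width), dtype=np.float32)
    np.add.at(grid, (p, t_bin, y, x), 1.0)  # accumulate counts per cell
    return grid

# Four events, two per polarity, on a 4x4 sensor with 2 temporal bins.
evts = np.array([[0, 0, 0.0, 1], [1, 1, 0.5, -1],
                 [2, 2, 0.9, 1], [3, 3, 1.0, -1]])
print(events_to_pillars(evts, height=4, width=4, num_bins=2).shape)  # (2, 2, 4, 4)
```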
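Finally, the RAM-network entry describes a hidden state that is updated whenever any sensor delivers a measurement and can be queried at arbitrary times. A minimal sketch of that asynchronous update pattern; the class and its leaky-average rule are illustrative stand-ins for the learned recurrent cell, not the paper's architecture:

```python
import numpy as np

class AsyncState:
    """Toy recurrent state updated asynchronously, queryable at any time.

    Mirrors the RAM-network idea at a high level: each sensor pushes
    measurements at its own rate; the shared hidden state integrates them
    on arrival, and a prediction can be read out between updates.
    """

    def __init__(self, dim, leak=0.9):
        self.h = np.zeros(dim)
        self.leak = leak

    def update(self, measurement):
        # Blend the new measurement into the hidden state on arrival.
        self.h = self.leak * self.h + (1.0 - self.leak) * measurement

    def query(self):
        # Readout is available at any time, regardless of sensor timing.
        return self.h.copy()

state = AsyncState(dim=2)
state.update(np.array([1.0, 0.0]))   # e.g., a frame arrives
print(state.query())                 # prediction between sensor updates
state.update(np.array([0.0, 1.0]))   # e.g., an event batch arrives later
print(state.query())
```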