HARDVS: Revisiting Human Activity Recognition with Dynamic Vision
Sensors
- URL: http://arxiv.org/abs/2211.09648v1
- Date: Thu, 17 Nov 2022 16:48:50 GMT
- Title: HARDVS: Revisiting Human Activity Recognition with Dynamic Vision
Sensors
- Authors: Xiao Wang, Zongzhen Wu, Bo Jiang, Zhimin Bao, Lin Zhu, Guoqi Li,
Yaowei Wang, Yonghong Tian
- Abstract summary: Mainstream human activity recognition (HAR) algorithms are developed for RGB cameras, which suffer from sensitivity to illumination and fast motion, privacy concerns, and high energy consumption.
Meanwhile, biologically inspired event cameras have attracted great interest due to their unique features, such as high dynamic range, dense temporal but sparse spatial resolution, low latency, and low power.
As event cameras are newly emerging sensors, no realistic large-scale dataset yet exists for HAR.
We propose a large-scale benchmark dataset, termed HARDVS, which contains 300 categories and more than 100K event sequences.
- Score: 40.949347728083474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mainstream human activity recognition (HAR) algorithms are developed
for RGB cameras, which suffer from sensitivity to illumination and fast motion,
privacy concerns, and high energy consumption. Meanwhile, biologically inspired
event cameras have attracted great interest due to their unique features, such
as high dynamic range, dense temporal but sparse spatial resolution, low
latency, and low power. As event cameras are newly emerging sensors, no
realistic large-scale dataset yet exists for HAR. Considering their great
practical value, in this paper we propose a large-scale benchmark dataset to
bridge this gap, termed HARDVS, which contains 300 categories and more than
100K event sequences. We evaluate and report the performance of multiple
popular HAR algorithms, providing extensive baselines for future works to
compare against. More importantly, we propose a novel spatial-temporal feature
learning and fusion framework, termed ESTF, for event-stream-based human
activity recognition. It first projects the event streams into spatial and
temporal embeddings using StemNet, then encodes and fuses the dual-view
representations using Transformer networks. Finally, the dual features are
concatenated and fed into a classification head for activity prediction.
Extensive experiments on multiple datasets fully validate the effectiveness of
our model. Both the dataset and source code will be released at
\url{https://github.com/Event-AHU/HARDVS}.
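As a rough illustration of the ESTF pipeline described above (StemNet embeddings, dual-view Transformer encoding and fusion, concatenation into a classification head), here is a minimal PyTorch sketch. The stem design, layer sizes, and the assumption that events arrive pre-binned into a spatial frame and a temporal stack are illustrative guesses, not the authors' released implementation.

```python
# Minimal sketch of a dual-branch spatial/temporal Transformer fusion model,
# loosely following the ESTF description in the abstract. All layer sizes,
# the stem design, and the input format are illustrative assumptions.
import torch
import torch.nn as nn


class StemNet(nn.Module):
    """Tiny convolutional stem mapping an event-frame stack to token embeddings."""

    def __init__(self, in_ch: int, dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> tokens: (B, N, dim)
        return self.conv(x).flatten(2).transpose(1, 2)


class ESTFSketch(nn.Module):
    """Dual-branch spatial/temporal embedding + Transformer fusion + linear head."""

    def __init__(self, num_classes: int, t_bins: int = 8, dim: int = 256):
        super().__init__()
        self.spatial_stem = StemNet(in_ch=2, dim=dim)           # 2 polarity channels
        self.temporal_stem = StemNet(in_ch=2 * t_bins, dim=dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        # TransformerEncoder deep-copies the layer, so parameters stay independent.
        self.spatial_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.temporal_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.fusion = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, spatial_frame: torch.Tensor, temporal_stack: torch.Tensor):
        # spatial_frame:  (B, 2, H, W)        events collapsed over time
        # temporal_stack: (B, 2*t_bins, H, W) events binned along time
        s = self.spatial_enc(self.spatial_stem(spatial_frame))
        t = self.temporal_enc(self.temporal_stem(temporal_stack))
        fused = self.fusion(torch.cat([s, t], dim=1))  # joint dual-view attention
        s_feat = fused[:, : s.shape[1]].mean(dim=1)    # pool tokens per view
        t_feat = fused[:, s.shape[1]:].mean(dim=1)
        return self.head(torch.cat([s_feat, t_feat], dim=-1))  # concat -> classify
```

For HARDVS, `ESTFSketch(num_classes=300)` would match the dataset's 300 activity categories; the real ESTF details are in the paper and the linked repository.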
Related papers
- SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition [13.426390494116776]
Human action recognition (HAR) plays a key role in various applications such as video analysis, surveillance, autonomous driving, robotics, and healthcare.
Most HAR algorithms are developed for RGB images, which capture detailed visual information.
Event cameras offer a promising solution by capturing scene brightness changes sparsely at the pixel level, without capturing full images.
arXiv Detail & Related papers (2024-10-22T07:00:43Z)
- Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms [29.577583619354314]
We propose a large-scale, high-definition ($1280 \times 800$) human action recognition dataset based on the CeleX-V event camera.
To build a more comprehensive benchmark, we report results for over 20 mainstream HAR models for future works to compare against.
arXiv Detail & Related papers (2024-07-06T15:25:10Z)
- DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition [51.96660522869841]
DailyDVS-200 is a benchmark dataset tailored for the event-based action recognition community.
It covers 200 action categories across real-world scenarios, recorded by 47 participants, and comprises more than 22,000 event sequences.
DailyDVS-200 is annotated with 14 attributes, ensuring a detailed characterization of the recorded actions.
arXiv Detail & Related papers (2023-11-09T12:14:47Z)
- SPADES: A Realistic Spacecraft Pose Estimation Dataset using Event Sensing [9.583223655096077]
Due to limited access to real target datasets, algorithms are often trained using synthetic data and applied in the real domain.
Event sensing has been explored in the past and shown to reduce the domain gap between simulations and real-world scenarios.
We introduce a novel dataset, SPADES, comprising real event data acquired in a controlled laboratory environment and simulated event data using the same camera intrinsics.
arXiv Detail & Related papers (2023-04-19T16:21:14Z)
- Event-based Simultaneous Localization and Mapping: A Comprehensive Survey [52.73728442921428]
This survey reviews event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks.
It categorizes event-based vSLAM methods into four main categories: feature-based, direct, motion-compensation, and deep-learning methods.
arXiv Detail & Related papers (2023-04-19T16:21:14Z)
- Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture the brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as a 3D tensor representation (a minimal sketch of this style of representation appears after this list).
Long memory is encoded in the hidden state of adaptive convLSTMs, while short memory is modeled by computing the spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z)
- HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities [6.543272301133159]
Event cameras detect changes in per-pixel intensity to generate asynchronous event streams.
They offer great potential for accurate semantic map retrieval in real-time autonomous systems.
Existing implementations of event-based segmentation suffer from sub-par performance.
We propose HALSIE, a hybrid end-to-end learning framework that reduces inference cost by up to $20\times$ versus the state of the art.
arXiv Detail & Related papers (2022-11-19T17:09:50Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method achieves superior performance compared with state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras report brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data, such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
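As referenced in the Dual Memory Aggregation Network entry above, here is a minimal NumPy sketch of the common event-to-tensor step: binning a raw stream of (x, y, t, polarity) events into a polarity-separated x-y-t grid. The bin count, dtype, and timestamp normalization are illustrative assumptions rather than any specific paper's recipe.

```python
# Minimal sketch: bin an event stream into a polarity-separated x-y-t grid,
# the kind of dense tensor ("pillars"/voxel grid) that many event-based
# models consume. Bin counts and dtype choices are illustrative assumptions.
import numpy as np


def events_to_voxel_grid(events: np.ndarray, height: int, width: int,
                         t_bins: int = 8) -> np.ndarray:
    """events: (N, 4) array of [x, y, t, p] with p in {-1, +1} (or {0, 1}).

    Returns a (2, t_bins, height, width) event-count tensor:
    channel 0 accumulates negative events, channel 1 positive events.
    """
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2].astype(np.float64)
    p = (events[:, 3] > 0).astype(np.int64)  # map polarity to {0, 1}

    # Normalize timestamps to [0, 1), then assign each event to a time bin,
    # clamping the final event into the last bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    t_idx = np.minimum((t_norm * t_bins).astype(np.int64), t_bins - 1)

    grid = np.zeros((2, t_bins, height, width), dtype=np.float32)
    # Scatter-add one count per event into its (polarity, time, y, x) cell.
    np.add.at(grid, (p, t_idx, y, x), 1.0)
    return grid


# Usage with synthetic events: 10k random events on a 260x346 DVS sensor.
rng = np.random.default_rng(0)
ev = np.stack([rng.integers(0, 346, 10_000),   # x
               rng.integers(0, 260, 10_000),   # y
               np.sort(rng.random(10_000)),    # t (monotone timestamps)
               rng.choice([-1, 1], 10_000)], axis=1).astype(np.float64)
voxels = events_to_voxel_grid(ev, height=260, width=346)
print(voxels.shape, voxels.sum())  # (2, 8, 260, 346) 10000.0
```

Pillar- or voxel-based models such as the detector above consume this tensor (or a learnable variant of it) as their network input.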
This list is automatically generated from the titles and abstracts of the papers on this site.