EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera
- URL: http://arxiv.org/abs/2503.12419v2
- Date: Mon, 14 Apr 2025 02:44:20 GMT
- Title: EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera
- Authors: Luming Wang, Hao Shi, Xiaoting Yin, Kailun Yang, Kaiwei Wang, Jian Bai
- Abstract summary: We propose a novel network architecture specifically designed for event data processing. We establish the first large-scale dataset for egocentric gesture recognition using event cameras. Our method achieves 62.7% accuracy tested on unseen subjects with only 7M parameters, 3.1% higher than state-of-the-art approaches.
- Score: 17.61884467264023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Egocentric gesture recognition is a pivotal technology for enhancing natural human-computer interaction, yet traditional RGB-based solutions suffer from motion blur and illumination variations in dynamic scenarios. While event cameras show distinct advantages in handling high dynamic range with ultra-low power consumption, existing RGB-based architectures face inherent limitations in processing asynchronous event streams due to their synchronous frame-based nature. Moreover, from an egocentric perspective, event cameras record data that includes events generated by both head movements and hand gestures, thereby increasing the complexity of gesture recognition. To address this, we propose a novel network architecture specifically designed for event data processing, incorporating (1) a lightweight CNN with asymmetric depthwise convolutions to reduce parameters while preserving spatiotemporal features, (2) a plug-and-play state-space model as a context block that decouples head movement noise from gesture dynamics, and (3) a parameter-free Bins-Temporal Shift Module (BSTM) that shifts features along bins and temporal dimensions to fuse sparse events efficiently. We further establish the EgoEvGesture dataset, the first large-scale dataset for egocentric gesture recognition using event cameras. Experimental results demonstrate that our method achieves 62.7% accuracy tested on unseen subjects with only 7M parameters, 3.1% higher than state-of-the-art approaches. Notable misclassifications in freestyle motions stem from high inter-personal variability and unseen test patterns differing from training data. Moreover, our approach achieves a remarkable accuracy of 97.0% on the DVS128 Gesture dataset, demonstrating the effectiveness and generalization capability of our method on public datasets. The dataset and models are made available at https://github.com/3190105222/EgoEv_Gesture.
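The abstract describes two of the three components concretely enough to sketch: asymmetric depthwise convolutions and a parameter-free shift along the bins and temporal dimensions. Below is a minimal PyTorch sketch under assumptions not stated in the abstract; the tensor layout, channel counts, kernel size, and shift fraction are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AsymmetricDepthwiseBlock(nn.Module):
    """Depthwise 1xk and kx1 convolutions instead of a full kxk kernel.

    Illustrative only: channel count and kernel size are assumptions,
    not the configuration used in the EgoEvGesture paper.
    """
    def __init__(self, channels: int, k: int = 5):
        super().__init__()
        self.dw_h = nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels)
        self.dw_v = nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)  # pointwise channel mixing
        self.act = nn.GELU()

    def forward(self, x):  # x: (B, C, H, W)
        return self.act(self.pw(self.dw_v(self.dw_h(x))))


def bins_temporal_shift(x: torch.Tensor, fold: int = 8) -> torch.Tensor:
    """Parameter-free shift of a fraction of channels along the bins and
    temporal axes, in the spirit of the Temporal Shift Module.

    x: (B, T, N, C, H, W) with T event frames and N bins per frame.
    The 1/fold split ratio is an assumption for illustration.
    """
    b, t, n, c, h, w = x.shape
    f = c // fold
    out = torch.zeros_like(x)
    out[:, 1:, :, :f] = x[:, :-1, :, :f]                      # shift forward in time
    out[:, :-1, :, f:2 * f] = x[:, 1:, :, f:2 * f]            # shift backward in time
    out[:, :, 1:, 2 * f:3 * f] = x[:, :, :-1, 2 * f:3 * f]    # shift along bins
    out[:, :, :, 3 * f:] = x[:, :, :, 3 * f:]                 # remaining channels unchanged
    return out


if __name__ == "__main__":
    feats = torch.randn(2, 4, 3, 32, 16, 16)  # (B, T, bins, C, H, W), toy sizes
    shifted = bins_temporal_shift(feats)
    block = AsymmetricDepthwiseBlock(32)
    y = block(shifted.flatten(0, 2))  # fold (B, T, bins) into the batch dim
    print(y.shape)  # torch.Size([24, 32, 16, 16])
```

Because the shift itself adds no parameters, mixing information across bins and time in this way keeps the overall model small, which is consistent with the 7M-parameter budget reported in the abstract.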
Related papers
- Spatiotemporal Attention Learning Framework for Event-Driven Object Recognition [1.0445957451908694]
Event-based vision sensors capture local pixel-level intensity changes as a sparse event stream containing position, polarity, and timestamp information.
This paper presents a novel learning framework for event-based object recognition, utilizing a VGG network enhanced with a Convolutional Block Attention Module (CBAM); a generic CBAM sketch appears after this related-papers list.
Our approach achieves comparable performance to state-of-the-art ResNet-based methods while reducing parameter count by 2.3% compared to the original VGG model.
arXiv Detail & Related papers (2025-04-01T02:37:54Z) - Dynamic Reconstruction from Neuromorphic Data [0.0]
Neuromorphic sensors asynchronously register 'changes' only at pixels where a change is occurring.
This enables neuromorphic sensors to sample at a micro-second level and efficiently capture the dynamics.
Existing approaches, including the ones recently introduced by the authors, use traditional images combined with neuromorphic event data to carry out reconstructions.
arXiv Detail & Related papers (2024-08-28T01:06:19Z) - A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation [3.355813093377501]
Event cameras encode temporal changes in light intensity as asynchronous binary spikes.
Their unconventional spiking output and the scarcity of labelled datasets pose significant challenges to traditional image-based depth estimation methods.
We propose a novel energy-efficient Spike-Driven Transformer Network (SDT) for depth estimation, leveraging the unique properties of spiking data.
arXiv Detail & Related papers (2024-04-26T11:32:53Z) - Implicit Event-RGBD Neural SLAM [54.74363487009845]
Implicit neural SLAM has achieved remarkable progress recently.
Existing methods face significant challenges in non-ideal scenarios.
We propose EN-SLAM, the first event-RGBD implicit neural SLAM framework.
arXiv Detail & Related papers (2023-11-18T08:48:58Z) - EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z) - Event Camera-based Visual Odometry for Dynamic Motion Tracking of a Legged Robot Using Adaptive Time Surface [5.341864681049579]
Event cameras offer high temporal resolution and dynamic range, which can eliminate the issue of blurred RGB images during fast movements.
We introduce an adaptive time surface (ATS) method that addresses the whiteout and blackout issue in conventional time surfaces.
Lastly, we propose a nonlinear pose optimization formula that simultaneously performs 3D-2D alignment on both RGB-based and event-based maps and images.
arXiv Detail & Related papers (2023-05-15T19:03:45Z) - Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO).
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z) - Bridging the Gap between Events and Frames through Unsupervised Domain Adaptation [57.22705137545853]
We propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data.
We leverage the generative event model to split event features into content and motion features.
Our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks.
arXiv Detail & Related papers (2021-09-06T17:31:37Z) - Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation [53.850686395708905]
Event-based cameras record an asynchronous stream of per-pixel brightness changes.
In this paper, we focus on single-layer architectures for representation learning from event data.
We show improvements of up to 9% in recognition accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-09-23T10:40:03Z) - Event-based Asynchronous Sparse Convolutional Networks [54.094244806123235]
Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse "events".
We present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output.
We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks.
arXiv Detail & Related papers (2020-03-20T08:39:49Z)
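As noted in the Spatiotemporal Attention Learning entry above, that framework enhances a VGG backbone with a Convolutional Block Attention Module (CBAM). The sketch below is a generic CBAM implementation (channel attention followed by spatial attention); the reduction ratio and 7x7 kernel are common defaults, not values taken from that paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention. Generic sketch; reduction ratio and 7x7 kernel are
    common defaults, not settings from the cited paper."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: 7x7 conv over channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x):  # x: (B, C, H, W)
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)                       # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)   # (B, 2, H, W)
        return x * torch.sigmoid(self.spatial(s))             # spatial attention

if __name__ == "__main__":
    feat = torch.randn(1, 64, 32, 32)
    print(CBAM(64)(feat).shape)  # torch.Size([1, 64, 32, 32])
```

Because the module preserves the input shape, it can be inserted after any convolutional stage of a VGG-style backbone without changing the rest of the network.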