EventAug: Multifaceted Spatio-Temporal Data Augmentation Methods for Event-based Learning
- URL: http://arxiv.org/abs/2409.11813v1
- Date: Wed, 18 Sep 2024 09:01:34 GMT
- Title: EventAug: Multifaceted Spatio-Temporal Data Augmentation Methods for Event-based Learning
- Authors: Yukun Tian, Hao Chen, Yongjian Deng, Feihong Shen, Kepan Liu, Wei You, Ziyang Zhang,
- Abstract summary: Event cameras have demonstrated significant success across a wide range of areas due to their low latency and high dynamic range.
However, the community faces challenges such as data deficiency and limited diversity, often resulting in over-fitting and inadequate feature learning.
This work aims to introduce a systematic augmentation scheme named EventAug to enrich spatial-temporal diversity.
- Score: 15.727918674166714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The event camera has demonstrated significant success across a wide range of areas due to its low time latency and high dynamic range. However, the community faces challenges such as data deficiency and limited diversity, often resulting in over-fitting and inadequate feature learning. Notably, the exploration of data augmentation techniques in the event community remains scarce. This work aims to address this gap by introducing a systematic augmentation scheme named EventAug to enrich spatial-temporal diversity. In particular, we first propose Multi-scale Temporal Integration (MSTI) to diversify the motion speed of objects, then introduce Spatial-salient Event Mask (SSEM) and Temporal-salient Event Mask (TSEM) to enrich object variants. Our EventAug can facilitate models learning with richer motion patterns, object variants and local spatio-temporal relations, thus improving model robustness to varied moving speeds, occlusions, and action disruptions. Experiment results show that our augmentation method consistently yields significant improvements across different tasks and backbones (e.g., a 4.87% accuracy gain on DVS128 Gesture). Our code will be publicly available for this community.
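The abstract describes diversifying motion speed by integrating events over multiple time scales and masking salient events in space and time. A minimal sketch of two generic event-stream augmentations in this spirit (hypothetical helper names, assuming events are stored as an (N, 4) array of (x, y, t, p) rows; this is not the authors' exact MSTI/SSEM/TSEM implementation):

```python
import numpy as np

def temporal_mask(events, drop_ratio=0.1, rng=None):
    """Drop all events inside one random contiguous time window.

    events: (N, 4) float array of (x, y, t, p) rows.
    drop_ratio: fraction of the total time span to mask out.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = events[:, 2]
    t0, t1 = t.min(), t.max()
    span = (t1 - t0) * drop_ratio
    start = rng.uniform(t0, t1 - span)
    keep = (t < start) | (t >= start + span)  # keep events outside the window
    return events[keep]

def temporal_rescale(events, scale=2.0):
    """Stretch (scale > 1) or compress (scale < 1) timestamps,
    mimicking slower or faster object motion."""
    out = events.copy()
    t = out[:, 2]
    out[:, 2] = t.min() + (t - t.min()) * scale  # anchor at earliest timestamp
    return out
```

Both operations preserve the (x, y, p) components and only edit which events survive or when they occur, so they can be composed freely with spatial augmentations before voxelization or frame integration.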
Related papers
- Path-adaptive Spatio-Temporal State Space Model for Event-based Recognition with Arbitrary Duration [9.547947845734992]
Event cameras are bio-inspired sensors that capture the intensity changes asynchronously and output event streams.
We present a novel framework, dubbed PAST-Act, exhibiting superior capacity in recognizing events with arbitrary duration.
We also build a minute-level event-based recognition dataset, named ArDVS100, with arbitrary duration for the benefit of the community.
arXiv Detail & Related papers (2024-09-25T14:08:37Z)
- EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic Vision [9.447299017563841]
Dynamic Vision Sensors (DVS) capture event data with high temporal resolution and low power consumption.
Event data augmentation serves as an essential method for overcoming the limitations of scale and diversity in event datasets.
arXiv Detail & Related papers (2024-05-29T08:39:31Z)
- MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking [50.26836546224782]
Event-based eye tracking has shown great promise thanks to its high temporal resolution and low redundancy.
The diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization.
This paper proposes a bidirectional long-term sequence modeling and time-varying state selection mechanism to fully utilize contextual temporal information.
arXiv Detail & Related papers (2024-04-18T11:09:25Z)
- Implicit Event-RGBD Neural SLAM [54.74363487009845]
Implicit neural SLAM has achieved remarkable progress recently.
Existing methods face significant challenges in non-ideal scenarios.
We propose EN-SLAM, the first event-RGBD implicit neural SLAM framework.
arXiv Detail & Related papers (2023-11-18T08:48:58Z)
- SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z)
- Improved Regularization of Event-based Learning by Reversing and Drifting [4.736525128377909]
Event cameras have enormous potential in challenging scenes thanks to their high temporal resolution, high dynamic range, low power consumption, and absence of motion blur.
We propose two novel augmentation methods: EventReverse and EventDrift.
arXiv Detail & Related papers (2022-07-24T04:23:56Z)
- Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection [35.16241630620967]
Generic event boundary detection is an important yet challenging task in video understanding.
This paper presents an effective and end-to-end learnable framework (DDM-Net) to tackle the diversity and complicated semantics of event boundaries.
arXiv Detail & Related papers (2021-12-09T09:00:05Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate dynamic-scale spatio-temporal kernels to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- Learn to cycle: Time-consistent feature discovery for action recognition [83.43682368129072]
Generalizing over temporal variations is a prerequisite for effective action recognition in videos.
We introduce Squeeze-and-Recursion Temporal Gates (SRTG), an approach that favors temporal activations with potential variations.
We show consistent improvements when using SRTG blocks, with only a minimal increase in the number of GFLOPs.
arXiv Detail & Related papers (2020-06-15T09:36:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.