Event Masked Autoencoder: Point-wise Action Recognition with Event-Based Cameras
- URL: http://arxiv.org/abs/2501.01040v1
- Date: Thu, 02 Jan 2025 03:49:03 GMT
- Title: Event Masked Autoencoder: Point-wise Action Recognition with Event-Based Cameras
- Authors: Jingkai Sun, Qiang Zhang, Jiaxu Wang, Jiahang Cao, Renjing Xu,
- Abstract summary: We propose a novel framework that preserves and exploits the structure of event data for action recognition.
Our framework consists of two main components: 1) a point-wise event masked autoencoder (MAE) that learns a compact and discrimi representation of event patches by reconstructing them from masked raw event camera points data; 2) an improved event points patch generation algorithm that leverages an event data inlier model and point-wise data augmentation techniques to enhance the quality and diversity event points patches.
- Score: 8.089601548579116
- License:
- Abstract: Dynamic vision sensors (DVS) are bio-inspired devices that capture visual information in the form of asynchronous events, which encode changes in pixel intensity with high temporal resolution and low latency. These events provide rich motion cues that can be exploited for various computer vision tasks, such as action recognition. However, most existing DVS-based action recognition methods lose temporal information during data transformation or suffer from noise and outliers caused by sensor imperfections or environmental factors. To address these challenges, we propose a novel framework that preserves and exploits the spatiotemporal structure of event data for action recognition. Our framework consists of two main components: 1) a point-wise event masked autoencoder (MAE) that learns a compact and discriminative representation of event patches by reconstructing them from masked raw event camera points data; 2) an improved event points patch generation algorithm that leverages an event data inlier model and point-wise data augmentation techniques to enhance the quality and diversity of event points patches. To the best of our knowledge, our approach introduces the pre-train method into event camera raw points data for the first time, and we propose a novel event points patch embedding to utilize transformer-based models on event cameras.
Related papers
- EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting [76.02450110026747]
Event cameras, inspired by biological vision, record pixel-wise intensity changes asynchronously with high temporal resolution.
We propose Event-Aided Free-Trajectory 3DGS, which seamlessly integrates the advantages of event cameras into 3DGS.
We evaluate our method on the public Tanks and Temples benchmark and a newly collected real-world dataset, RealEv-DAVIS.
arXiv Detail & Related papers (2024-10-20T13:44:24Z) - EventTransAct: A video transformer-based framework for Event-camera
based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adopt the VTN for the sparse and fine-grained nature of event data, we design Event-Contrastive Loss ($mathcalL_EC$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z) - Cross-modal Place Recognition in Image Databases using Event-based
Sensors [28.124708490967713]
We present the first cross-modal visual place recognition framework that is capable of retrieving regular images from a database given an event query.
Our method demonstrates promising results with respect to the state-of-the-art frame-based and event-based methods on the Brisbane-Event-VPR dataset.
arXiv Detail & Related papers (2023-07-03T14:24:04Z) - Dual Memory Aggregation Network for Event-Based Object Detection with
Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z) - Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks [55.81577205593956]
Event cameras are bio-inspired sensors that capture the per-pixel intensity changes asynchronously.
Deep learning (DL) has been brought to this emerging field and inspired active research endeavors in mining its potential.
arXiv Detail & Related papers (2023-02-17T14:19:28Z) - How Many Events do You Need? Event-based Visual Place Recognition Using
Sparse But Varying Pixels [29.6328152991222]
One of the potential applications of event camera research lies in visual place recognition for robot localization.
We show that the absolute difference in the number of events at those pixel locations accumulated into event frames can be sufficient for the place recognition task.
We evaluate our proposed approach on the Brisbane-Event-VPR dataset in an outdoor driving scenario, as well as the newly contributed indoor QCR-Event-VPR dataset.
arXiv Detail & Related papers (2022-06-28T00:24:12Z) - MEFNet: Multi-scale Event Fusion Network for Motion Deblurring [62.60878284671317]
Traditional frame-based cameras inevitably suffer from motion blur due to long exposure times.
As a kind of bio-inspired camera, the event camera records the intensity changes in an asynchronous way with high temporal resolution.
In this paper, we rethink the event-based image deblurring problem and unfold it into an end-to-end two-stage image restoration network.
arXiv Detail & Related papers (2021-11-30T23:18:35Z) - Bridging the Gap between Events and Frames through Unsupervised Domain
Adaptation [57.22705137545853]
We propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data.
We leverage the generative event model to split event features into content and motion features.
Our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks.
arXiv Detail & Related papers (2021-09-06T17:31:37Z) - Representation Learning for Event-based Visuomotor Policies [18.4767874925189]
We present an evental autocoder for unsupervised representation from asynchronous event data.
We show that it is feasible to learn compact representations fromtemporal event data to encode context.
We validate this framework of learning visuomotor policies by applying it to an obstacle avoidance scenario in simulation.
arXiv Detail & Related papers (2021-03-01T07:04:00Z) - Learning to Detect Objects with a 1 Megapixel Event Camera [14.949946376335305]
Event cameras encode visual information with high temporal precision, low data-rate, and high-dynamic range.
Due to the novelty of the field, the performance of event-based systems on many vision tasks is still lower compared to conventional frame-based solutions.
arXiv Detail & Related papers (2020-09-28T16:03:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.