E$^2$(GO)MOTION: Motion Augmented Event Stream for Egocentric Action
Recognition
- URL: http://arxiv.org/abs/2112.03596v1
- Date: Tue, 7 Dec 2021 09:43:08 GMT
- Title: E$^2$(GO)MOTION: Motion Augmented Event Stream for Egocentric Action
Recognition
- Authors: Chiara Plizzari, Mirco Planamente, Gabriele Goletto, Marco Cannici,
Emanuele Gusso, Matteo Matteucci, Barbara Caputo
- Abstract summary: Event cameras capture pixel-level intensity changes in the form of "events"
N-EPIC-Kitchens is the first event-based camera extension of the large-scale EPIC-Kitchens dataset.
We show that event data provides performance comparable to RGB and optical flow, without any additional flow computation at deploy time.
- Score: 21.199869051111367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras are novel bio-inspired sensors, which asynchronously capture
pixel-level intensity changes in the form of "events". Due to their sensing
mechanism, event cameras have little to no motion blur, a very high temporal
resolution and require significantly less power and memory than traditional
frame-based cameras. These characteristics make them a perfect fit to several
real-world applications such as egocentric action recognition on wearable
devices, where fast camera motion and limited power challenge traditional
vision sensors. However, the ever-growing field of event-based vision has, to
date, overlooked the potential of event cameras in such applications. In this
paper, we show that event data is a very valuable modality for egocentric
action recognition. To do so, we introduce N-EPIC-Kitchens, the first
event-based camera extension of the large-scale EPIC-Kitchens dataset. In this
context, we propose two strategies: (i) directly processing event-camera data
with traditional video-processing architectures (E$^2$(GO)) and (ii) using
event-data to distill optical flow information (E$^2$(GO)MO). On our proposed
benchmark, we show that event data provides performance comparable to RGB and
optical flow, without any additional flow computation at deploy time, and
improves performance by up to 4% with respect to RGB-only information.
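The E$^2$(GO) strategy above feeds event-camera data to traditional video-processing architectures, which requires turning the asynchronous event stream into dense, frame-like tensors first. A common way to do this in event-based vision is a temporal voxel grid; the sketch below illustrates that general idea and is not the paper's exact pipeline (the `(x, y, t, polarity)` event layout, function name, and bin count are assumptions).

```python
# Hypothetical sketch: accumulate an asynchronous event stream into a
# (num_bins, H, W) voxel grid that an ordinary 2D/3D CNN can consume.
# This illustrates the generic event-to-frame idea, NOT the paper's method.
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate events given as rows of (x, y, t, polarity).

    Each event's polarity (+1/-1) is added at its pixel location in the
    temporal bin that its timestamp falls into.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return grid
    t = events[:, 2].astype(np.float64)
    # Normalize timestamps to [0, num_bins) and clip the last event
    # into the final bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * num_bins
    bins = np.clip(t_norm.astype(int), 0, num_bins - 1)
    xs = events[:, 0].astype(int)
    ys = events[:, 1].astype(int)
    ps = np.where(events[:, 3] > 0, 1.0, -1.0)
    # Unbuffered in-place accumulation handles repeated pixel indices.
    np.add.at(grid, (bins, ys, xs), ps)
    return grid

# Toy usage: four events on a 4x4 sensor, split into 2 temporal bins.
events = np.array([
    [0, 0, 0.00, 1],   # x, y, t, polarity
    [1, 1, 0.30, -1],
    [2, 2, 0.60, 1],
    [3, 3, 1.00, 1],
])
voxels = events_to_voxel_grid(events, num_bins=2, height=4, width=4)
print(voxels.shape)  # (2, 4, 4)
```

The resulting tensor stacks like a short clip, which is what lets standard video architectures run on event data without modification.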
Related papers
- Generalized Event Cameras [15.730999915036705]
Event cameras capture the world at high time resolution and with minimal bandwidth requirements.
We design generalized event cameras that inherently preserve scene intensity in a bandwidth-efficient manner.
Our single-photon event cameras are capable of high-speed, high-fidelity imaging at low readout rates.
arXiv Detail & Related papers (2024-07-02T21:48:32Z)
- Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction [51.87279764576998]
We propose EvRGBHand -- the first approach for 3D hand mesh reconstruction with an event camera and an RGB camera compensating for each other.
EvRGBHand can tackle overexposure and motion blur issues in RGB-based HMR and foreground scarcity and background overflow issues in event-based HMR.
arXiv Detail & Related papers (2024-03-12T06:04:50Z)
- Event-Based Motion Magnification [28.057537257958963]
We propose a dual-camera system consisting of an event camera and a conventional RGB camera for video motion magnification.
This innovative combination enables a broad and cost-effective amplification of high-frequency motions.
We demonstrate the effectiveness and accuracy of our dual-camera system and network, offering a cost-effective and flexible solution for motion detection and magnification.
arXiv Detail & Related papers (2024-02-19T08:59:58Z)
- Deep Event Visual Odometry [40.57142632274148]
Event cameras offer the exciting possibility of tracking the camera's pose during high-speed motion.
Existing event-based monocular visual odometry approaches demonstrate limited performance on recent benchmarks.
We present Deep Event VO (DEVO), the first monocular event-only system with strong performance on a large number of real-world benchmarks.
arXiv Detail & Related papers (2023-12-15T14:00:00Z)
- EventAid: Benchmarking Event-aided Image/Video Enhancement Algorithms with Real-captured Hybrid Dataset [55.12137324648253]
Event cameras are an emerging imaging technology that offers advantages over conventional frame-based imaging sensors in dynamic range and sensing speed.
This paper focuses on five event-aided image and video enhancement tasks.
arXiv Detail & Related papers (2023-12-13T15:42:04Z)
- EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities for action recognition compared to standard RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
To better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
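The Event-Contrastive Loss is named but not defined in this summary. As a rough illustration of what a contrastive loss over event-clip embeddings can look like, the sketch below uses a generic NT-Xent-style formulation; the function name, temperature value, and exact formulation are assumptions, not the paper's definition.

```python
# Hedged sketch: a generic NT-Xent-style contrastive loss over paired
# embeddings of event clips. This illustrates contrastive losses in
# general, NOT the paper's exact L_EC.
import numpy as np

def contrastive_loss(z1, z2, temperature=0.1):
    """Pull together two augmented views of the same clip.

    z1, z2: (N, D) L2-normalized embeddings of two augmentations of the
    same N clips; row i of z1 and row i of z2 form a positive pair.
    """
    z = np.concatenate([z1, z2], axis=0)        # (2N, D)
    sim = z @ z.T / temperature                 # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)              # exclude self-similarity
    n = len(z1)
    # The positive partner of sample i is sample i+n (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# Toy usage: identical views, so positives are maximally similar.
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 8))
z1 = a / np.linalg.norm(a, axis=1, keepdims=True)
loss = contrastive_loss(z1, z1.copy())
```

Minimizing such a loss encourages embeddings of differently augmented versions of the same event clip to agree while pushing apart embeddings of different clips.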
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
- MEFNet: Multi-scale Event Fusion Network for Motion Deblurring [62.60878284671317]
Traditional frame-based cameras inevitably suffer from motion blur due to long exposure times.
As a kind of bio-inspired camera, the event camera records the intensity changes in an asynchronous way with high temporal resolution.
In this paper, we rethink the event-based image deblurring problem and unfold it into an end-to-end two-stage image restoration network.
arXiv Detail & Related papers (2021-11-30T23:18:35Z)
- Moving Object Detection for Event-based vision using Graph Spectral Clustering [6.354824287948164]
Moving object detection has been a central topic of discussion in computer vision for its wide range of applications.
We present an unsupervised Graph Spectral Clustering technique for Moving Object Detection in Event-based data.
We additionally show how the optimum number of moving objects can be automatically determined.
arXiv Detail & Related papers (2021-09-30T10:19:22Z)
- Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction [51.072733683919246]
We introduce Recurrent Asynchronous Multimodal (RAM) networks to handle asynchronous and irregular data from multiple sensors.
Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction.
We show an improvement over state-of-the-art methods by up to 30% in terms of mean depth absolute error.
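The core idea described here, a hidden state updated whenever any sensor delivers data and queried at arbitrary times, can be illustrated with a toy GRU-style cell. The class name, weight shapes, and update rule below are assumptions for illustration, not the RAM network architecture itself.

```python
# Hedged sketch of an asynchronously updated hidden state: fold in
# features from any sensor whenever they arrive, read out the state
# whenever a prediction is needed. Toy GRU-style update, NOT the
# paper's architecture.
import numpy as np

class AsyncState:
    def __init__(self, dim, feat_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.h = np.zeros(dim)
        self.Wz = rng.normal(scale=0.1, size=(dim, dim + feat_dim))
        self.Wh = rng.normal(scale=0.1, size=(dim, dim + feat_dim))

    def update(self, features):
        """Fold one sensor's features into the state (any sensor, any time)."""
        x = np.concatenate([self.h, features])
        z = 1.0 / (1.0 + np.exp(-self.Wz @ x))   # update gate
        h_new = np.tanh(self.Wh @ x)             # candidate state
        self.h = (1 - z) * self.h + z * h_new

    def query(self):
        """Read the current state whenever a prediction is needed."""
        return self.h.copy()

# Toy usage: irregularly timed updates from two modalities, then a query.
state = AsyncState(dim=4, feat_dim=3)
state.update(np.array([1.0, -0.5, 2.0]))   # e.g. event features at t=0.01s
state.update(np.array([0.2, 0.8, -1.0]))   # e.g. frame features at t=0.04s
prediction_input = state.query()           # read out at any chosen time
```

Because the state persists between updates, the model never has to wait for all sensors to be synchronized before producing a prediction.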
arXiv Detail & Related papers (2021-02-18T13:24:35Z)
- EventHands: Real-Time Neural 3D Hand Reconstruction from an Event Stream [80.15360180192175]
3D hand pose estimation from monocular videos is a long-standing and challenging problem.
We address it for the first time using a single event camera, i.e., an asynchronous vision sensor reacting to brightness changes.
Our approach has characteristics previously not demonstrated with a single RGB or depth camera.
arXiv Detail & Related papers (2020-12-11T16:45:34Z)
- Learning to Detect Objects with a 1 Megapixel Event Camera [14.949946376335305]
Event cameras encode visual information with high temporal precision, low data-rate, and high-dynamic range.
Due to the novelty of the field, the performance of event-based systems on many vision tasks is still lower compared to conventional frame-based solutions.
arXiv Detail & Related papers (2020-09-28T16:03:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.