Learning to Detect Objects with a 1 Megapixel Event Camera
- URL: http://arxiv.org/abs/2009.13436v2
- Date: Wed, 9 Dec 2020 15:41:24 GMT
- Title: Learning to Detect Objects with a 1 Megapixel Event Camera
- Authors: Etienne Perot, Pierre de Tournemire, Davide Nitti, Jonathan Masci,
Amos Sironi
- Abstract summary: Event cameras encode visual information with high temporal precision, low data rate, and high dynamic range.
Due to the novelty of the field, the performance of event-based systems on many vision tasks still lags behind that of conventional frame-based solutions.
- Score: 14.949946376335305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras encode visual information with high temporal
precision, low data rate, and high dynamic range. Thanks to these
characteristics, event cameras are particularly suited to scenarios with high
motion, challenging lighting conditions, and low-latency requirements. However,
due to the novelty of the field, the performance of event-based systems on many
vision tasks still lags behind that of conventional frame-based solutions. The
main reasons for this performance gap are: the lower spatial resolution of
event sensors compared to frame cameras; the lack of large-scale training
datasets; and the absence of well-established deep learning architectures for
event-based processing. In this paper, we address all of these problems in the
context of an event-based object detection task. First, we publicly release the
first high-resolution, large-scale dataset for event-based object detection.
The dataset contains more than 14 hours of recordings from a 1 megapixel event
camera in automotive scenarios, together with 25M bounding boxes of cars,
pedestrians, and two-wheelers, labeled at high frequency. Second, we introduce
a novel recurrent architecture for event-based detection and a temporal
consistency loss for better-behaved training. The ability to compactly
represent the sequence of events in the internal memory of the model is
essential to achieving high accuracy, and our model outperforms feed-forward
event-based architectures by a large margin. Moreover, our method does not
require any reconstruction of intensity images from events, showing that
training directly from raw events is possible, more efficient, and more
accurate than passing through an intermediate intensity image. Experiments on
the dataset introduced in this work, for which both events and gray-level
images are available, show performance on par with that of highly tuned and
well-studied frame-based detectors.
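The abstract's two key ingredients, a recurrent memory over a sequence of event representations and a temporal consistency loss, can be sketched compactly. The PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' actual model: the class names, channel counts, input shapes, and the squared-difference consistency term are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: its hidden state plays the role of the
    compact internal memory of the event sequence."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):
        i, f, g, o = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class RecurrentEventDetector(nn.Module):
    """Feed-forward encoder + recurrent memory + dense detection head."""

    def __init__(self, in_ch=10, hid_ch=32, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, hid_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hid_ch, hid_ch, 3, stride=2, padding=1), nn.ReLU())
        self.memory = ConvLSTMCell(hid_ch, hid_ch)
        self.head = nn.Conv2d(hid_ch, 4 + n_classes, 1)  # box offsets + scores

    def forward(self, event_seq):
        # event_seq: (T, B, C, H, W) sequence of event representations.
        T, B, _, H, W = event_seq.shape
        h = torch.zeros(B, self.memory.hid_ch, H // 4, W // 4)
        c = torch.zeros_like(h)
        outputs = []
        for t in range(T):
            h, c = self.memory(self.encoder(event_seq[t]), h, c)
            outputs.append(self.head(h))  # one prediction map per time step
        return outputs


def temporal_consistency_loss(outputs):
    """Penalize abrupt changes between consecutive prediction maps; a simple
    stand-in for the temporal consistency term named in the abstract."""
    diffs = [torch.mean((a - b) ** 2) for a, b in zip(outputs[:-1], outputs[1:])]
    return torch.stack(diffs).mean()


# Hypothetical usage: a random 10-step sequence of 10-channel event tensors.
seq = torch.randn(10, 2, 10, 64, 64)
model = RecurrentEventDetector()
preds = model(seq)
loss = temporal_consistency_loss(preds)  # added to the detection loss in training
```

The point of the sketch is the loop: the ConvLSTM hidden state is the compact internal memory the abstract refers to, updated once per incoming event tensor, so the detection head at each step can draw on evidence accumulated over the whole sequence.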
Related papers
- Event-to-Video Conversion for Overhead Object Detection [7.744259147081667]
The sparse output of event cameras complicates downstream image processing, especially for complex tasks such as object detection.
We show that there is a significant gap in performance between dense event representations and corresponding RGB frames.
We apply event-to-video conversion models that convert event streams into gray-scale video to close this gap.
arXiv Detail & Related papers (2024-02-09T22:07:39Z)
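As a rough illustration of the pipeline in the Event-to-Video entry above, the sketch below converts an event stream into grayscale frames by naive polarity accumulation. The paper applies learned conversion models, so this function (its name, the 33 ms frame interval, and the normalization) is only an assumed stand-in showing the shape of the pipeline: events in, frames out, ready for a conventional frame-based detector.

```python
import numpy as np


def events_to_frames(events, h, w, dt=33e-3):
    """Naive event-to-video conversion: accumulate signed polarities into
    fixed-interval frames. events: (N, 4) rows of (t, x, y, p), p in {-1, +1}."""
    t0, t1 = events[:, 0].min(), events[:, 0].max()
    n_frames = int(np.ceil((t1 - t0) / dt)) + 1
    frames = np.zeros((n_frames, h, w), dtype=np.float32)
    idx = ((events[:, 0] - t0) / dt).astype(int)
    np.add.at(frames, (idx, events[:, 2].astype(int), events[:, 1].astype(int)),
              events[:, 3])
    # Normalize each frame to [0, 255] gray levels for a frame-based detector.
    frames -= frames.min(axis=(1, 2), keepdims=True)
    peak = frames.max(axis=(1, 2), keepdims=True)
    return (255 * frames / np.maximum(peak, 1e-6)).astype(np.uint8)


# Hypothetical usage: 10k random events over a 346x260 sensor.
rng = np.random.default_rng(0)
ev = np.stack([rng.uniform(0, 1, 10_000),            # timestamps (s)
               rng.integers(0, 346, 10_000),         # x
               rng.integers(0, 260, 10_000),         # y
               rng.choice([-1.0, 1.0], 10_000)], 1)  # polarity
video = events_to_frames(ev, h=260, w=346)           # feed to any frame detector
```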
- EventAid: Benchmarking Event-aided Image/Video Enhancement Algorithms with Real-captured Hybrid Dataset [55.12137324648253]
Event cameras are an emerging imaging technology that offers advantages over conventional frame-based imaging sensors in dynamic range and sensing speed.
This paper focuses on five event-aided image and video enhancement tasks.
arXiv Detail & Related papers (2023-12-13T15:42:04Z)
- SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from the event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z)
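A toy sketch of the spiking mechanism behind the SpikeMOT summary above. This is not SpikeMOT's architecture; the function name, decay, and threshold are assumptions. It only shows how a leaky integrate-and-fire (LIF) layer turns binned events into sparse binary feature maps.

```python
import torch


def lif_features(event_frames, decay=0.9, threshold=1.0):
    """Minimal leaky integrate-and-fire layer: the membrane potential
    integrates incoming events, leaks each step, and emits a binary spike
    map wherever it crosses threshold. event_frames: (T, H, W) counts."""
    v = torch.zeros_like(event_frames[0])
    spikes = []
    for x in event_frames:
        v = decay * v + x               # leak + integrate
        s = (v >= threshold).float()    # fire
        v = v * (1.0 - s)               # reset fired neurons
        spikes.append(s)
    return torch.stack(spikes)          # sparse binary feature maps

# Hypothetical usage on random event counts.
frames = torch.poisson(torch.full((20, 64, 64), 0.2))
feats = lif_features(frames)
print(feats.mean())  # fraction of active (spiking) locations stays small
```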
- Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture the brightness changes of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as a 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z)
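The pillar representation described in the Dual Memory entry above can be illustrated with a generic voxel-grid routine. The sketch below is a stand-in (the bin count, column layout, and plain event counting are choices made here, not taken from the paper): it discretizes events into an x-y-t grid per polarity, yielding a dense 3D tensor per polarity channel.

```python
import numpy as np


def events_to_pillars(events, h, w, t_bins=5):
    """Discretize an event stream into an x-y-t grid, separately for positive
    and negative polarity, yielding a (2, t_bins, h, w) count tensor.
    events: (N, 4) rows of (t, x, y, p), p in {0, 1}."""
    t = events[:, 0]
    span = t.max() - t.min() + 1e-9
    ti = np.clip(((t - t.min()) / span * t_bins).astype(int), 0, t_bins - 1)
    grid = np.zeros((2, t_bins, h, w), dtype=np.float32)
    np.add.at(grid, (events[:, 3].astype(int), ti,
                     events[:, 2].astype(int), events[:, 1].astype(int)), 1.0)
    return grid


# Hypothetical usage.
rng = np.random.default_rng(1)
ev = np.stack([rng.uniform(0, 0.05, 5000), rng.integers(0, 128, 5000),
               rng.integers(0, 96, 5000), rng.integers(0, 2, 5000)], 1)
pillars = events_to_pillars(ev, h=96, w=128)  # input to the recurrent stage
print(pillars.shape)  # (2, 5, 96, 128)
```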
- A Temporal Densely Connected Recurrent Network for Event-based Human Pose Estimation [24.367222637492787]
Event cameras are emerging bio-inspired vision sensors that report per-pixel brightness changes asynchronously.
This paper proposes a novel densely connected recurrent architecture to address the problem of incomplete information.
With this recurrent architecture, we can explicitly model not only sequential but also non-sequential geometric consistency across time steps.
arXiv Detail & Related papers (2022-09-15T04:08:18Z)
- Are High-Resolution Event Cameras Really Needed? [62.70541164894224]
In low-illumination conditions and at high speeds, low-resolution cameras can outperform high-resolution ones, while requiring a significantly lower bandwidth.
We provide both empirical and theoretical evidence for this claim, indicating that high-resolution event cameras exhibit higher per-pixel event rates in such conditions.
In most cases, high-resolution event cameras then show lower task performance than lower-resolution sensors.
arXiv Detail & Related papers (2022-03-28T12:06:20Z)
- Moving Object Detection for Event-based vision using Graph Spectral Clustering [6.354824287948164]
Moving object detection has been a central topic of discussion in computer vision for its wide range of applications.
We present an unsupervised Graph Spectral Clustering technique for Moving Object Detection in Event-based data.
We additionally show how the optimum number of moving objects can be automatically determined.
arXiv Detail & Related papers (2021-09-30T10:19:22Z)
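A minimal sketch of the idea in the Graph Spectral Clustering entry above, using scikit-learn's off-the-shelf spectral clustering on events embedded in (x, y, scaled-t) space. The paper's graph construction and its automatic selection of the number of objects are not reproduced here; the time scaling, neighbor count, and fixed cluster count are assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering


def cluster_events(events, n_clusters=2, time_scale=100.0):
    """Cluster events with spectral clustering on a nearest-neighbor graph.
    events: (N, 3) rows of (x, y, t). n_clusters is fixed here, whereas the
    paper determines the number of moving objects automatically."""
    pts = events.astype(float).copy()
    pts[:, 2] *= time_scale  # make the time axis commensurate with pixels
    model = SpectralClustering(n_clusters=n_clusters,
                               affinity="nearest_neighbors",
                               n_neighbors=10, random_state=0)
    return model.fit_predict(pts)


# Hypothetical usage: two synthetic moving objects.
rng = np.random.default_rng(2)
obj1 = np.column_stack([rng.normal(30, 3, 300), rng.normal(30, 3, 300),
                        rng.uniform(0, 0.1, 300)])
obj2 = np.column_stack([rng.normal(90, 3, 300), rng.normal(60, 3, 300),
                        rng.uniform(0, 0.1, 300)])
labels = cluster_events(np.vstack([obj1, obj2]))
print(np.bincount(labels))  # roughly 300 events per cluster
```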
- Bridging the Gap between Events and Frames through Unsupervised Domain Adaptation [57.22705137545853]
We propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data.
We leverage the generative event model to split event features into content and motion features.
Our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks.
arXiv Detail & Related papers (2021-09-06T17:31:37Z)
- Time-Ordered Recent Event (TORE) Volumes for Event Cameras [21.419206807872797]
Event cameras are an exciting new sensor modality enabling high-speed imaging with extremely low latency and wide dynamic range.
Most machine learning architectures are not designed to directly handle the sparse data generated by event cameras.
This paper details an event representation called Time-Ordered Recent Event (TORE) volumes. TORE volumes are designed to compactly store raw spike timing information with minimal information loss.
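A simplified reading of the TORE idea described above: keep the k most recent event timestamps per pixel and polarity, then encode time-since-event at a query time. The exact encoding (units, the sentinel for empty slots, log compression) is assumed here, not taken from the paper.

```python
import numpy as np


def build_tore(events, h, w, k=4, t_query=None):
    """Store the k most recent event timestamps at each pixel and polarity,
    then return log time-since-event at query time (small = recent).
    events: (N, 4) rows of (t, x, y, p) sorted by time, p in {0, 1}."""
    t_query = events[-1, 0] if t_query is None else t_query
    stamps = np.full((2, k, h, w), -1e6)  # very old sentinel = "no event yet"
    for t, x, y, p in events:
        xi, yi, pi = int(x), int(y), int(p)
        col = stamps[pi, :, yi, xi]
        stamps[pi, :, yi, xi] = np.roll(col, 1)  # shift FIFO, drop the oldest
        stamps[pi, 0, yi, xi] = t                # newest timestamp on top
    return np.log1p(np.maximum(t_query - stamps, 0.0))


# Hypothetical usage.
rng = np.random.default_rng(3)
ev = np.stack([np.sort(rng.uniform(0, 1, 2000)), rng.integers(0, 64, 2000),
               rng.integers(0, 48, 2000), rng.integers(0, 2, 2000)], 1)
volume = build_tore(ev, h=48, w=64)
print(volume.shape)  # (2, 4, 48, 64): a dense tensor a CNN can consume
```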
arXiv Detail & Related papers (2021-03-10T15:03:38Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras report brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data for tasks such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)