Masked Event Modeling: Self-Supervised Pretraining for Event Cameras
- URL: http://arxiv.org/abs/2212.10368v3
- Date: Sat, 23 Dec 2023 21:01:38 GMT
- Title: Masked Event Modeling: Self-Supervised Pretraining for Event Cameras
- Authors: Simon Klenk, David Bonello, Lukas Koestler, Nikita Araslanov, Daniel
Cremers
- Abstract summary: Masked Event Modeling (MEM) is a self-supervised framework for events.
MEM pretrains a neural network on unlabeled events, which can originate from any event camera recording.
Our method reaches state-of-the-art classification accuracy across three datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras asynchronously capture brightness changes with low latency,
high temporal resolution, and high dynamic range. However, annotation of event
data is a costly and laborious process, which limits the use of deep learning
methods for classification and other semantic tasks with the event modality. To
reduce the dependency on labeled event data, we introduce Masked Event Modeling
(MEM), a self-supervised framework for events. Our method pretrains a neural
network on unlabeled events, which can originate from any event camera
recording. Subsequently, the pretrained model is finetuned on a downstream
task, leading to a consistent improvement of the task accuracy. For example,
our method reaches state-of-the-art classification accuracy across three
datasets, N-ImageNet, N-Cars, and N-Caltech101, increasing the top-1 accuracy
of previous work by significant margins. When tested on real-world event data,
MEM is even superior to supervised RGB-based pretraining. The models pretrained
with MEM are also label-efficient and generalize well to the dense task of
semantic image segmentation.
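The recipe in the abstract — bin events into a grid, hide part of the input, and train a network to reconstruct it — can be illustrated as follows. This is a minimal sketch of the masked-reconstruction idea only; the 2-channel histogram representation, layer sizes, and 75% mask ratio are illustrative assumptions, not the paper's exact configuration.
```python
# Minimal sketch of masked pretraining on event data. Assumptions (not the
# paper's exact method): events are binned into a 2-channel polarity histogram,
# and a tiny transformer reconstructs the hidden patches with an MSE loss.
import torch
import torch.nn as nn

PATCH, GRID, DIM = 16, 4, 128           # 4x4 grid of 16x16 patches (64x64 input)
N_PATCH = GRID * GRID

class MaskedEventModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(2 * PATCH * PATCH, DIM)      # flattened patch -> token
        self.pos = nn.Parameter(torch.zeros(N_PATCH, DIM))  # learned position codes
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.mask_token = nn.Parameter(torch.zeros(DIM))
        self.head = nn.Linear(DIM, 2 * PATCH * PATCH)       # token -> reconstructed patch

    def forward(self, patches, mask):
        # patches: (B, N_PATCH, 2*PATCH*PATCH); mask: (N_PATCH,) bool, True = hidden
        tok = self.embed(patches) + self.pos
        hidden = (self.mask_token + self.pos)[None]         # replacement tokens
        tok = torch.where(mask[None, :, None], hidden, tok)
        rec = self.head(self.encoder(tok))
        return ((rec[:, mask] - patches[:, mask]) ** 2).mean()  # loss on masked patches only

def events_to_patches(xs, ys, ps, size=GRID * PATCH):
    """Accumulate (x, y, polarity) events into a 2-channel histogram, then patchify."""
    hist = torch.zeros(2, size, size)
    hist.index_put_((ps, ys, xs), torch.ones(len(xs)), accumulate=True)
    patches = hist.unfold(1, PATCH, PATCH).unfold(2, PATCH, PATCH)
    return patches.permute(1, 2, 0, 3, 4).reshape(N_PATCH, -1)

model = MaskedEventModel()
n = 500
xs, ys = torch.randint(0, 64, (n,)), torch.randint(0, 64, (n,))
ps = torch.randint(0, 2, (n,))                              # polarity channel index
batch = events_to_patches(xs, ys, ps).unsqueeze(0)          # (1, 16, 512)
mask = torch.rand(N_PATCH) < 0.75                           # hide ~75% of patches
loss = model(batch, mask)
loss.backward()
```
After pretraining, the reconstruction head would be discarded and the encoder finetuned on the downstream task, as the abstract describes.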
Related papers
- Evaluating Image-Based Face and Eye Tracking with Event Cameras
Event cameras, also known as neuromorphic sensors, capture changes in local light intensity at the pixel level, producing asynchronously generated data termed "events".
This data format mitigates common issues observed in conventional cameras, like under-sampling when capturing fast-moving objects.
We evaluate the viability of integrating conventional algorithms with event-based data, transformed into a frame format.
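A minimal sketch of the event-to-frame transformation this entry refers to; the (x, y, t, polarity) tuple layout and the fixed time window are assumptions for illustration.
```python
# Accumulate a raw event stream into a frame so that conventional frame-based
# algorithms can run on it. Layout and window length are illustrative.
import numpy as np

def events_to_frame(events, height, width, t_start, t_end):
    """Accumulate signed polarities of events in [t_start, t_end) into a frame."""
    frame = np.zeros((height, width), dtype=np.float32)
    for x, y, t, p in events:
        if t_start <= t < t_end:
            frame[y, x] += 1.0 if p > 0 else -1.0   # ON events add, OFF subtract
    return frame

# Example: 1000 random events over a 10 ms window on a 128x128 sensor.
rng = np.random.default_rng(0)
events = zip(rng.integers(0, 128, 1000), rng.integers(0, 128, 1000),
             rng.uniform(0.0, 0.01, 1000), rng.choice([-1, 1], 1000))
frame = events_to_frame(events, 128, 128, 0.0, 0.01)
```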
arXiv Detail & Related papers (2024-08-19T20:27:08Z)
- Scalable Event-by-event Processing of Neuromorphic Sensory Signals With Deep State-Space Models
Event-based sensors are well suited for real-time processing.
Current methods either collapse events into frames or cannot scale up when processing the event data directly event-by-event.
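A minimal sketch of event-by-event processing with a linear state-space recurrence, updating a state once per event instead of collapsing events into frames; the per-event features and matrix shapes are illustrative assumptions, not the paper's architecture.
```python
# Event-driven linear state-space recurrence: h_k = A h_{k-1} + B u_k,
# out_k = C h_k. All sizes and the toy event stream are illustrative.
import numpy as np

STATE, FEAT = 32, 4
rng = np.random.default_rng(0)
A = np.eye(STATE) * 0.99                       # state decay (stable dynamics)
B = rng.normal(0, 0.1, (STATE, FEAT))          # input projection
C = rng.normal(0, 0.1, (2, STATE))             # readout: 2 output channels

h = np.zeros(STATE)
for x, y, t, p in [(3, 7, 0.001, 1), (4, 7, 0.002, -1)]:     # toy event stream
    u = np.array([x / 128, y / 128, t, p], dtype=np.float64)  # per-event features
    h = A @ h + B @ u                          # state update at every event
    out = C @ h                                # readout available at every event
```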
arXiv Detail & Related papers (2024-04-29T08:50:27Z)
- Improving Event Definition Following For Zero-Shot Event Detection
Existing approaches on zero-shot event detection usually train models on datasets annotated with known event types.
We aim to improve zero-shot event detection by training models to better follow event definitions.
arXiv Detail & Related papers (2024-03-05T01:46:50Z)
- Event Camera Data Dense Pre-training
This paper introduces a self-supervised learning framework designed for pre-training neural networks tailored to dense prediction tasks using event camera data.
For training our framework, we curate a synthetic event camera dataset featuring diverse scene and motion patterns.
arXiv Detail & Related papers (2023-11-20T04:36:19Z)
- Event Camera Data Pre-training
Our model is a self-supervised learning framework that uses paired event camera data and natural RGB images for training.
We achieve a top-1 accuracy of 64.83% on the N-ImageNet dataset.
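The summary does not state the training objective. One common way to exploit paired modalities is a symmetric CLIP-style contrastive loss; the sketch below assumes that objective for illustration and is not necessarily this paper's method.
```python
# Assumed contrastive objective for paired event/RGB data: pull embeddings of
# matching pairs together, push mismatched pairs apart. Illustrative only.
import torch
import torch.nn.functional as F

def paired_contrastive_loss(event_emb, rgb_emb, temperature=0.07):
    """event_emb, rgb_emb: (B, D) embeddings of matched event/RGB pairs."""
    e = F.normalize(event_emb, dim=-1)
    r = F.normalize(rgb_emb, dim=-1)
    logits = e @ r.t() / temperature           # (B, B) pairwise similarities
    targets = torch.arange(len(e))             # i-th event matches i-th image
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```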
arXiv Detail & Related papers (2023-01-05T06:32:50Z)
- Robust Event Classification Using Imperfect Real-world PMU Data
We study robust event classification using imperfect real-world phasor measurement unit (PMU) data.
We develop a novel machine learning framework for training robust event classifiers.
arXiv Detail & Related papers (2021-10-19T17:41:43Z)
- Bridging the Gap between Events and Frames through Unsupervised Domain Adaptation
We propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data.
We leverage the generative event model to split event features into content and motion features.
Our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks.
arXiv Detail & Related papers (2021-09-06T17:31:37Z)
- Learning Monocular Dense Depth from Events
Event cameras report brightness changes as a stream of asynchronous events instead of intensity frames.
Learning-based approaches have recently been applied to event-based data for tasks such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
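A minimal sketch of the recurrent idea: a hidden state carried across consecutive event frames integrates motion over time before a dense depth prediction. The simple conv-RNN cell and channel counts are assumptions, not the paper's architecture.
```python
# Hidden state carried across event frames, then a dense per-pixel prediction.
# Cell design and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class RecurrentDepth(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.cell = nn.Conv2d(2 + ch, ch, 3, padding=1)   # [event frame, state] -> state
        self.head = nn.Conv2d(ch, 1, 3, padding=1)        # state -> depth map
        self.ch = ch

    def forward(self, frames):                            # frames: (T, B, 2, H, W)
        h = torch.zeros(frames.shape[1], self.ch, *frames.shape[-2:])
        for f in frames:                                  # integrate events over time
            h = torch.tanh(self.cell(torch.cat([f, h], dim=1)))
        return self.head(h)                               # dense per-pixel output

depth = RecurrentDepth()(torch.rand(5, 1, 2, 64, 64))     # 5 event frames -> (1, 1, 64, 64)
```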
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
- Train No Evil: Selective Masking for Task-Guided Pre-Training
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance with less than 50% of the cost.
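A minimal sketch of the selective-masking idea: rather than masking tokens uniformly at random, preferentially mask those a task-importance score deems relevant, so pre-training focuses on task-critical content. The scoring function here is a hypothetical stand-in.
```python
# Selective masking: mask the tokens that matter most for the downstream task.
# The importance scores are a hypothetical stand-in for a learned scorer.
import torch

def selective_mask(token_ids, importance, ratio=0.15):
    """Mask the top `ratio` fraction of tokens by task-importance score."""
    n_mask = max(1, int(ratio * len(token_ids)))
    idx = torch.topk(importance, n_mask).indices   # most task-relevant tokens
    mask = torch.zeros(len(token_ids), dtype=torch.bool)
    mask[idx] = True
    return mask                                    # True = replace with [MASK]

tokens = torch.tensor([101, 2023, 2003, 2019, 2724, 102])
scores = torch.tensor([0.0, 0.2, 0.1, 0.1, 0.9, 0.0])      # hypothetical scores
mask = selective_mask(tokens, scores)              # masks the highest-scored token
```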
arXiv Detail & Related papers (2020-04-21T03:14:22Z)