Masked Event Modeling: Self-Supervised Pretraining for Event Cameras
- URL: http://arxiv.org/abs/2212.10368v3
- Date: Sat, 23 Dec 2023 21:01:38 GMT
- Title: Masked Event Modeling: Self-Supervised Pretraining for Event Cameras
- Authors: Simon Klenk, David Bonello, Lukas Koestler, Nikita Araslanov, Daniel Cremers
- Abstract summary: Masked Event Modeling (MEM) is a self-supervised framework for events.
MEM pretrains a neural network on unlabeled events, which can originate from any event camera recording.
Our method reaches state-of-the-art classification accuracy across three datasets.
- Score: 41.263606382601886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras asynchronously capture brightness changes with low latency,
high temporal resolution, and high dynamic range. However, annotation of event
data is a costly and laborious process, which limits the use of deep learning
methods for classification and other semantic tasks with the event modality. To
reduce the dependency on labeled event data, we introduce Masked Event Modeling
(MEM), a self-supervised framework for events. Our method pretrains a neural
network on unlabeled events, which can originate from any event camera
recording. Subsequently, the pretrained model is finetuned on a downstream
task, leading to a consistent improvement of the task accuracy. For example,
our method reaches state-of-the-art classification accuracy across three
datasets, N-ImageNet, N-Cars, and N-Caltech101, increasing the top-1 accuracy
of previous work by significant margins. When tested on real-world event data,
MEM is even superior to supervised RGB-based pretraining. The models pretrained
with MEM are also label-efficient and generalize well to the dense task of
semantic image segmentation.
Related papers
- Improving Event Definition Following For Zero-Shot Event Detection [66.27883872707523]
Existing approaches to zero-shot event detection usually train models on datasets annotated with known event types.
We aim to improve zero-shot event detection by training models to better follow event definitions.
arXiv Detail & Related papers (2024-03-05T01:46:50Z)
- Event Camera Data Dense Pre-training [12.27119620314554]
This paper introduces a self-supervised learning framework designed for pre-training neural networks tailored to dense prediction tasks using event camera data.
For training our framework, we curate a synthetic event camera dataset featuring diverse scene and motion patterns.
arXiv Detail & Related papers (2023-11-20T04:36:19Z)
- Event Camera Data Pre-training [14.77724035068357]
Our model is a self-supervised learning framework, and uses paired event camera data and natural RGB images for training.
We achieve a top-1 accuracy of 64.83% on the N-ImageNet dataset.
arXiv Detail & Related papers (2023-01-05T06:32:50Z)
- Robust Event Classification Using Imperfect Real-world PMU Data [58.26737360525643]
We study robust event classification using imperfect real-world phasor measurement unit (PMU) data.
We develop a novel machine learning framework for training robust event classifiers.
arXiv Detail & Related papers (2021-10-19T17:41:43Z)
- Bridging the Gap between Events and Frames through Unsupervised Domain Adaptation [57.22705137545853]
We propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data.
We leverage the generative event model to split event features into content and motion features.
Our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks.
arXiv Detail & Related papers (2021-09-06T17:31:37Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras report brightness changes as a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data for tasks such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
- Learning to Detect Objects with a 1 Megapixel Event Camera [14.949946376335305]
Event cameras encode visual information with high temporal precision, low data-rate, and high-dynamic range.
Due to the novelty of the field, the performance of event-based systems on many vision tasks is still lower compared to conventional frame-based solutions.
arXiv Detail & Related papers (2020-09-28T16:03:59Z)
- Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance at less than 50% of the cost.
arXiv Detail & Related papers (2020-04-21T03:14:22Z)