Event Camera Data Dense Pre-training
- URL: http://arxiv.org/abs/2311.11533v1
- Date: Mon, 20 Nov 2023 04:36:19 GMT
- Title: Event Camera Data Dense Pre-training
- Authors: Yan Yang, Liyuan Pan, Liu Liu
- Abstract summary: This paper introduces a self-supervised learning framework designed for pre-training neural networks tailored to dense prediction tasks using event camera data.
For training our framework, we curate a synthetic event camera dataset featuring diverse scene and motion patterns.
- Score: 12.27119620314554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a self-supervised learning framework designed for
pre-training neural networks tailored to dense prediction tasks using event
camera data. Our approach uses event data alone for training.
Directly transferring achievements from dense RGB pre-training to event
camera data yields subpar performance. We attribute this to the spatial
sparsity inherent in an event image (converted from event data), where many
pixels contain no information. To mitigate this sparsity issue, we encode
an event image into event patch features, automatically mine contextual
similarity relationships among patches, group the patch features into
distinctive contexts, and enforce context-to-context similarities to learn
discriminative event features.
For training our framework, we curate a synthetic event camera dataset
featuring diverse scene and motion patterns. Transfer learning performance on
downstream dense prediction tasks illustrates the superiority of our method
over state-of-the-art approaches. Notably, our single model secured the top
position on the challenging DSEC-Flow benchmark.
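The abstract describes the pretraining signal only at a high level. Below is a minimal PyTorch sketch of one way to group patch features into contexts and enforce context-to-context similarity between two augmented views; the k-means grouping, function names, and hyperparameters are our assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def kmeans_contexts(feats, num_contexts=8, iters=10):
    """Cluster L2-normalized patch features [N, D] into context centroids [K, D].

    Plain k-means stands in for the paper's automatic mining of contextual
    similarity among patches; the grouping is treated as non-differentiable.
    Assumes N >= num_contexts.
    """
    feats = F.normalize(feats, dim=-1)
    centroids = feats[torch.randperm(feats.shape[0], device=feats.device)[:num_contexts]].clone()
    for _ in range(iters):
        assign = (feats @ centroids.t()).argmax(dim=-1)
        for k in range(num_contexts):
            members = feats[assign == k]
            if members.shape[0] > 0:
                centroids[k] = F.normalize(members.mean(0), dim=-1)
    return centroids

def pool_into_contexts(feats, centroids):
    """Average one view's patch features under a shared context assignment."""
    feats = F.normalize(feats, dim=-1)
    assign = (feats @ centroids.t()).argmax(dim=-1)
    return torch.stack([
        F.normalize(feats[assign == k].mean(0), dim=-1)
        if (assign == k).any() else centroids[k]
        for k in range(centroids.shape[0])
    ])

def context_to_context_loss(feats_a, feats_b, num_contexts=8, tau=0.1):
    """InfoNCE over contexts: matching contexts of two views attract, the rest repel."""
    centroids = kmeans_contexts(feats_a, num_contexts)   # shared reference grouping
    ctx_a = pool_into_contexts(feats_a, centroids)
    ctx_b = pool_into_contexts(feats_b, centroids)
    logits = (ctx_a @ ctx_b.t()) / tau
    targets = torch.arange(num_contexts, device=logits.device)
    return F.cross_entropy(logits, targets)
```

In the paper the contexts are mined automatically from contextual similarity among patches; plain k-means is only a stand-in for that step.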
Related papers
- CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding [52.67839570524888]
We present CEIA, an effective framework for open-world event-based understanding.
We leverage the rich event-image datasets to learn an event embedding space aligned with the image space of CLIP.
CEIA offers two distinct advantages. First, it allows us to take full advantage of the existing event-image datasets to make up for the shortage of large-scale event-text datasets.
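As a rough sketch of what aligning an event embedding space with CLIP's image space can look like (the symmetric InfoNCE form and all names here are our guesses, not CEIA's published formulation):

```python
import torch
import torch.nn.functional as F

def event_image_alignment_loss(event_emb, clip_image_emb, tau=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.

    event_emb:      [B, D] from a trainable event encoder.
    clip_image_emb: [B, D] from a frozen CLIP image tower (assumed precomputed).
    """
    e = F.normalize(event_emb, dim=-1)
    i = F.normalize(clip_image_emb, dim=-1)
    logits = (e @ i.t()) / tau                        # [B, B] pairwise similarities
    targets = torch.arange(e.shape[0], device=e.device)
    # Pull each event toward its paired image embedding and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```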
arXiv Detail & Related papers (2024-07-09T07:26:15Z)
- Text-to-Events: Synthetic Event Camera Streams from Conditional Text Input [8.365349007799296]
Event cameras are advantageous for tasks that require vision sensors with low-latency and sparse output responses.
This paper reports a method for creating new labelled event datasets by using a text-to-X model.
We demonstrate that the model can generate realistic event sequences of human gestures prompted by different text statements.
arXiv Detail & Related papers (2024-06-05T16:34:12Z)
- Segment Any Events via Weighted Adaptation of Pivotal Tokens [85.39087004253163]
This paper focuses on the nuanced challenge of tailoring the Segment Anything Models (SAMs) for integration with event data.
We introduce a multi-scale feature distillation methodology to optimize the alignment of token embeddings originating from event data with their RGB image counterparts.
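A minimal sketch of token-level feature distillation across scales; uniform weights stand in for the paper's weighted adaptation of pivotal tokens, and all names are our assumptions:

```python
import torch
import torch.nn.functional as F

def multiscale_token_distillation(event_tokens, rgb_tokens, weights=None):
    """Cosine-align event-branch tokens with frozen RGB-teacher tokens per scale.

    event_tokens / rgb_tokens: lists of [B, N_s, D_s] tensors, one per scale.
    """
    weights = weights or [1.0] * len(event_tokens)
    loss = 0.0
    for w, ev, rgb in zip(weights, event_tokens, rgb_tokens):
        # Teacher tokens are detached so gradients only update the event branch.
        loss = loss + w * (1.0 - F.cosine_similarity(ev, rgb.detach(), dim=-1)).mean()
    return loss
```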
arXiv Detail & Related papers (2023-12-24T12:47:08Z)
- Rethinking Event-based Human Pose Estimation with 3D Event Representations [26.592295349210787]
Event cameras offer a robust solution for navigating challenging contexts.
We introduce two 3D event representations: the Rasterized Event Point Cloud and the Decoupled Event Voxel.
Experiments on EV-3DPW demonstrate the robustness of our proposed 3D representations compared to traditional RGB images and event-frame techniques.
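To make 3D event representations concrete, here is the standard event-to-voxel rasterization; the paper's Rasterized Event Point Cloud and Decoupled Event Voxel differ in detail, so this is background rather than their method:

```python
import torch

def events_to_voxel(x, y, t, p, num_bins, height, width):
    """Rasterize an event stream into a [num_bins, height, width] voxel grid.

    x, y: pixel coordinates; t: timestamps; p: polarities in {-1, +1};
    all are 1-D tensors of equal length.
    """
    voxel = torch.zeros(num_bins, height, width)
    # Normalize timestamps to [0, num_bins) and bucket each event into a time bin.
    t = t.float()
    b = ((t - t.min()) / (t.max() - t.min() + 1e-9) * num_bins).long()
    b = b.clamp(max=num_bins - 1)
    # Signed accumulation: each event adds its polarity at (bin, y, x).
    voxel.index_put_((b, y.long(), x.long()), p.float(), accumulate=True)
    return voxel
```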
arXiv Detail & Related papers (2023-11-08T10:45:09Z)
- Graph-based Asynchronous Event Processing for Rapid Object Recognition [59.112755601918074]
Event cameras capture an asynchronous event stream in which each event encodes pixel location, trigger time, and the polarity of the brightness change.
We introduce a novel graph-based framework for event cameras, namely SlideGCN.
Our approach can efficiently process data event by event, unlocking the low-latency nature of event data while still maintaining the graph's structure internally.
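For intuition on the graph view of an event stream, a brute-force spatio-temporal neighborhood graph can be built as below; SlideGCN's contribution is updating such a graph incrementally per event, which this sketch does not attempt:

```python
import torch

def build_event_graph(xyt, radius):
    """Connect events that fall within a spatio-temporal radius of each other.

    xyt: [N, 3] events as (x, y, scaled_t); returns a [2, E] edge index.
    O(N^2) search, so it is only practical for small event windows.
    """
    dist = torch.cdist(xyt, xyt)                      # [N, N] pairwise distances
    src, dst = torch.nonzero(dist < radius, as_tuple=True)
    keep = src != dst                                 # drop self-loops
    return torch.stack([src[keep], dst[keep]])
```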
arXiv Detail & Related papers (2023-08-28T08:59:57Z)
- Event Camera Data Pre-training [14.77724035068357]
Our model is a self-supervised learning framework that uses paired event camera data and natural RGB images for training.
We achieve a top-1 accuracy of 64.83% on the N-ImageNet dataset.
arXiv Detail & Related papers (2023-01-05T06:32:50Z)
- Masked Event Modeling: Self-Supervised Pretraining for Event Cameras [41.263606382601886]
Masked Event Modeling (MEM) is a self-supervised framework for events.
MEM pretrains a neural network on unlabeled events, which can originate from any event camera recording.
Our method reaches state-of-the-art classification accuracy across three datasets.
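A minimal sketch of random patch masking over an event histogram, in the spirit of masked event modeling; the patch size, mask ratio, and zero-fill masking are our assumptions rather than MEM's exact recipe:

```python
import torch

def mask_event_patches(event_hist, patch=16, mask_ratio=0.75):
    """Zero out a random subset of patches of an event histogram [B, C, H, W].

    Returns the masked input and a boolean map of the hidden pixels, to which a
    reconstruction loss would be restricted. Assumes H and W divide by patch.
    """
    B, C, H, W = event_hist.shape
    nh, nw = H // patch, W // patch
    num_masked = int(mask_ratio * nh * nw)
    # Rank iid noise to pick a uniformly random subset of patches to keep.
    rank = torch.rand(B, nh * nw, device=event_hist.device).argsort(dim=1).argsort(dim=1)
    keep = (rank >= num_masked).view(B, 1, nh, 1, nw, 1)
    mask = keep.expand(B, C, nh, patch, nw, patch).reshape(B, C, H, W)
    return event_hist * mask, ~mask
```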
arXiv Detail & Related papers (2022-12-20T15:49:56Z)
- CLIP-Event: Connecting Text and Images with Event Structures [123.31452120399827]
We propose a contrastive learning framework that trains vision-language pretraining models to comprehend events.
We take advantage of text information extraction technologies to obtain event structural knowledge.
Experiments show that our zero-shot CLIP-Event outperforms the state-of-the-art supervised model in argument extraction.
arXiv Detail & Related papers (2022-01-13T17:03:57Z)
- Bridging the Gap between Events and Frames through Unsupervised Domain Adaptation [57.22705137545853]
We propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data.
We leverage the generative event model to split event features into content and motion features.
Our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks.
arXiv Detail & Related papers (2021-09-06T17:31:37Z)
- Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation [53.850686395708905]
Event-based cameras record an asynchronous stream of per-pixel brightness changes.
In this paper, we focus on single-layer architectures for representation learning from event data.
We show improvements of up to 9% in recognition accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-09-23T10:40:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.