EventCLIP: Adapting CLIP for Event-based Object Recognition
- URL: http://arxiv.org/abs/2306.06354v3
- Date: Thu, 16 Nov 2023 19:26:02 GMT
- Title: EventCLIP: Adapting CLIP for Event-based Object Recognition
- Authors: Ziyi Wu, Xudong Liu, Igor Gilitschenski
- Abstract summary: EventCLIP is a novel approach that utilizes CLIP for zero-shot and few-shot event-based object recognition.
We first generalize CLIP's image encoder to event data by converting raw events to 2D grid-based representations.
We evaluate EventCLIP on N-Caltech, N-Cars, and N-ImageNet datasets, achieving state-of-the-art few-shot performance.
- Score: 26.35633454924899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in zero-shot and few-shot classification heavily rely on the
success of pre-trained vision-language models (VLMs) such as CLIP. Due to a
shortage of large-scale datasets, training such models for event camera data
remains infeasible. Thus, adapting existing VLMs across modalities to event
vision is an important research challenge. In this work, we introduce
EventCLIP, a novel approach that utilizes CLIP for zero-shot and few-shot
event-based object recognition. We first generalize CLIP's image encoder to
event data by converting raw events to 2D grid-based representations. To
further enhance performance, we propose a feature adapter to aggregate temporal
information over event frames and refine text embeddings to better align with
the visual inputs. We evaluate EventCLIP on N-Caltech, N-Cars, and N-ImageNet
datasets, achieving state-of-the-art few-shot performance. When fine-tuned on
the entire dataset, our method outperforms all existing event classifiers.
Moreover, we explore practical applications of EventCLIP including robust event
classification and label-free event recognition, where our approach surpasses
previous baselines designed specifically for these tasks.
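The core pipeline the abstract describes, converting raw events into a 2D grid that CLIP's frozen image encoder can consume and then matching against class-name prompts, can be sketched as follows. This is a minimal illustration assuming an (N, 4) array of (x, y, t, polarity) events; the `events_to_frame` helper, the red/blue polarity rendering, the prompt template, and the file name are hypothetical, not the paper's exact recipe.

```python
import numpy as np
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

def events_to_frame(events: np.ndarray, height: int, width: int) -> Image.Image:
    """Accumulate raw events into a 2-channel histogram, one channel per polarity."""
    grid = np.zeros((2, height, width), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    p = events[:, 3].astype(int)  # polarity in {0, 1}
    np.add.at(grid, (p, y, x), 1.0)
    grid /= max(grid.max(), 1e-6)
    # Render positive events red and negative events blue so the frame
    # loosely resembles the natural images CLIP was trained on.
    rgb = np.stack([grid[1], np.zeros_like(grid[0]), grid[0]], axis=-1)
    return Image.fromarray((rgb * 255).astype(np.uint8))

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["airplane", "car", "motorbike"]  # illustrative categories
text = clip.tokenize([f"an event image of a {c}" for c in class_names]).to(device)

events = np.load("sample_events.npy")  # hypothetical (N, 4) array of (x, y, t, p)
image = preprocess(events_to_frame(events, height=180, width=240)).unsqueeze(0).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print("predicted class:", class_names[probs.argmax().item()])
```

In the zero-shot setting nothing is trained, so recognition quality hinges on how closely the event-to-frame rendering matches the natural-image distribution CLIP was pretrained on.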
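For the few-shot setting, the abstract mentions a feature adapter that aggregates temporal information over multiple event frames. The sketch below uses attention-style pooling over per-frame CLIP features with a residual blend; the module name, pooling design, and mixing weight are assumptions for illustration, since the paper's exact architecture is not given here.

```python
import torch
import torch.nn as nn

class TemporalFeatureAdapter(nn.Module):
    """Aggregates per-frame CLIP features into one visual embedding."""

    def __init__(self, dim: int = 512, residual: float = 0.8):
        super().__init__()
        # Scores each frame feature so informative frames dominate the pool.
        self.score = nn.Sequential(nn.Linear(dim, dim // 4), nn.ReLU(), nn.Linear(dim // 4, 1))
        self.proj = nn.Linear(dim, dim)
        self.residual = residual

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (B, T, D) features from the frozen CLIP image encoder,
        # one per event frame of the same stream.
        weights = self.score(frame_feats).softmax(dim=1)  # (B, T, 1)
        pooled = (weights * frame_feats).sum(dim=1)       # (B, D)
        refined = self.proj(pooled)
        # Residual blend keeps the output close to the frozen CLIP feature,
        # which guards against overfitting in the few-shot regime.
        out = self.residual * pooled + (1 - self.residual) * refined
        return out / out.norm(dim=-1, keepdim=True)

# Usage: only the adapter is trained; CLIP itself stays frozen.
adapter = TemporalFeatureAdapter(dim=512)
feats = torch.randn(4, 8, 512)        # 4 streams, 8 frames each (dummy data)
visual_embedding = adapter(feats)     # (4, 512), ready to match text embeddings
```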
Related papers
- CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding [52.67839570524888]
We present CEIA, an effective framework for open-world event-based understanding.
We leverage the rich event-image datasets to learn an event embedding space aligned with the image space of CLIP.
CEIA offers two distinct advantages. First, it allows us to take full advantage of existing event-image datasets to make up for the shortage of large-scale event-text datasets.
arXiv Detail & Related papers (2024-07-09T07:26:15Z)
- GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling [89.07386210297373]
GenEARL is a training-free generative framework that harnesses the power of modern generative models to understand event task descriptions.
We show that GenEARL outperforms the contrastive pretraining (CLIP) baseline by 9.4% and 14.2% accuracy for zero-shot EARL on the M2E2 and SwiG datasets.
arXiv Detail & Related papers (2024-04-07T00:28:13Z)
- Improving Event Definition Following For Zero-Shot Event Detection [66.27883872707523]
Existing approaches on zero-shot event detection usually train models on datasets annotated with known event types.
We aim to improve zero-shot event detection by training models to better follow event definitions.
arXiv Detail & Related papers (2024-03-05T01:46:50Z)
- EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding [7.797154022794006]
EventBind is a novel framework that unleashes the potential of vision-language models (VLMs) for event-based recognition.
We first introduce a novel event encoder that subtly models the temporal information from events.
We then design a text encoder that generates content prompts and utilizes hybrid text prompts to enhance EventBind's generalization ability.
arXiv Detail & Related papers (2023-08-06T15:05:42Z)
- PILED: An Identify-and-Localize Framework for Few-Shot Event Detection [79.66042333016478]
In our study, we employ cloze prompts to elicit event-related knowledge from pretrained language models.
We minimize the number of type-specific parameters, enabling our model to quickly adapt to event detection tasks for new types.
arXiv Detail & Related papers (2022-02-15T18:01:39Z)
- CLIP-Event: Connecting Text and Images with Event Structures [123.31452120399827]
We propose a contrastive learning framework to enforce vision-language pretraining models to comprehend events and semantic relations.
We take advantage of text information extraction technologies to obtain event structural knowledge.
Experiments show that our zero-shot CLIP-Event outperforms the state-of-the-art supervised model in argument extraction.
arXiv Detail & Related papers (2022-01-13T17:03:57Z)
- N-ImageNet: Towards Robust, Fine-Grained Object Recognition with Event Cameras [5.726662931271546]
We introduce N-ImageNet, a large-scale dataset targeted for robust, fine-grained object recognition with event cameras.
N-ImageNet serves as a challenging benchmark for event-based object recognition, due to its large number of classes and samples.
arXiv Detail & Related papers (2021-12-02T08:08:32Z)
- Robust Event Classification Using Imperfect Real-world PMU Data [58.26737360525643]
We study robust event classification using imperfect real-world phasor measurement unit (PMU) data.
We develop a novel machine learning framework for training robust event classifiers.
arXiv Detail & Related papers (2021-10-19T17:41:43Z)
- Event-LSTM: An Unsupervised and Asynchronous Learning-based Representation for Event-based Data [8.931153235278831]
Event cameras are activity-driven bio-inspired vision sensors.
We propose Event-LSTM, an unsupervised Auto-Encoder architecture made up of LSTM layers.
We also push state-of-the-art event de-noising forward by introducing memory into the de-noising process.
arXiv Detail & Related papers (2021-05-10T09:18:52Z)
- A Differentiable Recurrent Surface for Asynchronous Event-Based Data [19.605628378366667]
We propose Matrix-LSTM, a grid of Long Short-Term Memory (LSTM) cells that efficiently process events and learn end-to-end task-dependent event-surfaces.
Compared to existing reconstruction approaches, our learned event-surface shows good flexibility and improves results on optical flow estimation.
It improves the state-of-the-art of event-based object classification on the N-Cars dataset.
arXiv Detail & Related papers (2020-01-10T14:09:40Z)
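As a hedged illustration of the Matrix-LSTM idea summarized in the entry above: a single LSTM, shared across all pixel locations, consumes the time-ordered events falling on each pixel, and its final hidden state becomes that pixel's value in a dense, end-to-end-trainable event surface. The per-pixel Python loop and the (t, p, Δt) feature choice below are simplifications for clarity, not the paper's implementation, which batches the per-pixel sequences efficiently.

```python
import torch
import torch.nn as nn

class LSTMEventSurface(nn.Module):
    """One LSTM shared across pixels; its last hidden state fills a dense grid."""

    def __init__(self, in_dim: int = 3, hidden: int = 8):
        super().__init__()
        self.hidden = hidden
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)

    def forward(self, events: torch.Tensor, height: int, width: int) -> torch.Tensor:
        # events: (N, 4) float tensor of (x, y, t, p), sorted by timestamp.
        x, y = events[:, 0].long(), events[:, 1].long()
        surface = torch.zeros(self.hidden, height, width)
        # Naive per-pixel loop for clarity; a real implementation groups the
        # sequences and runs the shared LSTM over all pixels in one batch.
        for px, py in {(int(a), int(b)) for a, b in zip(x, y)}:
            seq = events[(x == px) & (y == py)][:, 2:4]        # (L, 2): (t, p)
            dt = torch.diff(seq[:, 0], prepend=seq[:1, 0]).unsqueeze(-1)
            feats = torch.cat([seq, dt], dim=-1).unsqueeze(0)  # (1, L, 3)
            _, (h, _) = self.lstm(feats)                       # h: (1, 1, hidden)
            surface[:, py, px] = h[-1, 0]
        return surface  # (hidden, H, W): a learned, task-dependent event-surface

# Example: 1000 random events on a 64x64 sensor yield an 8-channel surface.
ev = torch.rand(1000, 4) * torch.tensor([63.0, 63.0, 1.0, 1.0])
surf = LSTMEventSurface()(ev[ev[:, 2].argsort()], 64, 64)  # sort by time first
print(surf.shape)  # torch.Size([8, 64, 64])
```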
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.