CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding
- URL: http://arxiv.org/abs/2407.06611v1
- Date: Tue, 9 Jul 2024 07:26:15 GMT
- Title: CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding
- Authors: Wenhao Xu, Wenming Weng, Yueyi Zhang, Zhiwei Xiong
- Abstract summary: We present CEIA, an effective framework for open-world event-based understanding.
We leverage the rich event-image datasets to learn an event embedding space aligned with the image space of CLIP.
CEIA offers two distinct advantages. First, it allows us to take full advantage of the existing event-image datasets to make up for the shortage of large-scale event-text datasets.
- Score: 52.67839570524888
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present CEIA, an effective framework for open-world event-based understanding. Currently, training a large event-text model still poses a huge challenge due to the shortage of paired event-text data. In response to this challenge, CEIA learns to align event and image data instead of directly aligning event and text data. Specifically, we leverage the rich event-image datasets to learn an event embedding space aligned with the image space of CLIP through contrastive learning. In this way, event and text data are naturally aligned by using image data as a bridge. In particular, CEIA offers two distinct advantages. First, it allows us to take full advantage of existing event-image datasets to make up for the shortage of large-scale event-text datasets. Second, by leveraging more training data, it can flexibly boost performance, ensuring scalable capability. To highlight the versatility of our framework, we conduct extensive evaluations across a diverse range of event-based multi-modal applications, such as object recognition, event-image retrieval, event-text retrieval, and domain adaptation. The outcomes demonstrate CEIA's distinct zero-shot superiority over existing methods on these applications.
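The abstract describes aligning an event encoder to CLIP's image embedding space with a contrastive objective over paired event-image data. Below is a minimal sketch of that idea, not the authors' released code: the event encoder architecture, the voxel-grid input format, the 512-dimensional embedding size, the temperature value, and the stand-in tensors for frozen CLIP image features are all illustrative assumptions.

```python
# Sketch of CEIA-style event-image contrastive alignment (all specifics are placeholders).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EventEncoder(nn.Module):
    """Toy event encoder: maps a voxel-grid event tensor into CLIP's embedding space."""
    def __init__(self, in_channels: int = 5, embed_dim: int = 512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, events: torch.Tensor) -> torch.Tensor:
        return self.proj(self.backbone(events))

def contrastive_alignment_loss(event_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE loss pulling paired event/image embeddings together."""
    event_emb = F.normalize(event_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = event_emb @ image_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# One training step: the CLIP image encoder is assumed frozen; its 512-d features
# for the images paired with the event streams are mocked here with random tensors.
event_encoder = EventEncoder()
events = torch.randn(8, 5, 224, 224)   # placeholder batch of voxel-grid event tensors
image_emb = torch.randn(8, 512)        # stand-in for frozen CLIP image features
loss = contrastive_alignment_loss(event_encoder(events), image_emb)
loss.backward()
```

Under this reading, zero-shot recognition at inference would score an event embedding against CLIP text embeddings of class-name prompts: because the event space is aligned to CLIP's image space, and CLIP's image and text spaces are already aligned, image data acts as the bridge between events and text.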
Related papers
- EA-VTR: Event-Aware Video-Text Retrieval [97.30850809266725]
The Event-Aware Video-Text Retrieval (EA-VTR) model achieves powerful video-text retrieval through superior video event awareness.
EA-VTR can efficiently encode frame-level and video-level visual representations simultaneously, enabling detailed event content and complex event temporal cross-modal alignment.
arXiv Detail & Related papers (2024-07-10T09:09:58Z)
- OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies [4.940059438666211]
Event-based semantic segmentation (ESS) is a fundamental yet challenging task for event camera sensing.
We synergize information from image, text, and event-data domains and introduce OpenESS to enable scalable ESS.
We achieve 53.93% and 43.31% mIoU on DDD17 and DSEC-Semantic benchmarks without using either event or frame labels.
arXiv Detail & Related papers (2024-05-08T17:59:58Z)
- Event Camera Data Dense Pre-training [10.918407820258246]
This paper introduces a self-supervised learning framework designed for pre-training neural networks tailored to dense prediction tasks using event camera data.
For training our framework, we curate a synthetic event camera dataset featuring diverse scene and motion patterns.
arXiv Detail & Related papers (2023-11-20T04:36:19Z)
- EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding [7.797154022794006]
EventBind is a novel framework that unleashes the potential of vision-language models (VLMs) for event-based recognition.
We first introduce a novel event encoder that subtly models the temporal information from events.
We then design a text encoder that generates content prompts and utilizes hybrid text prompts to enhance EventBind's generalization ability.
arXiv Detail & Related papers (2023-08-06T15:05:42Z)
- EventCLIP: Adapting CLIP for Event-based Object Recognition [26.35633454924899]
EventCLIP is a novel approach that utilizes CLIP for zero-shot and few-shot event-based object recognition.
We first generalize CLIP's image encoder to event data by converting raw events to 2D grid-based representations.
We evaluate EventCLIP on N-Caltech, N-Cars, and N-ImageNet datasets, achieving state-of-the-art few-shot performance.
arXiv Detail & Related papers (2023-06-10T06:05:35Z)
- CLIP-Event: Connecting Text and Images with Event Structures [123.31452120399827]
We propose a contrastive learning framework to enforce vision-language pretraining models to comprehend event structures.
We take advantage of text information extraction technologies to obtain event structural knowledge.
Experiments show that our zero-shot CLIP-Event outperforms the state-of-the-art supervised model in argument extraction.
arXiv Detail & Related papers (2022-01-13T17:03:57Z)
- Integrating Deep Event-Level and Script-Level Information for Script Event Prediction [60.67635412135681]
We propose a Transformer-based model, called MCPredictor, which integrates deep event-level and script-level information for script event prediction.
The experimental results on the widely-used New York Times corpus demonstrate the effectiveness and superiority of the proposed model.
arXiv Detail & Related papers (2021-09-24T07:37:32Z)
- Learning Constraints and Descriptive Segmentation for Subevent Detection [74.48201657623218]
We propose an approach to learning and enforcing constraints that capture dependencies between subevent detection and EventSeg prediction.
We adopt Rectifier Networks for constraint learning and then convert the learned constraints to a regularization term in the loss function of the neural model.
arXiv Detail & Related papers (2021-09-13T20:50:37Z)
- Bridging the Gap between Events and Frames through Unsupervised Domain Adaptation [57.22705137545853]
We propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data.
We leverage the generative event model to split event features into content and motion features.
Our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks.
arXiv Detail & Related papers (2021-09-06T17:31:37Z)