UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
- URL: http://arxiv.org/abs/2209.13430v1
- Date: Tue, 27 Sep 2022 14:36:16 GMT
- Title: UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
- Authors: Janghyeon Lee, Jongsuk Kim, Hyounguk Shon, Bumsoo Kim, Seung Hwan Kim,
Honglak Lee, Junmo Kim
- Abstract summary: We propose UniCLIP, a Unified framework for Contrastive Language-Image Pre-training.
UniCLIP integrates the contrastive loss of both inter-domain pairs and intra-domain pairs into a single universal space.
UniCLIP outperforms previous vision-language pre-training methods on various single- and multi-modality downstream tasks.
- Score: 62.97551575508387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training vision-language models with contrastive objectives has shown
promising results that are both scalable to large uncurated datasets and
transferable to many downstream applications. Subsequent works have aimed to
improve data efficiency by adding self-supervision terms, but because the
inter-domain (image-text) contrastive loss and the intra-domain (image-image)
contrastive loss are defined on separate embedding spaces in those works, many
feasible combinations of supervision are overlooked. To overcome this issue, we
propose UniCLIP, a Unified framework for Contrastive Language-Image
Pre-training. UniCLIP integrates the contrastive loss of both inter-domain
pairs and intra-domain pairs into a single universal space. The discrepancies
that arise when integrating contrastive losses across different domains are
resolved by the three key components of UniCLIP: (1) augmentation-aware feature
embedding, (2) MP-NCE loss, and (3) a domain-dependent similarity measure.
UniCLIP outperforms previous vision-language pre-training methods on various
single- and multi-modality downstream tasks. In our experiments, we show that
each component of UniCLIP contributes meaningfully to the final performance.
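As an illustration only (not the authors' released code), the sketch below shows what integrating inter-domain (image-text) and intra-domain (image-image) contrastive terms into one shared embedding space can look like. The abstract does not specify the exact forms of the augmentation-aware feature embedding, the MP-NCE loss, or the domain-dependent similarity measure, so the sketch substitutes a generic multi-positive InfoNCE over concatenated image-view and text embeddings, with separate temperatures for image-image and image-text pairs standing in for the domain-dependent similarity; all function and variable names are hypothetical.

```python
# Minimal sketch, NOT the authors' implementation: a single-space contrastive
# loss treating both inter-domain (image-text) and intra-domain (image-image)
# pairs as positives. UniCLIP's actual augmentation-aware embedding, MP-NCE
# loss, and domain-dependent similarity are not detailed in the abstract; the
# per-domain temperatures below are only a crude stand-in for the last of these.
import torch
import torch.nn.functional as F

def unified_contrastive_loss(img_v1, img_v2, txt, t_inter=0.07, t_intra=0.1):
    """img_v1, img_v2: [N, D] embeddings of two augmented views of each image.
    txt: [N, D] caption embeddings. All are assumed to share one embedding space."""
    n, device = img_v1.size(0), img_v1.device
    # Normalize so dot products are cosine similarities in the shared space.
    z = F.normalize(torch.cat([img_v1, img_v2, txt], dim=0), dim=-1)  # [3N, D]

    # Separate temperatures for image-text vs. image-image pairs (assumption).
    is_txt = torch.cat([torch.zeros(2 * n), torch.ones(n)]).bool().to(device)
    cross_domain = is_txt[None, :] ^ is_txt[:, None]
    temps = torch.where(cross_domain,
                        torch.tensor(t_inter, device=device),
                        torch.tensor(t_intra, device=device))
    logits = (z @ z.t()) / temps

    # Never contrast an embedding with itself.
    self_mask = torch.eye(3 * n, dtype=torch.bool, device=device)
    logits = logits.masked_fill(self_mask, float('-inf'))

    # Positives: the two views of the same image, and each view with its caption.
    ids = torch.arange(n, device=device).repeat(3)  # instance id of every row
    pos = (ids[None, :] == ids[:, None]) & ~self_mask

    # Multi-positive InfoNCE: average cross-entropy over all positive pairs.
    log_prob = logits.log_softmax(dim=-1)
    return -log_prob[pos].mean()
```

In a training step this could stand in for the plain CLIP objective, e.g. loss = unified_contrastive_loss(f_img(aug1(x)), f_img(aug2(x)), f_txt(captions)), where f_img and f_txt are hypothetical encoders projecting into the same output dimension.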
Related papers
- Joint semi-supervised and contrastive learning enables zero-shot domain-adaptation and multi-domain segmentation [1.5393913074555419]
SegCLR is a versatile framework designed to segment volumetric images across different domains.
We demonstrate the superior performance of SegCLR through a comprehensive evaluation.
arXiv Detail & Related papers (2024-05-08T18:10:59Z)
- RankCLIP: Ranking-Consistent Language-Image Pretraining [7.92247304974314]
RANKCLIP is a novel pretraining method that extends beyond the rigid one-to-one matching framework of CLIP.
By extending the traditional pair-wise loss to list-wise, RANKCLIP improves the alignment process, enabling it to capture the nuanced many-to-many relationships between and within each modality.
arXiv Detail & Related papers (2024-04-15T00:12:27Z)
- Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation [25.499205902426716]
We introduce a Unified Modality Separation (UniMoS) framework for unsupervised domain adaptation.
We craft a nimble modality separation network that distinctly disentangles CLIP's features into language-associated and vision-associated components.
Our proposed Modality-Ensemble Training (MET) method fosters the exchange of modality-agnostic information.
arXiv Detail & Related papers (2024-03-11T17:33:12Z)
- UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding [93.45067274442881]
This paper extends Contrastive Language-Image Pre-training (CLIP) with multi-granularity alignment.
We develop a unified multi-granularity learning framework, named UMG-CLIP, that simultaneously empowers the model with versatile perception abilities across different levels of detail.
arXiv Detail & Related papers (2024-01-12T06:35:09Z)
- One-for-All: Towards Universal Domain Translation with a Single StyleGAN [86.33216867136639]
We propose a novel translation model, UniTranslator, for transforming representations between visually distinct domains.
The proposed UniTranslator is versatile and capable of performing various tasks, including style mixing, stylization, and translations.
UniTranslator surpasses the performance of existing general-purpose models and performs well against specialized models in representative tasks.
arXiv Detail & Related papers (2023-10-22T08:02:55Z)
- Generalized Few-Shot Continual Learning with Contrastive Mixture of Adapters [59.82088750033897]
We set up a Generalized FSCL (GFSCL) protocol involving both class- and domain-incremental situations.
We find that common continual learning methods have poor generalization ability on unseen domains.
To this end, we propose a rehearsal-free framework based on Vision Transformer (ViT), named Contrastive Mixture of Adapters (CMoA).
arXiv Detail & Related papers (2023-02-12T15:18:14Z)
- Non-Contrastive Learning Meets Language-Image Pre-Training [145.6671909437841]
We study the validity of non-contrastive language-image pre-training (nCLIP).
We introduce xCLIP, a multi-tasking framework combining CLIP and nCLIP, and show that nCLIP aids CLIP in enhancing feature semantics.
arXiv Detail & Related papers (2022-10-17T17:57:46Z)
- Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training [88.80694147730883]
We investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks.
Under the studied conditions, we observe that a mostly unified encoder for vision and language signals outperforms all other variations that separate more parameters.
Our approach outperforms vanilla CLIP by 1.6 points in linear probing on a collection of 24 downstream vision tasks.
arXiv Detail & Related papers (2022-07-26T05:19:16Z)
- ProtoCLIP: Prototypical Contrastive Language Image Pretraining [12.067061175987075]
Prototypical Contrastive Language Image Pretraining (ProtoCLIP) is introduced to enhance such grouping.
ProtoCLIP sets up prototype-level discrimination between image and text spaces, which efficiently transfers higher-level structural knowledge.
ProtoCLIP is trained with an online episodic training strategy, which allows it to be scaled up to unlimited amounts of data.
arXiv Detail & Related papers (2022-06-22T11:55:53Z)