Vision Language Pre-training by Contrastive Learning with Cross-Modal
Similarity Regulation
- URL: http://arxiv.org/abs/2305.04474v3
- Date: Thu, 22 Jun 2023 06:44:57 GMT
- Title: Vision Language Pre-training by Contrastive Learning with Cross-Modal
Similarity Regulation
- Authors: Chaoya Jiang, Wei Ye, Haiyang Xu, Miang Yan, Shikun Zhang, Jie Zhang,
Fei Huang
- Abstract summary: Cross-modal contrastive learning in vision language pretraining faces the challenge of (partial) false negatives.
We propose a contrastive learning strategy regulated by progressively refined cross-modal similarity, to more accurately optimize MI between an image/text anchor and its negative texts/images.
- Score: 44.851623239151124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-modal contrastive learning in vision language pretraining (VLP) faces
the challenge of (partial) false negatives. In this paper, we study this
problem from the perspective of Mutual Information (MI) optimization. It is
well known that the InfoNCE loss used in contrastive learning maximizes a
lower bound on the MI between anchors and their positives; we theoretically
prove that the MI involving negatives also matters when noise is common.
Guided by a more general form of the lower bound, we propose a
contrastive learning strategy regulated by progressively refined cross-modal
similarity, which more accurately optimizes the MI between an image/text anchor and its
negative texts/images instead of improperly minimizing it. Our method performs
competitively on four downstream cross-modal tasks and systematically balances
the beneficial and harmful effects of (partial) false negative samples under
theoretical guidance.
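The idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' loss: the abstract does not give the formula, so the weighting scheme below is an assumption made for illustration only. The first function is the standard image-to-text InfoNCE loss, whose value L over a batch of N pairs gives the familiar bound MI >= log N - L. The second scales each negative's contribution to the denominator by a weight in [0, 1], so that a negative judged similar to the anchor (a likely partial false negative) is pushed away less strongly. The names `regulated_info_nce`, `similarity_to_neg_weight`, and `neg_weight` are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def info_nce(image_emb, text_emb, temperature=0.07):
    """Standard image-to-text InfoNCE; row i of each matrix forms a positive pair."""
    logits = image_emb @ text_emb.T / temperature
    m = logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_denom = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return float(np.mean(log_denom - np.diag(logits)))


def similarity_to_neg_weight(similarity):
    """Hypothetical mapping (an assumption): more similar negatives get less weight."""
    return 1.0 - np.clip(similarity, 0.0, 1.0)


def regulated_info_nce(image_emb, text_emb, neg_weight, temperature=0.07):
    """InfoNCE variant where negative j of anchor i is scaled by neg_weight[i, j].

    Illustrative only: with all weights equal to 1 this reduces to info_nce,
    and with all off-diagonal weights equal to 0 the negatives are ignored.
    """
    logits = image_emb @ text_emb.T / temperature
    w = np.array(neg_weight, dtype=float)  # copy so the caller's array is untouched
    np.fill_diagonal(w, 1.0)  # positives always keep full weight
    m = logits.max(axis=1, keepdims=True)
    log_denom = m[:, 0] + np.log((w * np.exp(logits - m)).sum(axis=1))
    return float(np.mean(log_denom - np.diag(logits)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = l2_normalize(rng.normal(size=(4, 8)))
    texts = l2_normalize(rng.normal(size=(4, 8)))
    # In practice the similarity estimate would come from a separately refined model;
    # here the embeddings themselves are reused purely for demonstration.
    weights = similarity_to_neg_weight(images @ texts.T)
    print(info_nce(images, texts), regulated_info_nce(images, texts, weights))
```

In a real pre-training setup the weights would be recomputed as the similarity estimate improves, which is one possible reading of "progressively refined" in the abstract.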
Related papers
- Improving Contrastive Learning of Sentence Embeddings with Focal-InfoNCE [13.494159547236425]
This study introduces an unsupervised contrastive learning framework that combines SimCSE with hard negative mining.
The proposed focal-InfoNCE function introduces self-paced modulation terms in the contrastive objective, downweighting the loss associated with easy negatives and encouraging the model to focus on hard negatives.
arXiv Detail & Related papers (2023-10-10T18:15:24Z)
- Identical and Fraternal Twins: Fine-Grained Semantic Contrastive
Learning of Sentence Representations [6.265789210037749]
We introduce a novel Identical and Fraternal Twins of Contrastive Learning framework, capable of simultaneously adapting to various positive pairs generated by different augmentation techniques.
We also present proof-of-concept experiments combined with the contrastive objective to prove the validity of the proposed Twins Loss.
arXiv Detail & Related papers (2023-07-20T15:02:42Z)
- Exploiting Pseudo Image Captions for Multimodal Summarization [26.033681302592207]
Cross-modal contrastive learning in vision language pretraining faces the challenge of (partial) false negatives.
We propose a contrastive learning strategy regulated by progressively refined cross-modal similarity, to more accurately optimize MI between an image/text anchor and its negative texts/images.
arXiv Detail & Related papers (2023-05-09T14:47:25Z)
- Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype
Contrast [34.58856143210749]
We present an approach to learning voice-face representations from talking face videos, without any identity labels.
Previous works employ cross-modal instance discrimination tasks to establish the correlation of voice and face.
We propose the cross-modal prototype contrastive learning (CMPC), which takes advantage of contrastive methods and resists adverse effects of false negatives and deviate positives.
arXiv Detail & Related papers (2022-04-28T07:28:56Z)
- Robust Contrastive Learning against Noisy Views [79.71880076439297]
We propose a new contrastive loss function that is robust against noisy views.
We show that our approach provides consistent improvements over the state-of-the-art image, video, and graph contrastive learning benchmarks.
arXiv Detail & Related papers (2022-01-12T05:24:29Z)
- Max-Margin Contrastive Learning [120.32963353348674]
We present max-margin contrastive learning (MMCL) for unsupervised representation learning.
Our approach selects negatives as the sparse support vectors obtained via a quadratic optimization problem.
We validate our approach on standard vision benchmark datasets, demonstrating better performance in unsupervised representation learning.
arXiv Detail & Related papers (2021-12-21T18:56:54Z)
- Revisiting Contrastive Learning through the Lens of Neighborhood
Component Analysis: an Integrated Framework [70.84906094606072]
We show a new methodology to design integrated contrastive losses that could simultaneously achieve good accuracy and robustness on downstream tasks.
With the integrated framework, we achieve up to 6% improvement on the standard accuracy and 17% improvement on the adversarial accuracy.
arXiv Detail & Related papers (2021-12-08T18:54:11Z)
- Incremental False Negative Detection for Contrastive Learning [95.68120675114878]
We introduce a novel incremental false negative detection for self-supervised contrastive learning.
During contrastive learning, we discuss two strategies to explicitly remove the detected false negatives.
Our proposed method outperforms other self-supervised contrastive learning frameworks on multiple benchmarks under a limited compute budget.
arXiv Detail & Related papers (2021-06-07T15:29:14Z)
- Solving Inefficiency of Self-supervised Representation Learning [87.30876679780532]
Existing contrastive learning methods suffer from very low learning efficiency.
Under-clustering and over-clustering problems are major obstacles to learning efficiency.
We propose a novel self-supervised learning framework using a median triplet loss.
arXiv Detail & Related papers (2021-04-18T07:47:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.