Interventional Contrastive Learning with Meta Semantic Regularizer
- URL: http://arxiv.org/abs/2206.14702v1
- Date: Wed, 29 Jun 2022 15:02:38 GMT
- Title: Interventional Contrastive Learning with Meta Semantic Regularizer
- Authors: Wenwen Qiang, Jiangmeng Li, Changwen Zheng, Bing Su, Hui Xiong
- Abstract summary: Contrastive learning (CL)-based self-supervised learning models learn visual representations in a pairwise manner.
When the CL model is trained with full images, the performance tested in full images is better than that in foreground areas.
When the CL model is trained with foreground areas, the performance tested in full images is worse than that in foreground areas.
- Score: 28.708395209321846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning (CL)-based self-supervised learning models learn visual
representations in a pairwise manner. Although the prevailing CL model has
achieved great progress, in this paper we uncover a previously overlooked
phenomenon: When the CL model is trained with full images, the performance
tested in full images is better than that in foreground areas; when the CL
model is trained with foreground areas, the performance tested in full images
is worse than that in foreground areas. This observation reveals that
backgrounds in images may interfere with the model's learning of semantic
information and that their influence has not been fully eliminated. To tackle this
issue, we build a Structural Causal Model (SCM) to model the background as a
confounder. We propose a backdoor adjustment-based regularization method,
namely Interventional Contrastive Learning with Meta Semantic Regularizer
(ICL-MSR), to perform causal intervention towards the proposed SCM. ICL-MSR can
be incorporated into any existing CL method to alleviate background
distractions from representation learning. Theoretically, we prove that ICL-MSR
achieves a tighter error bound. Empirically, our experiments on multiple
benchmark datasets demonstrate that ICL-MSR improves the performance of
different state-of-the-art CL methods.
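The abstract frames ICL-MSR as a regularization term that can be plugged into any existing contrastive objective. Below is a minimal PyTorch sketch of that general recipe, under stated assumptions rather than the authors' implementation: the InfoNCE term is standard, while semantic_regularizer, the learnable prototypes, and the weight lam are illustrative stand-ins for the meta semantic regularizer and backdoor-adjustment weighting proposed in the paper.

```python
# Sketch only: a standard contrastive (InfoNCE) loss plus an extra
# regularization term, mirroring the "CL loss + regularizer" structure the
# abstract describes. The regularizer form below is an assumption.
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.5):
    """Standard InfoNCE between two batches of embeddings from two views."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                # pairwise similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def semantic_regularizer(z, prototypes):
    """Illustrative stand-in: pull each representation toward its closest
    learnable 'semantic' prototype (NOT the paper's exact regularizer)."""
    z = F.normalize(z, dim=1)
    protos = F.normalize(prototypes, dim=1)
    sim = z @ protos.t()                              # (batch, num_prototypes)
    return -sim.max(dim=1).values.mean()              # reward high similarity

def icl_msr_style_loss(z1, z2, prototypes, lam=0.1):
    """Contrastive loss plus a weighted regularizer: the plug-in structure
    the abstract attributes to ICL-MSR."""
    return info_nce_loss(z1, z2) + lam * semantic_regularizer(z1, prototypes)

# Toy usage with random embeddings (encoder and augmentations omitted):
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
prototypes = torch.nn.Parameter(torch.randn(10, 128))
loss = icl_msr_style_loss(z1, z2, prototypes)
```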
Related papers
- What Matters for In-Context Learning: A Balancing Act of Look-up and In-Weight Learning [42.8453045943264]
We show that conceptual repetitions in the data sequences are crucial for ICL.
We also show that the emergence of ICL depends on balancing the in-weight learning objective with the in-context solving ability.
arXiv Detail & Related papers (2025-01-09T09:45:05Z)
- Are Conditional Latent Diffusion Models Effective for Image Restoration? [3.015770349327888]
CLDMs excel in capturing high-level semantic correlations, making them effective for tasks like text-to-image generation with spatial conditioning.
In image restoration (IR), where the goal is to enhance perceptual image quality, these models have difficulty modeling the relationship between degraded images and ground-truth images.
Results reveal that despite the scaling advantages of CLDMs, they suffer from high distortion and semantic deviation, especially in cases with minimal degradation.
arXiv Detail & Related papers (2024-12-12T14:49:55Z)
- A Theoretical Analysis of Self-Supervised Learning for Vision Transformers [66.08606211686339]
Masked autoencoders (MAE) and contrastive learning (CL) capture different types of representations.
We study the training dynamics of one-layer softmax-based vision transformers (ViTs) on both MAE and CL objectives.
arXiv Detail & Related papers (2024-03-04T17:24:03Z)
- Data Poisoning for In-context Learning [49.77204165250528]
In-context learning (ICL) has been recognized for its innovative ability to adapt to new tasks.
This paper delves into the critical issue of ICL's susceptibility to data poisoning attacks.
We introduce ICLPoison, a specialized attacking framework conceived to exploit the learning mechanisms of ICL.
arXiv Detail & Related papers (2024-02-03T14:20:20Z)
- In-context Learning and Gradient Descent Revisited [3.085927389171139]
We show that even untrained models achieve comparable ICL-GD similarity scores despite not exhibiting ICL.
Next, we explore a major discrepancy in the flow of information throughout the model between ICL and GD, which we term Layer Causality.
We propose a simple GD-based optimization procedure that respects layer causality, and show it improves similarity scores significantly.
arXiv Detail & Related papers (2023-11-13T21:42:38Z)
- Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP [84.90129481336659]
We study transferable representation learning underlying CLIP and demonstrate how features from different modalities are aligned.
Inspired by our analysis, we propose a new CLIP-type approach, which achieves better performance than CLIP and other state-of-the-art methods on benchmark datasets.
arXiv Detail & Related papers (2023-10-02T06:41:30Z)
- Non-Contrastive Learning Meets Language-Image Pre-Training [145.6671909437841]
We study the validity of non-contrastive language-image pre-training (nCLIP).
We introduce xCLIP, a multi-tasking framework combining CLIP and nCLIP, and show that nCLIP aids CLIP in enhancing feature semantics.
arXiv Detail & Related papers (2022-10-17T17:57:46Z)
- Learning Deep Representations via Contrastive Learning for Instance Retrieval [11.736450745549792]
This paper makes the first attempt to tackle the problem using instance-discrimination-based contrastive learning (CL).
In this work, we approach this problem by exploring the capability of deriving discriminative representations from pre-trained and fine-tuned CL models.
arXiv Detail & Related papers (2022-09-28T04:36:34Z)
- Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes (a rough code sketch of this evaluation follows the list below).
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
- ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension [114.85628613911713]
Large-scale pre-trained models are useful for image classification across domains.
We present ReCLIP, a simple but strong zero-shot baseline that repurposes CLIP, a state-of-the-art large-scale model, for ReC.
arXiv Detail & Related papers (2022-04-12T17:55:38Z)
- A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval [19.2650103482509]
Cross-Modal Retrieval (CMR) is an important research topic across multimodal computing and information retrieval.
We take CLIP as the current representative vision-language pre-trained model to conduct a comprehensive empirical study.
We propose a novel model, CLIP4CMR, that employs pre-trained CLIP as the backbone network to perform supervised CMR.
arXiv Detail & Related papers (2022-01-08T06:00:22Z)
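As a concrete illustration of the Cluster Learnability evaluation summarized in the entry above ("Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods"), here is a rough scikit-learn sketch. It is not the authors' code: the number of clusters, the number of neighbours, and the 50/50 split are assumptions chosen only for demonstration.

```python
# Hedged sketch of Cluster Learnability: cluster representations with K-means,
# then score a KNN classifier on predicting those cluster assignments.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def cluster_learnability(representations, n_clusters=10, n_neighbors=5):
    # Pseudo-labels from K-means clustering of the representations.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(representations)
    # Fit a KNN on one half and measure how well it recovers the pseudo-labels
    # on the other half; higher accuracy = more "learnable" representations.
    x_tr, x_te, y_tr, y_te = train_test_split(representations, labels, test_size=0.5)
    knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(x_tr, y_tr)
    return knn.score(x_te, y_te)

# Toy usage with random features standing in for an encoder's outputs:
print(cluster_learnability(np.random.randn(1000, 128)))
```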
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.