Test-Time Distribution Normalization for Contrastively Learned
Vision-Language Models
- URL: http://arxiv.org/abs/2302.11084v2
- Date: Wed, 18 Oct 2023 23:06:06 GMT
- Title: Test-Time Distribution Normalization for Contrastively Learned
Vision-Language Models
- Authors: Yifei Zhou, Juntao Ren, Fengyu Li, Ramin Zabih, Ser-Nam Lim
- Abstract summary: One of the most representative approaches proposed recently, known as CLIP, has garnered widespread adoption due to its effectiveness.
This paper reveals that the common downstream practice of taking a dot product is only a zeroth-order approximation of the optimization goal, resulting in a loss of information at test time.
We propose Distribution Normalization (DN), where we approximate the mean representation of a batch of test samples and use such a mean to represent what would be analogous to negative samples in the InfoNCE loss.
- Score: 39.66329310098645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in the field of vision-language contrastive learning have made it
possible for many downstream applications to be carried out efficiently and
accurately by simply taking the dot product between image and text
representations. One of the most representative approaches proposed recently,
known as CLIP, has garnered widespread adoption due to its effectiveness. CLIP
is trained with an InfoNCE loss that takes into account both positive and
negative samples to help learn a much more robust representation space. This
paper reveals that the common downstream practice of taking a dot product is
only a zeroth-order approximation of the optimization goal, resulting in a loss
of information at test time. Intuitively, since the model has been
optimized based on the InfoNCE loss, test-time procedures should also be in
alignment. The question lies in how one can retrieve any semblance of
negative-sample information during inference in a computationally efficient way. To
this end, we propose Distribution Normalization (DN), where we approximate the
mean representation of a batch of test samples and use such a mean to represent
what would be analogous to negative samples in the InfoNCE loss. DN requires no
retraining or fine-tuning and can be effortlessly applied during inference.
Extensive experiments on a wide variety of downstream tasks exhibit a clear
advantage of DN over the dot product on top of other existing test-time
augmentation methods.
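Because DN only modifies the test-time scoring rule, it can be sketched in a few lines. Below is a minimal illustration assuming precomputed, L2-normalized CLIP embeddings; the helper name dn_scores and the 0.5 scaling of the mean correction are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def dn_scores(img_emb, txt_emb, img_mean, txt_mean, lam=0.5):
    # Subtract an estimate of each modality's mean test-time embedding
    # before taking the dot product; lam scales the correction (0.5 is an
    # illustrative choice, not necessarily the paper's exact constant).
    return (img_emb - lam * img_mean) @ (txt_emb - lam * txt_mean).T

# Stand-ins for precomputed, L2-normalized CLIP embeddings.
img_emb = F.normalize(torch.randn(8, 512), dim=-1)   # batch of test images
txt_emb = F.normalize(torch.randn(5, 512), dim=-1)   # candidate captions
img_mean = img_emb.mean(dim=0, keepdim=True)         # mean over test images
txt_mean = txt_emb.mean(dim=0, keepdim=True)         # mean over captions
scores = dn_scores(img_emb, txt_emb, img_mean, txt_mean)  # (8, 5) score matrix
```

Setting lam=0 recovers the plain dot product, which makes explicit that DN is a drop-in correction on top of the standard scoring rule.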
Related papers
- BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a test-time adaptation framework that bridges training-required and training-free methods.
We maintain a lightweight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples.
We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets.
arXiv Detail & Related papers (2024-10-20T15:58:43Z)
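To make BoostAdapter's key-value memory idea concrete, here is a rough sketch of the kind of retrieval cache the summary describes; the class name and all details are assumptions for illustration, not BoostAdapter's actual code.

```python
import torch
import torch.nn.functional as F

class KVMemory:
    # Keys are L2-normalized features; values are (soft) pseudo-labels.
    def __init__(self):
        self.keys, self.values = [], []

    def add(self, feat, pseudo_label):
        self.keys.append(F.normalize(feat, dim=-1))
        self.values.append(pseudo_label)

    def retrieve(self, query, temperature=0.07):
        keys = torch.stack(self.keys)                  # (N, D)
        values = torch.stack(self.values)              # (N, C)
        sims = F.normalize(query, dim=-1) @ keys.T     # (B, N) cosine similarity
        weights = (sims / temperature).softmax(dim=-1)
        return weights @ values                        # (B, C) retrieved logits
```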
- DOTA: Distributional Test-Time Adaptation of Vision-Language Models [52.98590762456236]
The training-free test-time dynamic adapter (TDA) is a promising approach to address this issue.
We propose a simple yet effective method for DistributiOnal Test-time Adaptation (Dota).
Dota continually estimates the distributions of test samples, allowing the model to continually adapt to the deployment environment.
arXiv Detail & Related papers (2024-09-28T15:03:28Z)
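As a toy illustration of continually estimating test-sample statistics, the snippet below keeps running per-class feature means updated from the test stream; Dota itself frames the estimation probabilistically, so treat this as a simplified stand-in rather than the paper's method.

```python
import torch

def update_class_mean(means, counts, feat, pred_class):
    # Online (streaming) update of the predicted class's feature mean.
    counts[pred_class] += 1
    lr = 1.0 / counts[pred_class]
    means[pred_class] = (1 - lr) * means[pred_class] + lr * feat
    return means, counts

num_classes, dim = 10, 512
means = torch.zeros(num_classes, dim)   # running per-class means
counts = torch.zeros(num_classes)       # samples seen per class
```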
- Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for annotation when an unlabeled sample is believed to incur a high loss.
Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z)
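The querying rule described above can be illustrated in two steps: estimate a per-sample loss proxy, then send the highest-scoring samples to the oracle. The discrepancy function below is a sketch of the paper's idea (the distance between a sample's outputs at two optimization steps); the function names are assumptions.

```python
import torch

def temporal_output_discrepancy(out_t, out_prev):
    # Distance between outputs at two training checkpoints, used as a
    # proxy for the sample's loss (a sketch of the paper's idea).
    return (out_t - out_prev).pow(2).sum(dim=-1).sqrt()

def select_for_annotation(loss_estimates, budget):
    # Query the oracle for the samples believed to incur the highest loss.
    return torch.topk(loss_estimates, k=budget).indices
```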
- Rethinking Prototypical Contrastive Learning through Alignment, Uniformity and Correlation [24.794022951873156]
We propose to learn prototypical representations through Alignment, Uniformity and Correlation (PAUC).
Specifically, the ordinary ProtoNCE loss is revised with: (1) an alignment loss that pulls embeddings from positive prototypes together; (2) a uniformity loss that distributes prototype-level features uniformly; (3) a correlation loss that increases the diversity and discriminability between prototype-level features.
arXiv Detail & Related papers (2022-10-18T22:33:12Z)
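The alignment and uniformity terms have well-known point-level forms (Wang & Isola, 2020), sketched below; PAUC applies such losses at the prototype level, so these are illustrative analogues rather than its exact objectives.

```python
import torch

def alignment_loss(z1, z2):
    # Pulls embeddings of positive pairs together.
    return (z1 - z2).norm(dim=1).pow(2).mean()

def uniformity_loss(z, t=2.0):
    # Encourages embeddings to spread uniformly over the hypersphere.
    return torch.pdist(z, p=2).pow(2).mul(-t).exp().mean().log()
```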
- Siamese Prototypical Contrastive Learning [24.794022951873156]
Contrastive Self-supervised Learning (CSL) is a practical solution that learns meaningful visual representations from massive data in an unsupervised manner.
In this paper, we tackle this problem by introducing a simple but effective contrastive learning framework.
The key insight is to employ a siamese-style metric loss to match intra-prototype features while increasing the distance between inter-prototype features.
arXiv Detail & Related papers (2022-08-18T13:25:30Z)
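A generic siamese-style metric loss matching that description might pull each feature toward its own prototype while pushing the nearest other prototype at least a margin away; the sketch below is an illustrative assumption, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def prototype_metric_loss(feat, protos, labels, margin=1.0):
    # Pull features toward their assigned prototype (intra-prototype) and
    # keep the nearest other prototype at least `margin` away (inter-prototype).
    dists = torch.cdist(feat, protos)                      # (B, P)
    pos = dists.gather(1, labels.unsqueeze(1)).squeeze(1)  # own prototype
    mask = F.one_hot(labels, protos.size(0)).bool()
    neg = dists.masked_fill(mask, float("inf")).min(dim=1).values
    return (pos + F.relu(margin - neg)).mean()
```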
- Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
arXiv Detail & Related papers (2021-05-27T08:38:29Z)
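For reference, a standard InfoNCE loss with one positive and $K$ negatives per anchor looks as follows; the sampling ratio the paper studies corresponds to the choice of K.

```python
import torch

def info_nce(pos_sim, neg_sim, tau=0.07):
    # pos_sim: (B,) similarity to the positive; neg_sim: (B, K) to negatives.
    logits = torch.cat([pos_sim.unsqueeze(1), neg_sim], dim=1) / tau
    return -torch.log_softmax(logits, dim=1)[:, 0].mean()
```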
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
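One way to read the objective described above: negatively augmented real images are scored by the discriminator as an extra source of fakes. The sketch below, including the mixing weight lam, is an assumption for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def nda_discriminator_loss(d_real, d_fake, d_nda, lam=0.5):
    # d_*: discriminator logits on real images, generator samples, and
    # negatively augmented real images (e.g., jigsaw-shuffled), respectively.
    loss_real = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    loss_nda = F.binary_cross_entropy_with_logits(d_nda, torch.zeros_like(d_nda))
    return loss_real + (1 - lam) * loss_fake + lam * loss_nda
```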