Mint: A Simple Test-Time Adaptation of Vision-Language Models against Common Corruptions
- URL: http://arxiv.org/abs/2510.22127v1
- Date: Sat, 25 Oct 2025 02:55:08 GMT
- Title: Mint: A Simple Test-Time Adaptation of Vision-Language Models against Common Corruptions
- Authors: Wenxuan Bao, Ruxi Deng, Jingrui He
- Abstract summary: We investigate how corruptions affect CLIP's image embeddings and uncover a consistent phenomenon we term embedding variance collapse. We find that this collapse is closely tied to performance degradation, with inter-class variance strongly correlated with classification accuracy. We propose Mint, a simple test-time adaptation method that maximizes pseudo-label-based inter-class variance on the fly.
- Score: 44.25678062208464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained vision-language models such as CLIP achieve strong zero-shot generalization but remain vulnerable to distribution shifts caused by input corruptions. In this work, we investigate how corruptions affect CLIP's image embeddings and uncover a consistent phenomenon we term embedding variance collapse, where both intra-class and inter-class variances shrink as corruption severity increases. We find that this collapse is closely tied to performance degradation, with inter-class variance strongly correlated with classification accuracy. To explain this phenomenon, we analyze how corruptions alter the structure of the embedding space. Our theoretical results suggest that the visual encoder tends to encode corruption-related signals, which dilute class-discriminative features and compress the representation geometry. We further show that maximizing inter-class variance, even when estimated from pseudo-labels, can provably enhance embedding quality. Based on this insight, we propose Mint, a simple test-time adaptation method that maximizes pseudo-label-based inter-class variance on the fly using a mean accumulator and a gradient accumulator. Mint operates effectively with small batch sizes and consistently improves performance across multiple corruption benchmarks and CLIP architectures. Our code is available at https://github.com/baowenxuan/Mint.
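To make the objective concrete, here is a minimal PyTorch sketch of the core idea: assign pseudo-labels from the zero-shot logits, then take a gradient step that maximizes the inter-class variance of the image embeddings. The encoder interface is an assumption, and the paper's mean and gradient accumulators (which enable small-batch operation) are omitted for brevity; see the linked repository for the actual implementation.

```python
import torch

def inter_class_variance(feats, pseudo_labels, num_classes):
    """Frequency-weighted variance of pseudo-class means around the global mean."""
    global_mean = feats.mean(dim=0)
    var = feats.new_zeros(())
    for c in range(num_classes):
        mask = pseudo_labels == c
        if mask.any():
            class_mean = feats[mask].mean(dim=0)
            var = var + mask.float().mean() * (class_mean - global_mean).pow(2).sum()
    return var

def mint_step(image_encoder, text_feats, images, optimizer):
    """One test-time adaptation step: maximize inter-class variance of
    pseudo-labeled image embeddings (without the paper's accumulators)."""
    feats = image_encoder(images)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # CLIP's cosine geometry
    logits = feats @ text_feats.t()                   # zero-shot logits
    pseudo = logits.argmax(dim=-1)                    # pseudo-labels
    loss = -inter_class_variance(feats, pseudo, text_feats.size(0))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return logits.detach()
```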
Related papers
- Beyond the Loss Curve: Scaling Laws, Active Learning, and the Limits of Learning from Exact Posteriors [8.410613979416203]
We use class-conditional normalizing flows as oracles that make exact posteriors tractable on realistic images (see the sketch below). Our framework reveals that standard metrics hide ongoing learning, mask architectural differences, and cannot diagnose the nature of distribution shift.
arXiv Detail & Related papers (2026-01-30T21:08:55Z)
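As a sketch of the oracle idea in the entry above: with one normalizing flow per class, Bayes' rule yields the exact posterior from per-class log-likelihoods. The `flows[c].log_prob` interface is an assumption for illustration, not the paper's code.

```python
import torch

def exact_posterior(flows, x, log_prior):
    """Exact p(y | x) via Bayes' rule, assuming flows[c].log_prob(x)
    returns log p(x | y=c) for a batch of inputs x."""
    log_lik = torch.stack([flow.log_prob(x) for flow in flows], dim=-1)  # (B, C)
    return torch.softmax(log_lik + log_prior, dim=-1)                    # (B, C)
```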
- Uniformity First: Uniformity-aware Test-time Adaptation of Vision-language Models against Image Corruption [4.792851066169872]
We find that vision-language models still suffer when they face datasets with large gaps from the training ones. We propose a novel method called information-balanced TTA (UnInfo) to make models robust to sensor degradation (a uniformity objective such a method could build on is sketched below).
arXiv Detail & Related papers (2025-05-19T09:47:46Z)
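The abstract does not spell out the loss; below is the standard uniformity objective of Wang & Isola (2020) that a uniformity-aware method of this kind could build on, offered as an illustrative assumption rather than UnInfo's exact formulation.

```python
import torch

def uniformity_loss(z, t=2.0):
    """Uniformity of L2-normalized embeddings z (shape (N, D)):
    log-mean Gaussian potential over all pairs; lower means more uniform."""
    return torch.pdist(z, p=2).pow(2).mul(-t).exp().mean().log()
```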
- Navigating Semantic Drift in Task-Agnostic Class-Incremental Learning [51.177789437682954]
Class-incremental learning (CIL) seeks to enable a model to sequentially learn new classes while retaining knowledge of previously learned ones. Balancing flexibility and stability remains a significant challenge, particularly when the task ID is unknown. We propose a novel semantic drift calibration method that incorporates mean shift compensation and covariance calibration (see the sketch below).
arXiv Detail & Related papers (2025-02-11T13:57:30Z)
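A minimal sketch of the two named ingredients, under the simplifying assumption that drift is estimated from the same samples embedded before and after a model update; the paper's actual calibration may differ.

```python
import torch

def estimate_drift(feats_before, feats_after):
    """Mean displacement of identical samples embedded before/after an update."""
    return (feats_after - feats_before).mean(dim=0)

def calibrate(prototypes, covariances, drift, new_cov, alpha=0.5):
    """Mean shift compensation plus a simple covariance calibration
    that blends stored and freshly estimated covariances."""
    return prototypes + drift, alpha * covariances + (1 - alpha) * new_cov
```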
- Technical report on label-informed logit redistribution for better domain generalization in low-shot classification with foundation models [3.938980910007962]
Confidence calibration is an emerging challenge in real-world decision systems built on foundation models. We propose a penalty incorporated into the loss objective that penalizes incorrect classifications whenever they occur during finetuning. We refer to it as the confidence misalignment penalty (CMP); one plausible form is sketched below.
arXiv Detail & Related papers (2025-01-29T11:54:37Z)
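One plausible form of such a penalty, sketched under the assumption that it scales with the confidence assigned to incorrect top-1 predictions; the weight `lam` and the exact penalty shape are illustrative, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def cmp_loss(logits, targets, lam=0.1):
    """Cross-entropy plus an illustrative confidence misalignment penalty:
    extra cost proportional to the confidence of wrong top-1 predictions."""
    ce = F.cross_entropy(logits, targets)
    probs = logits.softmax(dim=-1)
    wrong = (logits.argmax(dim=-1) != targets).float()
    return ce + lam * (probs.max(dim=-1).values * wrong).mean()
```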
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions. We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm (the basic update is sketched below). Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z)
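For orientation, a generic lazy mirror descent step on the probability simplex with the entropic mirror map; plain gradient averaging stands in for the paper's corruption-tolerant aggregation, which this sketch does not reproduce.

```python
import numpy as np

def lazy_mirror_descent_step(z, worker_grads, lr):
    """Lazy mirror descent: accumulate gradients in the dual variable z,
    then recover the primal iterate via the softmax (entropic) mirror map."""
    z = z - lr * np.mean(worker_grads, axis=0)  # lazy: only the dual is updated
    x = np.exp(z - z.max())                     # numerically stable softmax
    return z, x / x.sum()
```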
- Understanding the Detrimental Class-level Effects of Data Augmentation [63.1733767714073]
Achieving optimal average accuracy can come at the cost of significantly hurting individual class accuracy, by as much as 20% on ImageNet.
We present a framework for understanding how DA interacts with class-level learning dynamics.
We show that simple class-conditional augmentation strategies improve performance on the negatively affected classes (one such strategy is sketched below).
arXiv Detail & Related papers (2023-12-07T18:37:43Z)
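One simple class-conditional strategy consistent with the entry above, sketched with an assumed `augment` callable and a hypothetical `hurt_classes` tensor listing the negatively affected classes.

```python
import torch

def class_conditional_augment(x, y, augment, hurt_classes):
    """Apply an augmentation to every sample except those whose class it hurts."""
    skip = torch.isin(y, hurt_classes)
    out = x.clone()
    out[~skip] = augment(x[~skip])
    return out
```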
- How does Contrastive Learning Organize Images? [8.077578967149561]
Contrastive learning, a dominant self-supervised technique, encourages similar representations for augmentations of the same input and dissimilar representations for different inputs.
Recent studies challenge this direct relationship, spotlighting the crucial role of inductive biases.
We introduce the "RLD (Relative Local Density)" metric to capture this discrepancy.
arXiv Detail & Related papers (2023-05-17T14:10:54Z)
- Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class Incremental Learning [120.53458753007851]
Few-shot class-incremental learning (FSCIL) has been a challenging problem as only a few training samples are accessible for each novel class in the new sessions.
We address this misalignment dilemma in FSCIL by drawing on the recently discovered phenomenon of neural collapse. We propose a neural collapse inspired framework for FSCIL (the classifier geometry it builds on is sketched below). Experiments on the miniImageNet, CUB-200, and CIFAR-100 datasets demonstrate that our framework outperforms the state of the art.
arXiv Detail & Related papers (2023-02-06T18:39:40Z)
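Under neural collapse, class means and classifier weights converge to a simplex equiangular tight frame (ETF); a framework in this spirit can fix that geometry in advance so features of novel classes align to pre-assigned directions. Below is the standard ETF construction; its use here is an illustration, not the paper's full method.

```python
import torch

def simplex_etf(num_classes, dim):
    """Simplex ETF: num_classes maximally separated unit vectors in R^dim."""
    assert dim >= num_classes
    u, _ = torch.linalg.qr(torch.randn(dim, num_classes))  # orthonormal columns
    center = torch.eye(num_classes) - 1.0 / num_classes    # remove the mean direction
    m = (num_classes / (num_classes - 1)) ** 0.5 * (u @ center)
    return m.t()  # (num_classes, dim): one unit-norm class direction per row
```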
- Diverse Gaussian Noise Consistency Regularization for Robustness and Uncertainty Calibration [7.310043452300738]
Deep neural networks achieve high prediction accuracy when the train and test distributions coincide.
In practice, various types of corruptions deviate from this setup and cause severe performance degradation. We propose a diverse Gaussian noise consistency regularization method for improving the robustness of image classifiers under a variety of corruptions (the core loss is sketched below).
arXiv Detail & Related papers (2021-04-02T20:25:53Z)
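A compact sketch of the kind of loss the entry describes: KL consistency between predictions on clean inputs and on inputs perturbed by Gaussian noise at several scales. The noise scales and the use of KL divergence are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def gaussian_consistency_loss(model, x, sigmas=(0.1, 0.25, 0.5)):
    """Average KL between clean predictions and predictions under
    Gaussian perturbations of diverse magnitudes."""
    log_p_clean = model(x).log_softmax(dim=-1)
    loss = x.new_zeros(())
    for s in sigmas:
        log_p_noisy = model(x + s * torch.randn_like(x)).log_softmax(dim=-1)
        loss = loss + F.kl_div(log_p_noisy, log_p_clean.detach(),
                               reduction="batchmean", log_target=True)
    return loss / len(sigmas)
```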
- On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness [78.6626755563546]
Several new data augmentations have been proposed that significantly improve performance on ImageNet-C.
We develop a new measure of distance between augmentations and corruptions in this space, the Minimal Sample Distance, and demonstrate a strong correlation between similarity and performance (see the sketch below).
We observe a significant degradation in corruption robustness when the test-time corruptions are sampled to be perceptually dissimilar from ImageNet-C.
Our results suggest that test error can be reduced by training on perceptually similar augmentations, and that data augmentations may not generalize well beyond the existing benchmark.
arXiv Detail & Related papers (2021-02-22T18:58:39Z)
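One reading of the measure, as a hedged sketch: in some feature space, the distance from a corruption to its nearest augmented sample, where small values indicate the corruption is perceptually close to the training-time augmentations. The feature space and nearest-neighbor formulation are assumptions about the construction.

```python
import torch

def minimal_sample_distance(aug_feats, corr_feat):
    """Distance from one corruption's feature vector (D,) to the nearest
    point in a set of augmentation features (N, D)."""
    return torch.cdist(corr_feat.unsqueeze(0), aug_feats).min()
```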
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.