Uniformity First: Uniformity-aware Test-time Adaptation of Vision-language Models against Image Corruption
- URL: http://arxiv.org/abs/2505.12912v1
- Date: Mon, 19 May 2025 09:47:46 GMT
- Title: Uniformity First: Uniformity-aware Test-time Adaptation of Vision-language Models against Image Corruption
- Authors: Kazuki Adachi, Shin'ya Yamaguchi, Tomoki Hamagami
- Abstract summary: We find that vision-language models still suffer when they face datasets with large gaps from the training ones. We propose a novel method called uniformity-aware information-balanced TTA (UnInfo) to make models robust to sensor degradation.
- Score: 4.792851066169872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained vision-language models such as contrastive language-image pre-training (CLIP) have demonstrated remarkable generalizability, which has enabled a wide range of applications represented by zero-shot classification. However, vision-language models still suffer when they face datasets with large gaps from the training ones, i.e., distribution shifts. We found that CLIP is especially vulnerable to sensor degradation, a type of realistic distribution shift caused by sensor conditions such as weather, light, or noise. Collecting a new dataset from a test distribution for fine-tuning is highly costly, since sensor degradation occurs unexpectedly and comes in a wide variety. Thus, we investigate test-time adaptation (TTA) of zero-shot classification, which enables on-the-fly adaptation to the test distribution with unlabeled test data. Existing TTA methods for CLIP mainly focus on modifying image and text embeddings or predictions to address distribution shifts. Although these methods can adapt to domain shifts, such as fine-grained label spaces or different renditions of input images, they fail to adapt to distribution shifts caused by sensor degradation. We found that this is because image embeddings are "corrupted" in terms of uniformity, a measure related to the amount of information. To make models robust to sensor degradation, we propose a novel method called uniformity-aware information-balanced TTA (UnInfo). To address the corruption of image embeddings, we introduce uniformity-aware confidence maximization, information-aware loss balancing, and knowledge distillation from an exponential moving average (EMA) teacher. Through experiments, we demonstrate that UnInfo improves accuracy under sensor degradation by retaining information in terms of uniformity.
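The uniformity measure and the EMA teacher mentioned in the abstract can be illustrated with a minimal sketch. The function below uses the hypersphere uniformity of Wang & Isola (2020), a common formalization of how spread out L2-normalized embeddings are; the paper's exact definition and losses may differ, and all names here are illustrative, not the authors' implementation.

```python
import numpy as np

def uniformity(embeddings, t=2.0):
    """Hypersphere uniformity (Wang & Isola, 2020): log of the mean pairwise
    Gaussian potential of L2-normalized embeddings. More negative values mean
    the embeddings are spread more uniformly; values near 0 indicate collapse,
    the kind of embedding "corruption" the abstract describes."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    off_diag = ~np.eye(len(z), dtype=bool)  # exclude self-pairs
    return np.log(np.mean(np.exp(-t * sq_dists[off_diag])))

def ema_update(teacher, student, momentum=0.999):
    """One exponential-moving-average update of the teacher's parameters,
    as used for knowledge distillation from an EMA teacher."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}
```

For example, four embeddings spread evenly on the unit circle score lower (better) uniformity than four identical (collapsed) embeddings, whose score is exactly 0.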
Related papers
- Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detection [50.343419243749054]
Anomaly Detection (AD) involves identifying deviations from normal data distributions. We propose a novel approach that conditions the prompts of the text encoder on image context extracted from the vision encoder. Our method achieves state-of-the-art performance, with improvements of 2% to 29% across different metrics on 14 datasets.
arXiv Detail & Related papers (2025-04-15T10:42:25Z) - A Bias-Free Training Paradigm for More General AI-generated Image Detection [15.421102443599773]
A well-designed forensic detector should detect generator-specific artifacts rather than reflect data biases. We propose B-Free, a bias-free training paradigm, where fake images are generated from real ones. We show significant improvements in both generalization and robustness over state-of-the-art detectors.
arXiv Detail & Related papers (2024-12-23T15:54:32Z) - Diffusion Model Driven Test-Time Image Adaptation for Robust Skin Lesion Classification [24.08402880603475]
We propose a test-time image adaptation method to enhance the accuracy of the model on test data.
We modify the target test images by projecting them back to the source domain using a diffusion model.
Our method makes the model more robust across various corruptions, architectures, and data regimes.
arXiv Detail & Related papers (2024-05-18T13:28:51Z) - Forgery-aware Adaptive Transformer for Generalizable Synthetic Image
Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z) - Masked Images Are Counterfactual Samples for Robust Fine-tuning [77.82348472169335]
Fine-tuning deep learning models can lead to a trade-off between in-distribution (ID) performance and out-of-distribution (OOD) robustness.
We propose a novel fine-tuning method, which uses masked images as counterfactual samples that help improve the robustness of the fine-tuning model.
arXiv Detail & Related papers (2023-03-06T11:51:28Z) - DDPM-CD: Denoising Diffusion Probabilistic Models as Feature Extractors
for Change Detection [31.125812018296127]
We introduce a novel approach for change detection by pre-training a Denoising Diffusion Probabilistic Model (DDPM).
DDPM learns the training data distribution by gradually converting training images into a Gaussian distribution using a Markov chain.
During inference (i.e., sampling), they can generate a diverse set of samples closer to the training distribution.
Experiments conducted on the LEVIR-CD, WHU-CD, DSIFN-CD, and CDD datasets demonstrate that the proposed DDPM-CD method significantly outperforms the existing change detection methods in terms of F1 score.
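The forward (noising) process summarized above has a well-known closed form; below is a minimal sketch of that standard DDPM step (textbook math, not code from the paper, and the names are illustrative):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    # Closed-form sample from q(x_t | x_0):
    #   x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps,  eps ~ N(0, I)
    # where abar_t is the cumulative product of (1 - beta_i). As t grows,
    # abar_t -> 0 and x_t approaches a standard Gaussian -- the gradual
    # conversion of training images into a Gaussian distribution.
    abar_t = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps
```

With a typical linear schedule (e.g. beta from 1e-4 to 0.02 over 1000 steps), the final x_t is essentially pure unit-variance noise regardless of x0.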
arXiv Detail & Related papers (2022-06-23T17:58:29Z) - Incorporating Semi-Supervised and Positive-Unlabeled Learning for
Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference with pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, making it encouraging to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we suggest to incorporate semi-supervised and positive-unlabeled (PU) learning for exploiting unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z) - Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
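The randomization procedure described above can be sketched as follows; the function names and the transformation sampler are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def smoothed_predict(predict, x, sample_transform, n=64, rng=None):
    # Classify by averaging the base model's class scores over n random
    # transformations drawn from the transformation space. The averaged
    # (smoothed) classifier is the object whose accuracy can be certified
    # under bounded shifts within that space.
    rng = rng if rng is not None else np.random.default_rng()
    scores = [predict(sample_transform(x, rng)) for _ in range(n)]
    return np.mean(scores, axis=0)
```

Any score-producing model and any samplable transformation family (e.g. additive noise or small rotations) can be plugged in.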
arXiv Detail & Related papers (2022-01-28T22:03:50Z) - Dense Out-of-Distribution Detection by Robust Learning on Synthetic
Negative Data [1.7474352892977458]
We show how to detect out-of-distribution anomalies in road-driving scenes and remote sensing imagery.
We leverage a jointly trained normalizing flow, owing to its coverage-oriented learning objective and its capability to generate samples at different resolutions.
The resulting models set the new state of the art on benchmarks for out-of-distribution detection in road-driving scenes and remote sensing imagery.
arXiv Detail & Related papers (2021-12-23T20:35:10Z) - A Geometric Perspective towards Neural Calibration via Sensitivity
Decomposition [31.557715381838147]
It is well known that vision classification models suffer from poor calibration in the face of data distribution shifts.
We propose Geometric Sensitivity Decomposition (GSD) which decomposes the norm of a sample feature embedding into an instance-dependent and an instance-independent component.
Inspired by the decomposition, we analytically derive a simple extension to current softmax-linear models, which learns to disentangle the two components during training.
arXiv Detail & Related papers (2021-10-27T16:46:41Z) - Semi-Supervised Domain Adaptation with Prototypical Alignment and
Consistency Learning [86.6929930921905]
This paper studies how much having a few labeled target samples can further help address domain shifts.
To explore the full potential of landmarks, we incorporate a prototypical alignment (PA) module which calculates a target prototype for each class from the landmarks.
Specifically, we severely perturb the labeled images, making PA non-trivial to achieve and thus promoting model generalizability.
arXiv Detail & Related papers (2021-04-19T08:46:08Z) - Cross-Sensor Adversarial Domain Adaptation of Landsat-8 and Proba-V
images for Cloud Detection [1.5828697880068703]
The number of Earth observation satellites carrying optical sensors with similar characteristics is constantly growing.
Differences in retrieved radiances lead to significant drops in accuracy, which hampers knowledge and information sharing across sensors.
We propose a domain adaptation to reduce the statistical differences between images of two satellite sensors in order to boost the performance of transfer learning models.
arXiv Detail & Related papers (2020-06-10T16:16:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.