Related papers: Prompting without Panic: Attribute-aware, Zero-shot, Test-Time Calibration

URL: http://arxiv.org/abs/2506.22819v1
Date: Sat, 28 Jun 2025 08:57:57 GMT
Title: Prompting without Panic: Attribute-aware, Zero-shot, Test-Time Calibration
Authors: Ramya Hebbalaguppe, Tamoghno Kandar, Abhinav Nagpal, Chetan Arora
Abstract summary: We show that our approach can effectively improve the calibration after test-time prompt tuning (TPT). We report an average expected calibration error (ECE) of 4.11 with our method, TCA, compared to 11.7 for vanilla TPT, 6.12 for C-TPT, 6.78 for DiffTPT (CVPR'23), and 8.43 for PromptAlign.
Score: 7.507012900046326
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision-language models (VLMs) have demonstrated impressive performance in image recognition by leveraging self-supervised training on large datasets. Their performance can be further improved by adapting to the test sample using test-time prompt tuning (TPT). Unfortunately, the singular focus of TPT approaches on improving accuracy leads to degradation in confidence calibration. This limits the applicability of TPT in critical applications. We make three contributions in this work. (1) We posit that random or naive initialization of prompts leads to overfitting on a particular test sample, and is the main reason for miscalibration of the VLM after TPT. To mitigate the problem, we propose careful initialization of test-time prompts using prior knowledge about the target label attributes from a large language model (LLM); (2) To further maintain the quality of prompts during TPT, we propose a novel regularization loss to reduce intra-class distance and increase inter-class distance between the learnt [...] (3) Through extensive experiments on different CLIP architectures and 15 datasets, we show that our approach can effectively improve the calibration after TPT. We report an average expected calibration error (ECE) of 4.11 with our method, TCA, compared to 11.7 for vanilla TPT, 6.12 for C-TPT (ICLR'24), 6.78 for DiffTPT (CVPR'23), and 8.43 for PromptAlign (NeurIPS'23). The code is publicly accessible at: https://github.com/rhebbalaguppe/TCA_PromptWithoutPanic.
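
Note on the metric: the abstract compares methods by expected calibration error (ECE). As a rough illustration only, the sketch below computes the common equal-width-bin form of ECE in plain Python. The function name, the choice of 10 bins, and the binning scheme are assumptions made for this example, not details taken from the paper or its code. The function returns a fraction in [0, 1]; the figures quoted in the abstract look like percentages, which would correspond to multiplying by 100, but that reading is also an assumption.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (illustrative sketch, not the paper's code).

    confidences: predicted top-1 probabilities, each in [0, 1]
    correct:     booleans, True where the top-1 prediction was right
    Returns the weighted mean |accuracy - confidence| gap over bins.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hit_sum = [0.0] * n_bins    # number of correct predictions per bin
    count = [0] * n_bins        # number of predictions per bin

    for c, ok in zip(confidences, correct):
        # Map confidence to a bin index; 1.0 falls into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1

    ece = 0.0
    for s_conf, s_hit, cnt in zip(conf_sum, hit_sum, count):
        if cnt == 0:
            continue  # empty bins contribute nothing
        accuracy = s_hit / cnt
        avg_conf = s_conf / cnt
        ece += (cnt / n) * abs(accuracy - avg_conf)
    return ece


# Four predictions at 0.9 confidence, three of them correct:
# the single occupied bin has accuracy 0.75 and confidence 0.9.
example = expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                     [True, True, True, False])
```

A lower value means the model's stated confidence tracks its actual accuracy more closely, which is the sense in which the abstract ranks the methods.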

Related papers

O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models [17.56932003351322]
Test-time prompt tuning for vision-language models (VLMs) is getting attention because of their ability to learn with unlabeled data without fine-tuning. The resulting models tend to demonstrate poor calibration, which casts doubts on the reliability and trustworthiness of these models. We propose a new approach, called O-TPT, that introduces orthogonality constraints on the textual features corresponding to the learnable prompts.
arXiv Detail & Related papers (2025-03-15T11:45:54Z)
Test-time Loss Landscape Adaptation for Zero-Shot Generalization in Vision-Language Models [3.1099372412393524]
This paper unveils the unnecessary nature of backpropagation in existing methods from a loss landscape perspective. It proposes a simple yet effective framework called Test-time Loss Landscape Adaptation (TLLA). In the prompt tuning stage, a Sharpness-Aware Prompt Tuning (SAPT) method is introduced to identify the training flat minimum. In the test stage, a Sharpness-based Test Sample Selection (STSS) approach is utilized to ensure the alignment of flat minima.
arXiv Detail & Related papers (2025-01-31T03:10:48Z)
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models [53.91006249339802]
We propose a novel defense method called Test-Time Adversarial Prompt Tuning (TAPT) to enhance the inference robustness of CLIP against visual adversarial attacks. TAPT is a test-time defense method that learns defensive bimodal (textual and visual) prompts to robustify the inference process of CLIP. We evaluate the effectiveness of TAPT on 11 benchmark datasets, including ImageNet and 10 other zero-shot datasets.
arXiv Detail & Related papers (2024-11-20T08:58:59Z)
Efficient Test-Time Prompt Tuning for Vision-Language Models [41.90997623029582]
Self-TPT is a framework leveraging Self-supervised learning for efficient Test-time Prompt Tuning. We show that Self-TPT not only significantly reduces inference costs but also achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-11T13:55:58Z)
C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion [54.81141583427542]
In deep learning, test-time adaptation has gained attention as a method for model fine-tuning without the need for labeled data. This paper explores calibration during test-time prompt tuning by leveraging the inherent properties of CLIP. We present a novel method, Calibrated Test-time Prompt Tuning (C-TPT), for optimizing prompts during test-time with enhanced calibration.
arXiv Detail & Related papers (2024-03-21T04:08:29Z)
Revisiting the Power of Prompt for Visual Tuning [50.11465784194896]
This study explores the correlation evolvement between prompts and patch tokens during proficient training. Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes. Our method significantly advances the adaptation for self-supervised pretraining, achieving impressive task performance gains of at least 10% to 30%.
arXiv Detail & Related papers (2024-02-04T07:49:02Z)
Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning [73.75282761503581]
We propose DiffTPT, which leverages pre-trained diffusion models to generate diverse and informative new data. Our experiments on test datasets with distribution shifts and unseen categories demonstrate that DiffTPT improves the zero-shot accuracy by an average of 5.13%.
arXiv Detail & Related papers (2023-08-11T09:36:31Z)
Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks. We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample. TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average. In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
arXiv Detail & Related papers (2022-09-15T17:55:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.