Related papers: Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation

Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation

URL: http://arxiv.org/abs/2411.17002v1
Date: Tue, 26 Nov 2024 00:15:37 GMT
Title: Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation
Authors: Shambhavi Mishra, Julio Silva-Rodrıguez, Ismail Ben Ayed, Marco Pedersoli, Jose Dolz,
Abstract summary: We show how to leverage class text information to mitigate distribution drifts encountered by vision-language models (VLMs) during test-time inference. We propose to generate pseudo-labels for the test-time samples by exploiting generic class text embeddings as fixed centroids of a label assignment problem. Experiments on multiple popular test-time adaptation benchmarks presenting diverse complexity empirically show the superiority of CLIP-OT.
Score: 21.20806568508201
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision-language foundation models, such as CLIP, have shown unprecedented zero-shot performance across a wide range of tasks. Nevertheless, these models may be unreliable under distributional shifts, as their performance is significantly degraded. In this work, we explore how to efficiently leverage class text information to mitigate these distribution drifts encountered by large pre-trained vision-language models (VLMs) during test-time inference. In particular, we propose to generate pseudo-labels for the test-time samples by exploiting generic class text embeddings as fixed centroids of a label assignment problem, which is efficiently solved with Optimal Transport. Furthermore, the proposed adaptation method (CLIP-OT) integrates a multiple template knowledge distillation approach, which replicates multi-view contrastive learning strategies in unsupervised representation learning but without incurring additional computational complexity. Extensive experiments on multiple popular test-time adaptation benchmarks presenting diverse complexity empirically show the superiority of CLIP-OT, achieving performance gains of up to 7% over recent state-of-the-art methods, yet being computationally and memory efficient.

Related papers

Unleashing the Power of Vision-Language Models for Long-Tailed Multi-Label Visual Recognition [55.189113121465816]
We propose a novel correlation adaptation prompt network (CAPNET) for long-tailed multi-label visual recognition.<n>CAPNET explicitly models correlations from CLIP's textual encoder.<n>It improves generalization through test-time ensembling and realigns visual-textual modalities.
arXiv Detail & Related papers (2025-11-25T18:57:28Z)
Training-Free Class Purification for Open-Vocabulary Semantic Segmentation [72.87707878910896]
FreeCP is a training-free class purification framework for semantic segmentation.<n>We conduct experiments across eight benchmarks to validate FreeCP's effectiveness.<n>Results demonstrate that FreeCP, as a plug-and-play module, significantly boosts segmentation performance when combined with other OVSS methods.
arXiv Detail & Related papers (2025-08-01T11:55:12Z)
Salvaging the Overlooked: Leveraging Class-Aware Contrastive Learning for Multi-Class Anomaly Detection [18.797864512898787]
In anomaly detection, early approaches often train separate models for individual classes, yielding high performance but posing challenges in scalability and resource management.<n>We investigate this performance observed in reconstruction-based methods, identifying the key issue: inter-class confusion.<n>This confusion emerges when a model trained in multi-class scenarios incorrectly reconstructs samples from one class as those of another, thereby exacerbating reconstruction errors.<n>By explicitly leveraging raw object category information (eg carpet or wood), we introduce local CL to refine multiscale dense features, and global CL to obtain more compact feature representations of normal patterns, thereby effectively adapting the models to multi-class
arXiv Detail & Related papers (2024-12-06T04:31:09Z)
Active Prompt Learning with Vision-Language Model Priors [9.173468790066956]
We introduce a class-guided clustering that leverages the pre-trained image and text encoders of vision-language models. We propose a budget-saving selective querying based on adaptive class-wise thresholds.
arXiv Detail & Related papers (2024-11-23T02:34:33Z)
In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality.<n>To handle these challenges, a direct solution is to generate high-confidence'' data from unsupervised downstream tasks.<n>We propose a novel approach, pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z)
BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models [20.88680592729709]
We propose a novel backpropagation-free algorithm BaFTA for test-time adaptation of vision-language models. BaFTA directly estimates class centroids using online clustering within a projected embedding space. We demonstrate that BaFTA consistently outperforms state-of-the-art test-time adaptation methods in both effectiveness and efficiency.
arXiv Detail & Related papers (2024-06-17T08:16:24Z)
Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios. We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples. Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications. The quality of these exemplars in the prompt greatly impacts performance. Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
In-context Prompt Learning for Test-time Vision Recognition with Frozen Vision-language Model [13.983810804606264]
We propose In-Context Prompt Learning (InCPL) for test-time visual recognition tasks. InCPL associates a new test sample with very few labeled examples as context information. We introduce a context-aware unsupervised loss to optimize visual prompts tailored to test samples.
arXiv Detail & Related papers (2024-03-10T08:15:51Z)
Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization [101.08992036691673]
This paper explores a realistic unsupervised fine-tuning scenario, considering the presence of out-of-distribution samples from unknown classes. In particular, we focus on simultaneously enhancing out-of-distribution detection and the recognition of instances associated with known classes. We present a simple, efficient, and effective approach called Universal Entropy Optimization (UEO)
arXiv Detail & Related papers (2023-08-24T16:47:17Z)
Active Learning Principles for In-Context Learning with Large Language Models [65.09970281795769]
This paper investigates how Active Learning algorithms can serve as effective demonstration selection methods for in-context learning. We show that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
arXiv Detail & Related papers (2023-05-23T17:16:04Z)
Learning New Tasks from a Few Examples with Soft-Label Prototypes [18.363177410917597]
We propose a novel few-shot learning approach based on soft-label prototypes (SLPs) We focus on learning previously unseen NLP tasks from very few examples (4, 8, 16) per class. We experimentally demonstrate that our approach achieves superior performance on the majority of tested tasks in this data-lean setting.
arXiv Detail & Related papers (2022-10-31T16:06:48Z)
Transductive Few-Shot Learning: Clustering is All You Need? [31.21306826132773]
We investigate a general formulation for transive few-shot learning, which integrates prototype-based objectives. We find that our method yields competitive performances, in term of accuracy and optimization, while scaling up to large problems. Surprisingly, we find that our general model already achieve competitive performances in comparison to the state-of-the-art learning.
arXiv Detail & Related papers (2021-06-16T16:14:01Z)
Contrastive Learning with Adversarial Examples [79.39156814887133]
Contrastive learning (CL) is a popular technique for self-supervised learning (SSL) of visual representations. This paper introduces a new family of adversarial examples for constrastive learning and using these examples to define a new adversarial training algorithm for SSL, denoted as CLAE.
arXiv Detail & Related papers (2020-10-22T20:45:10Z)
Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method. PCL implicitly encodes semantic structures of the data into the learned embedding space. PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.