Few-Shot, Now for Real: Medical VLMs Adaptation without Balanced Sets or Validation
- URL: http://arxiv.org/abs/2506.17500v1
- Date: Fri, 20 Jun 2025 22:35:00 GMT
- Title: Few-Shot, Now for Real: Medical VLMs Adaptation without Balanced Sets or Validation
- Authors: Julio Silva-Rodríguez, Fereshteh Shakeri, Houda Bahig, Jose Dolz, Ismail Ben Ayed
- Abstract summary: Vision-language models (VLMs) are gaining attention in medical image analysis. Previous works on this topic make strong assumptions about the distribution of adaptation data, which are unrealistic in the medical domain. This work challenges these favorable deployment scenarios and introduces a realistic, imbalanced, validation-free adaptation setting.
- Score: 17.875098424936542
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-language models (VLMs) are gaining attention in medical image analysis. These are pre-trained on large, heterogeneous data sources, yielding rich and transferable representations. Notably, the combination of modality-specialized VLMs with few-shot adaptation has provided fruitful results, enabling the efficient deployment of high-performing solutions. However, previous works on this topic make strong assumptions about the distribution of adaptation data, which are unrealistic in the medical domain. First, prior art assumes access to a balanced support set, a condition that breaks the natural imbalance in disease prevalence found in real-world scenarios. Second, these works typically assume the presence of an additional validation set to fix critical hyper-parameters, which is highly data-inefficient. This work challenges these favorable deployment scenarios and introduces a realistic, imbalanced, validation-free adaptation setting. Our extensive benchmark across various modalities and downstream tasks demonstrates that current methods systematically compromise their performance when operating under realistic conditions, occasionally even performing worse than zero-shot inference. Also, we introduce a training-free linear probe that adaptively blends visual and textual supervision. Detailed studies demonstrate that the proposed solver is a strong, efficient baseline, enabling robust adaptation in challenging scenarios.
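The training-free linear probe described in the abstract can be illustrated with a minimal sketch: classify test images by cosine similarity to a convex blend of text-derived (zero-shot) and few-shot visual class prototypes. The fixed `alpha` below is a hypothetical stand-in, since the paper's solver sets the blend adaptively without a validation set; the function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def blended_linear_probe(image_feats, text_protos, visual_protos, alpha=0.5):
    """Training-free linear probe: classify by cosine similarity to a
    convex blend of text and visual class prototypes.

    image_feats:   (N, D) L2-normalized test image embeddings
    text_protos:   (C, D) L2-normalized class-name text embeddings (zero-shot)
    visual_protos: (C, D) L2-normalized per-class means of the few-shot support set
    alpha:         blend weight (a fixed assumption here; the paper's solver
                   chooses the visual/textual balance adaptively)
    """
    protos = alpha * text_protos + (1.0 - alpha) * visual_protos
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)  # renormalize blend
    logits = image_feats @ protos.T                          # (N, C) cosine scores
    return logits.argmax(axis=1)
```

With `alpha=1.0` this reduces to zero-shot inference from class names, and with `alpha=0.0` to a nearest-class-mean probe over the support set, which is why an adaptive blend can hedge against a small or imbalanced support set.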
Related papers
- Trustworthy Few-Shot Transfer of Medical VLMs through Split Conformal Prediction [20.94974284175104]
Medical vision-language models (VLMs) have demonstrated unprecedented transfer capabilities and are being increasingly adopted for data-efficient image classification. This work explores the split conformal prediction (SCP) framework to provide trustworthiness guarantees when transferring such models. We propose transductive split conformal adaptation (SCA-T), a novel pipeline for transfer learning on conformal scenarios.
arXiv Detail & Related papers (2025-06-20T22:48:07Z)
- Full Conformal Adaptation of Medical Vision-Language Models [17.53651859360999]
Vision-language models (VLMs) pre-trained at large scale have shown unprecedented transferability capabilities. This work investigates their behavior under the increasingly popular split conformal prediction framework. We propose full conformal adaptation, a novel setting for jointly adapting and conformalizing pre-trained foundation models.
arXiv Detail & Related papers (2025-06-06T13:32:00Z)
- Realistic Test-Time Adaptation of Vision-Language Models [23.972884634610413]
Vision-Language Models (VLMs) have been widely leveraged to improve predictive performance. Previous works on transductive or test-time adaptation (TTA) often make strong assumptions about the data distribution. Our work challenges these favorable deployment scenarios, and introduces a more realistic evaluation framework.
arXiv Detail & Related papers (2025-01-07T12:17:25Z)
- SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities [55.87169702896249]
Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain to perform well on an unlabeled target domain with some data distribution shift. We present a complete and fair evaluation of existing shallow algorithms, including reweighting, mapping, and subspace alignment. Our benchmark highlights the importance of realistic validation and provides practical guidance for real-life applications.
arXiv Detail & Related papers (2024-07-16T12:52:29Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
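The mechanism described above, setting the label smoothing value per training sample from its uncertainty, can be sketched as follows. How UAL actually estimates uncertainty is not specified in this summary, so treating it as a normalized score in [0, 1] and scaling smoothing linearly with it are assumptions of this sketch.

```python
import numpy as np

def uncertainty_smoothed_targets(labels, uncertainties, num_classes, max_eps=0.2):
    """Per-sample label smoothing scaled by sample uncertainty.

    labels:        (N,) integer class labels
    uncertainties: (N,) uncertainty scores assumed normalized to [0, 1]
                   (e.g., predictive entropy; the exact estimator and the
                   linear scaling rule here are illustrative assumptions)
    """
    eps = max_eps * np.clip(uncertainties, 0.0, 1.0)         # (N,) per-sample smoothing
    onehot = np.eye(num_classes)[labels]                     # (N, C) hard targets
    # Confident samples keep near-one-hot targets; uncertain ones are softened.
    targets = (1.0 - eps)[:, None] * onehot + (eps / num_classes)[:, None]
    return targets
```

Each row remains a valid probability distribution, so the result can be fed directly to a standard cross-entropy loss in place of one-hot labels.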
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training [5.819704618007536]
A visual-language model (VLM) pre-trained on natural images and text pairs poses a significant barrier when applied to medical contexts.
We propose a framework designed to adeptly tailor VLMs to the medical domain, employing selective sampling and hard-negative mining techniques.
arXiv Detail & Related papers (2024-05-30T04:04:36Z)
- Source-Free Unsupervised Domain Adaptation with Hypothesis Consolidation of Prediction Rationale [53.152460508207184]
Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data.
This paper proposes a novel approach that considers multiple prediction hypotheses for each sample and investigates the rationale behind each hypothesis.
To achieve the optimal performance, we propose a three-step adaptation process: model pre-adaptation, hypothesis consolidation, and semi-supervised learning.
arXiv Detail & Related papers (2024-02-02T05:53:22Z)
- Better Practices for Domain Adaptation [62.70267990659201]
Domain adaptation (DA) aims to provide frameworks for adapting models to deployment data without using labels.
The lack of a clear validation protocol for DA has led to bad practices in the literature.
We show challenges across all three branches of domain adaptation methodology.
arXiv Detail & Related papers (2023-09-07T17:44:18Z)
- Parameter-free Online Test-time Adaptation [19.279048049267388]
We show how test-time adaptation methods fare for a number of pre-trained models on a variety of real-world scenarios.
We propose a particularly "conservative" approach, which addresses the problem with a Laplacian Adjusted Maximum Estimation (LAME) objective.
Our approach exhibits a much higher average accuracy across scenarios than existing methods, while being notably faster and having a much lower memory footprint.
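The Laplacian adjustment above can be sketched as a transductive refinement of the model's softmax outputs: each sample's class assignment is pulled toward those of its neighbors via a fixed-point iteration. This is a simplified illustration in the spirit of LAME's concave-convex procedure, not the authors' exact solver, and the affinity construction is left as an assumption.

```python
import numpy as np

def lame_refine(probs, affinity, n_iters=20):
    """Simplified Laplacian-adjusted refinement of softmax outputs (LAME-style).

    Fixed-point update z_i ∝ p_i * exp((W z)_i), row-normalized each step,
    which trades fidelity to the model's predictions against agreement with
    neighboring samples. A sketch under stated assumptions, not the paper's code.

    probs:    (N, C) softmax probabilities from the pre-trained model
    affinity: (N, N) non-negative symmetric affinity matrix (e.g., kNN weights)
    """
    z = probs.copy()
    for _ in range(n_iters):
        z = probs * np.exp(affinity @ z)   # reward agreement with neighbors
        z /= z.sum(axis=1, keepdims=True)  # keep each row a distribution
    return z
```

On a toy batch where two confident samples and one ambiguous neighbor are mutually affine, the ambiguous sample's assignment is pulled toward the majority class while every row remains a valid distribution.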
arXiv Detail & Related papers (2022-01-15T00:29:16Z)
- Supercharging Imbalanced Data Learning With Energy-based Contrastive Representation Transfer [72.5190560787569]
In computer vision, learning from long tailed datasets is a recurring theme, especially for natural image datasets.
Our proposal posits a meta-distributional scenario, where the data generating mechanism is invariant across the label-conditional feature distributions.
This allows us to leverage a causal data inflation procedure to enlarge the representation of minority classes.
arXiv Detail & Related papers (2020-11-25T00:13:11Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.