Related papers: Effortless Vision-Language Model Specialization in Histopathology without Annotation

Effortless Vision-Language Model Specialization in Histopathology without Annotation

URL: http://arxiv.org/abs/2508.07835v1
Date: Mon, 11 Aug 2025 10:39:27 GMT
Title: Effortless Vision-Language Model Specialization in Histopathology without Annotation
Authors: Jingna Qiu, Nishanth Jain, Jonas Ammeling, Marc Aubreville, Katharina Breininger,
Abstract summary: Vision-Language Models (VLMs) have demonstrated impressive zero-shot classification capabilities across various tasks.<n>Their general-purpose design may lead to suboptimal performance in specific downstream applications.<n>This paper investigates annotation-free adaptation of VLMs through continued pretraining on domain- and task-relevant image-caption pairs.
Score: 0.4154350202907906
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in Vision-Language Models (VLMs) in histopathology, such as CONCH and QuiltNet, have demonstrated impressive zero-shot classification capabilities across various tasks. However, their general-purpose design may lead to suboptimal performance in specific downstream applications. While supervised fine-tuning methods address this issue, they require manually labeled samples for adaptation. This paper investigates annotation-free adaptation of VLMs through continued pretraining on domain- and task-relevant image-caption pairs extracted from existing databases. Our experiments on two VLMs, CONCH and QuiltNet, across three downstream tasks reveal that these pairs substantially enhance both zero-shot and few-shot performance. Notably, with larger training sizes, continued pretraining matches the performance of few-shot methods while eliminating manual labeling. Its effectiveness, task-agnostic design, and annotation-free workflow make it a promising pathway for adapting VLMs to new histopathology tasks. Code is available at https://github.com/DeepMicroscopy/Annotation-free-VLM-specialization.

Related papers

Route-and-Execute: Auditable Model-Card Matching and Specialty-Level Deployment [6.7202991099968346]
We present a framework that uses a single vision-language model (VLM) in two complementary roles.<n>First, the VLM acts as an aware model-card matcher that routes an incoming image to the appropriate specialist model.<n>Second, we fine-tune the VLM on specialty-specific datasets ensuring a single model covers multiple downstream tasks.
arXiv Detail & Related papers (2025-08-22T23:34:37Z)
Adapting Vision-Language Models Without Labels: A Comprehensive Survey [74.17944178027015]
Vision-Language Models (VLMs) have demonstrated remarkable generalization capabilities across a wide range of tasks.<n>Recent research has increasingly focused on unsupervised adaptation methods that do not rely on labeled data.<n>We propose a taxonomy based on the availability and nature of unlabeled visual data, categorizing existing approaches into four key paradigms.
arXiv Detail & Related papers (2025-08-07T16:27:37Z)
Enhancing Input-Label Mapping in In-Context Learning with Contrastive Decoding [71.01099784480597]
Large language models (LLMs) excel at a range of tasks through in-context learning (ICL)<n>We introduce In-Context Contrastive Decoding (ICCD), a novel method that emphasizes input-label mapping.
arXiv Detail & Related papers (2025-02-19T14:04:46Z)
Learning to Rank Pre-trained Vision-Language Models for Downstream Tasks [41.488394198111976]
Vision language models (VLMs) like CLIP show stellar zero-shot capability on classification benchmarks.<n> selecting the VLM with the highest performance on the unlabeled downstream task is non-trivial.<n>This paper introduces the problem of textbfunsupervised vision-language model selection, where only unsupervised downstream datasets are available.
arXiv Detail & Related papers (2024-12-30T03:26:53Z)
Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model [41.55165760439727]
Vision-language models (VLMs) have revolutionized machine learning by leveraging large pre-trained models to tackle various downstream tasks.<n>We propose a graph-based approach for label-efficient adaptation and inference.<n>Our method dynamically constructs a graph over text prompts, few-shot examples, and test samples, using label propagation for inference without task-specific tuning.
arXiv Detail & Related papers (2024-12-24T09:15:00Z)
Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures.<n>We also present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms.<n>We observe that the generated can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection [107.15164718585666]
We investigate the root cause of VLMs' biased prediction under the open vocabulary detection context. Our observations lead to a simple yet effective paradigm, coded MarvelOVD, that generates significantly better training targets. Our method outperforms the other state-of-the-arts by significant margins.
arXiv Detail & Related papers (2024-07-31T09:23:57Z)
VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Annotation-Free Pathological Image Classification [16.05109192966549]
We present a novel human annotation-free method by leveraging pre-trained Vision-Language Models (VLMs)<n>We introduce VLM-CPL, a novel approach that contains two noisy label filtering techniques with a semi-supervised learning strategy.<n> Experimental results on five public pathological image datasets for patch-level and slide-level classification showed that our method substantially outperformed zero-shot classification by VLMs.
arXiv Detail & Related papers (2024-03-23T13:24:30Z)
Enhancing Vision-Language Few-Shot Adaptation with Negative Learning [11.545127156146368]
We propose a Simple yet effective Negative Learning approach, SimNL, to more efficiently exploit task-specific knowledge. To this issue, we introduce a plug-and-play few-shot instance reweighting technique to mitigate noisy outliers. Our extensive experimental results validate that the proposed SimNL outperforms existing state-of-the-art methods on both few-shot learning and domain generalization tasks.
arXiv Detail & Related papers (2024-03-19T17:59:39Z)
Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection [87.39089806069707]
We propose a fine-grained Visual-Text Prompt-driven self-training paradigm for Open-Vocabulary Detection (VTP-OVD) During the adapting stage, we enable VLM to obtain fine-grained alignment by using learnable text prompts to resolve an auxiliary dense pixel-wise prediction task. Experiments show that our method achieves the state-of-the-art performance for open-vocabulary object detection, e.g., 31.5% mAP on unseen classes of COCO.
arXiv Detail & Related papers (2022-11-02T03:38:02Z)
CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models [59.49705076369856]
We introduce a novel framework to improve the fine-tuning phase of pre-trained language models (PLMs) We retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to a task. We then perform contrastive semi-supervised learning on both the retrieved unlabeled and original labeled instances to help PLMs capture crucial task-related semantic features.
arXiv Detail & Related papers (2021-02-07T09:27:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.