In-Context Learning for Label-Efficient Cancer Image Classification in Oncology
- URL: http://arxiv.org/abs/2505.08798v1
- Date: Thu, 08 May 2025 20:49:01 GMT
- Title: In-Context Learning for Label-Efficient Cancer Image Classification in Oncology
- Authors: Mobina Shrestha, Bishwas Mandal, Vishal Mandal, Asis Shrestha
- Abstract summary: In-context learning (ICL) is a pragmatic alternative to model retraining for domain-specific diagnostic tasks. We evaluated the performance of four vision-language models (VLMs): Paligemma, CLIP, ALIGN and GPT-4o. Open-source models such as Paligemma and CLIP demonstrated competitive gains despite their smaller size, suggesting feasibility for deployment in compute-constrained clinical environments.
- Score: 1.741659712094955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The application of AI in oncology has been limited by its reliance on large, annotated datasets and the need for retraining models for domain-specific diagnostic tasks. Taking heed of these limitations, we investigated in-context learning as a pragmatic alternative to model retraining by allowing models to adapt to new diagnostic tasks using only a few labeled examples at inference time. Using four vision-language models (VLMs), Paligemma, CLIP, ALIGN and GPT-4o, we evaluated performance across three oncology datasets: MHIST, PatchCamelyon and HAM10000. To the best of our knowledge, this is the first study to compare the performance of multiple VLMs on different oncology classification tasks. Without any parameter updates, all models showed significant gains with few-shot prompting, with GPT-4o reaching an F1 score of 0.81 in binary classification and 0.60 in multi-class classification settings. While these results remain below the ceiling of fully fine-tuned systems, they highlight the potential of ICL to approximate task-specific behavior using only a handful of examples, reflecting how clinicians often reason from prior cases. Notably, open-source models like Paligemma and CLIP demonstrated competitive gains despite their smaller size, suggesting feasibility for deployment in compute-constrained clinical environments. Overall, these findings highlight the potential of ICL as a practical solution in oncology, particularly for rare cancers and resource-limited contexts where fine-tuning is infeasible and annotated data is difficult to obtain.
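The few-shot setup described in the abstract can be illustrated with a minimal, model-agnostic sketch. It is not the authors' pipeline: the message-part layout, the function names (`build_few_shot_prompt`, `macro_f1`), the image references, and the class names are all hypothetical, and the abstract does not say which F1 averaging was used, so macro averaging is an assumption here.

```python
import random


def build_few_shot_prompt(support, query_image, labels, k_per_class=2, seed=0):
    """Assemble an interleaved image/label prompt for in-context classification.

    `support` is a list of (image_ref, label) pairs. The dict layout of each
    part is a hypothetical stand-in for whatever a given VLM API expects.
    """
    rng = random.Random(seed)
    parts = [{"type": "text",
              "text": "Classify the image as one of: " + ", ".join(labels) + "."}]
    for label in labels:
        pool = [img for img, lab in support if lab == label]
        # Draw up to k labeled examples per class; no model weights are updated.
        for img in rng.sample(pool, min(k_per_class, len(pool))):
            parts.append({"type": "image", "ref": img})
            parts.append({"type": "text", "text": f"Label: {label}"})
    # The unlabeled query goes last so the model completes its label.
    parts.append({"type": "image", "ref": query_image})
    parts.append({"type": "text", "text": "Label:"})
    return parts


def f1_for_class(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical file names and class names, for illustration only.
    support = [("img_a1.png", "benign"), ("img_a2.png", "benign"),
               ("img_b1.png", "malignant"), ("img_b2.png", "malignant")]
    prompt = build_few_shot_prompt(support, "query.png", ["benign", "malignant"])
    print(len(prompt), "prompt parts")
    print(macro_f1(["benign", "benign", "malignant", "malignant"],
                   ["benign", "malignant", "malignant", "malignant"],
                   ["benign", "malignant"]))
```

The prompt is rebuilt per query from a small labeled pool, which is what lets the same frozen model be pointed at a new diagnostic task without retraining.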
Related papers
- Benchmarking Foundation Models for Mitotic Figure Classification [0.37334049820361814]
Self-supervised learning techniques have enabled the use of vast amounts of unlabeled data to train large-scale neural networks. In this work, we investigate the use of foundation models for mitotic figure classification. We compare all models against end-to-end-trained baselines, both CNNs and Vision Transformers.
arXiv Detail & Related papers (2025-08-06T13:30:40Z) - Interpreting Biomedical VLMs on High-Imbalance Out-of-Distributions: An Insight into BiomedCLIP on Radiology [0.0]
We analyse the limitations of BiomedCLIP when applied to a highly imbalanced, out-of-distribution medical dataset. We show that the model under zero-shot settings over-predicts all labels, leading to poor precision and inter-class separability. We highlight the need for careful adaptations of the models to foster reliability and applicability in a real-world setting.
arXiv Detail & Related papers (2025-06-17T02:59:42Z) - Exploring Foundation Models Fine-Tuning for Cytology Classification [0.10555513406636088]
We show how existing foundation models can be applied to cytological classification.
We evaluate five foundation models across four cytological classification datasets.
Our results demonstrate that fine-tuning the pre-trained backbones with LoRA significantly improves model performance.
arXiv Detail & Related papers (2024-11-22T14:34:04Z) - Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective [32.93871326428446]
Recent advances in artificial intelligence (AI) are revolutionizing medical imaging and computational pathology. A constant challenge in the analysis of digital Whole Slide Images (WSIs) is the problem of aggregating tens of thousands of tile-level image embeddings to a slide-level representation. This study conducts a benchmarking analysis of ten slide-level aggregation techniques across nine clinically relevant tasks.
arXiv Detail & Related papers (2024-07-10T17:00:57Z) - Fairness Evolution in Continual Learning for Medical Imaging [47.52603262576663]
We study the behavior of Continual Learning (CL) strategies in medical imaging regarding classification performance.
We evaluate the Replay, Learning without Forgetting (LwF), LwF Replay, and Pseudo-Label strategies.
LwF and Pseudo-Label exhibit optimal classification performance, but when including fairness metrics in the evaluation, it is clear that Pseudo-Label is less biased.
arXiv Detail & Related papers (2024-04-10T09:48:52Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - A comparative study of zero-shot inference with large language models and supervised modeling in breast cancer pathology classification [1.4715634464004446]
Large language models (LLMs) have demonstrated promising transfer learning capability.
LLMs demonstrated the potential to speed up the execution of clinical NLP studies by reducing the need for curating large annotated datasets.
This may result in an increase in the utilization of NLP-based variables and outcomes in observational clinical studies.
arXiv Detail & Related papers (2024-01-25T02:05:31Z) - Graph-Ensemble Learning Model for Multi-label Skin Lesion Classification using Dermoscopy and Clinical Images [7.159532626507458]
This study introduces a Graph Convolution Network (GCN) to exploit prior co-occurrence between each category as a correlation matrix into the deep learning model for the multi-label classification.
We propose a Graph-Ensemble Learning Model (GELN) that views the prediction from GCN as complementary information of the predictions from the fusion model.
arXiv Detail & Related papers (2023-07-04T13:19:57Z) - Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z) - LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection.
Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch.
Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv Detail & Related papers (2022-04-12T12:25:05Z) - A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict the Chicago Classification (CC) diagnosis of a high-resolution manometry (HRM) study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, the Prototypical Network, which is a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - CHEER: Rich Model Helps Poor Model via Knowledge Infusion [69.23072792708263]
We develop a knowledge infusion framework named CHEER that can succinctly summarize such a rich model into transferable representations.
Our empirical results showed that CHEER outperformed baselines by 5.60% to 46.80% in terms of the macro-F1 score on multiple physiological datasets.
arXiv Detail & Related papers (2020-05-21T21:44:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.