Related papers: Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification

Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification

URL: http://arxiv.org/abs/2512.21508v1
Date: Thu, 25 Dec 2025 05:02:19 GMT
Title: Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification
Authors: Md Ashik Khan, Md Nahid Siddique,
Abstract summary: Multimodal chest X-Ray analysis often fine-tunes large vision-language models, which is computationally costly.<n>We study parameter-efficient training strategies, including frozen encoders, BitFit, LoRA, and adapters for multi-label classification on the Indiana University Chest X-Ray dataset.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal chest X-Ray analysis often fine-tunes large vision-language models, which is computationally costly. We study parameter-efficient training (PET) strategies, including frozen encoders, BitFit, LoRA, and adapters for multi-label classification on the Indiana University Chest X-Ray dataset (3,851 image-report pairs; 579 test samples). To mitigate data leakage, we redact pathology terms from reports used as text inputs while retaining clinical context. Under a fixed parameter budget (2.37M parameters, 2.51% of total), all PET variants achieve AUROC between 0.892 and 0.908, outperforming full fine-tuning (0.770 AUROC), which uses 94.3M trainable parameters, a 40x reduction. External validation on CheXpert (224,316 images, 58x larger) confirms scalability: all PET methods achieve >0.69 AUROC with <9% trainable parameters, with Adapter achieving best performance (0.7214 AUROC). Budget-matched comparisons reveal that vision-only models (0.653 AUROC, 1.06M parameters) outperform budget-matched multimodal models (0.641 AUROC, 1.06M parameters), indicating improvements arise primarily from parameter allocation rather than cross-modal synergy. While PET methods show degraded calibration (ECE: 0.29-0.34) compared to simpler models (ECE: 0.049), this represents a tractable limitation addressable through post-hoc calibration methods. These findings demonstrate that frozen encoder strategies provide superior discrimination at substantially reduced computational cost, though calibration correction is essential for clinical deployment.

Related papers

Time-Series at the Edge: Tiny Separable CNNs for Wearable Gait Detection and Optimal Sensor Placement [3.7765281299298015]
We study on-device time-series analysis for gait detection in Parkinson's disease (PD) from short windows of triaxial acceleration, targeting resource-latency wearables and edge nodes.<n>We compare magnitude thresholding to three 1D CNNs for time-series analysis: a literature baseline (separable convolutions) and two ultra-light models - one purely separable and one with residual connections.
arXiv Detail & Related papers (2025-11-29T08:52:41Z)
Generalizable Diabetes Risk Stratification via Hybrid Machine Learning Models [0.0]
Diabetes affects over 537 million people worldwide and is projected to reach 783 million by 2045.<n>We compare two hybrid classifiers and assess their generalizability on an external cohort.
arXiv Detail & Related papers (2025-09-24T21:18:52Z)
Enhancement Without Contrast: Stability-Aware Multicenter Machine Learning for Glioma MRI Imaging [2.5012408467295555]
Predicting contrast enhancement from non-contrast MRI using machine learning (ML) offers a safer alternative to Gadolinium-based contrast agents (GBCAs)<n>We propose a stability-aware framework to identify reproducible ML pipelines for multicenter prediction of glioma MRI contrast enhancement.
arXiv Detail & Related papers (2025-09-13T00:47:07Z)
Fantastic Pretraining Optimizers and Where to Find Them [59.56075036649332]
AdamW has long been the dominant gradients in language model pretraining.<n>Speedup of matrix-based matrices is inversely proportional to model scale.
arXiv Detail & Related papers (2025-09-02T07:43:22Z)
Handcrafted vs. Deep Radiomics vs. Fusion vs. Deep Learning: A Comprehensive Review of Machine Learning -Based Cancer Outcome Prediction in PET and SPECT Imaging [2.3507313809321233]
This systematic review analyzed 226 studies published from 2020 to 2025 that applied machine learning to PET or SPECT imaging for outcome prediction.<n> PET-based studies generally outperformed those using SPECT, likely due to higher spatial resolution and sensitivity.<n>Common limitations included inadequate handling of class imbalance, missing data, and low population diversity.
arXiv Detail & Related papers (2025-07-21T21:03:12Z)
EfficientLLM: Efficiency in Large Language Models [64.3537131208038]
Large Language Models (LLMs) have driven significant progress, yet their growing counts and context windows incur prohibitive compute, energy, and monetary costs.<n>We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale.
arXiv Detail & Related papers (2025-05-20T02:27:08Z)
CorBenchX: Large-Scale Chest X-Ray Error Dataset and Vision-Language Model Benchmark for Report Error Correction [11.731590131260424]
CorBenchX is a suite for automated error detection and correction in chest X-ray reports.<n>We first synthesize a large-scale dataset of 26,326 chest X-ray error reports.<n>We benchmark both open- and closed-source vision-language models.
arXiv Detail & Related papers (2025-05-17T15:39:39Z)
Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning [79.43940012723539]
ADCLR is a self-supervised learning framework for learning accurate and dense vision representation. Our approach achieves new state-of-the-art performance for contrastive methods.
arXiv Detail & Related papers (2023-06-23T07:38:09Z)
Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification [52.77024349608834]
To investigate chest radiograph (CXR) classification performance of vision transformers (ViT) and interpretability of attention-based saliency. ViTs were fine-tuned for lung disease classification using four public data sets: CheXpert, Chest X-Ray 14, MIMIC CXR, and VinBigData. ViTs had comparable CXR classification AUCs compared with state-of-the-art CNNs.
arXiv Detail & Related papers (2023-03-03T12:05:41Z)
Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing) We propose a new parameter-efficient finetuning method termed as SSF, representing that researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to catch up with the performance full finetuning.
arXiv Detail & Related papers (2022-10-17T08:14:49Z)
Sparse Structure Search for Parameter-Efficient Tuning [85.49094523664428]
We show that S$3$PET surpasses manual and random structures with less trainable parameters. The searched structures preserve more than 99% fine-tuning performance with 0.01% trainable parameters.
arXiv Detail & Related papers (2022-06-15T08:45:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.