Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification
- URL: http://arxiv.org/abs/2512.21508v1
- Date: Thu, 25 Dec 2025 05:02:19 GMT
- Title: Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification
- Authors: Md Ashik Khan, Md Nahid Siddique,
- Abstract summary: Multimodal chest X-Ray analysis often fine-tunes large vision-language models, which is computationally costly. We study parameter-efficient training strategies, including frozen encoders, BitFit, LoRA, and adapters for multi-label classification on the Indiana University Chest X-Ray dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal chest X-Ray analysis often fine-tunes large vision-language models, which is computationally costly. We study parameter-efficient training (PET) strategies, including frozen encoders, BitFit, LoRA, and adapters for multi-label classification on the Indiana University Chest X-Ray dataset (3,851 image-report pairs; 579 test samples). To mitigate data leakage, we redact pathology terms from reports used as text inputs while retaining clinical context. Under a fixed parameter budget (2.37M parameters, 2.51% of total), all PET variants achieve AUROC between 0.892 and 0.908, outperforming full fine-tuning (0.770 AUROC), which uses 94.3M trainable parameters; the PET budget is roughly a 40x reduction. External validation on CheXpert (224,316 images, 58x larger) confirms scalability: all PET methods achieve >0.69 AUROC with <9% trainable parameters, with Adapter achieving the best performance (0.7214 AUROC). Budget-matched comparisons reveal that vision-only models (0.653 AUROC, 1.06M parameters) outperform budget-matched multimodal models (0.641 AUROC, 1.06M parameters), indicating that improvements arise primarily from parameter allocation rather than cross-modal synergy. While PET methods show degraded calibration (ECE: 0.29-0.34) compared to simpler models (ECE: 0.049), this is a tractable limitation addressable through post-hoc calibration methods. These findings demonstrate that frozen encoder strategies provide superior discrimination at substantially reduced computational cost, though calibration correction is essential for clinical deployment.
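The abstract reports calibration as expected calibration error (ECE) but does not say how it was computed. As a rough illustration only, the sketch below uses one common binary variant: predicted probabilities are grouped into equal-width bins, and the gap between each bin's mean predicted probability and its observed positive rate is averaged with weights proportional to bin size. The function name, the choice of 10 bins, and the toy inputs are assumptions made for this example, not details taken from the paper.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE over equal-width probability bins (illustrative variant).

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : ground-truth labels, each 0 or 1
    n_bins : number of equal-width bins (10 is an assumed default)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")

    prob_sums = [0.0] * n_bins   # sum of predicted probabilities per bin
    pos_counts = [0] * n_bins    # number of positive labels per bin
    counts = [0] * n_bins        # number of samples per bin

    for p, y in zip(probs, labels):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        prob_sums[idx] += p
        pos_counts[idx] += y
        counts[idx] += 1

    total = len(probs)
    ece = 0.0
    for b in range(n_bins):
        if counts[b] == 0:
            continue  # empty bins contribute nothing
        mean_prob = prob_sums[b] / counts[b]
        pos_rate = pos_counts[b] / counts[b]
        ece += (counts[b] / total) * abs(pos_rate - mean_prob)
    return ece


if __name__ == "__main__":
    # Toy data: an overconfident model that predicts 0.95 but is right half the time.
    overconfident = expected_calibration_error([0.95] * 4, [1, 1, 0, 0])
    print(f"overconfident ECE: {overconfident:.2f}")  # about 0.45
```

For a multi-label task such as the one described here, a function like this would typically be applied per label and the results averaged, though whether the authors did so is not stated in the abstract.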
Related papers
- Time-Series at the Edge: Tiny Separable CNNs for Wearable Gait Detection and Optimal Sensor Placement [3.7765281299298015]
We study on-device time-series analysis for gait detection in Parkinson's disease (PD) from short windows of triaxial acceleration, targeting resource-latency wearables and edge nodes. We compare magnitude thresholding to three 1D CNNs for time-series analysis: a literature baseline (separable convolutions) and two ultra-light models - one purely separable and one with residual connections.
arXiv Detail & Related papers (2025-11-29T08:52:41Z)
- Generalizable Diabetes Risk Stratification via Hybrid Machine Learning Models [0.0]
Diabetes affects over 537 million people worldwide and is projected to reach 783 million by 2045. We compare two hybrid classifiers and assess their generalizability on an external cohort.
arXiv Detail & Related papers (2025-09-24T21:18:52Z)
- Enhancement Without Contrast: Stability-Aware Multicenter Machine Learning for Glioma MRI Imaging [2.5012408467295555]
Predicting contrast enhancement from non-contrast MRI using machine learning (ML) offers a safer alternative to Gadolinium-based contrast agents (GBCAs). We propose a stability-aware framework to identify reproducible ML pipelines for multicenter prediction of glioma MRI contrast enhancement.
arXiv Detail & Related papers (2025-09-13T00:47:07Z)
- Fantastic Pretraining Optimizers and Where to Find Them [59.56075036649332]
AdamW has long been the dominant optimizer in language model pretraining. The speedup of matrix-based optimizers is inversely proportional to model scale.
arXiv Detail & Related papers (2025-09-02T07:43:22Z)
- Handcrafted vs. Deep Radiomics vs. Fusion vs. Deep Learning: A Comprehensive Review of Machine Learning-Based Cancer Outcome Prediction in PET and SPECT Imaging [2.3507313809321233]
This systematic review analyzed 226 studies published from 2020 to 2025 that applied machine learning to PET or SPECT imaging for outcome prediction. PET-based studies generally outperformed those using SPECT, likely due to higher spatial resolution and sensitivity. Common limitations included inadequate handling of class imbalance, missing data, and low population diversity.
arXiv Detail & Related papers (2025-07-21T21:03:12Z)
- EfficientLLM: Efficiency in Large Language Models [64.3537131208038]
Large Language Models (LLMs) have driven significant progress, yet their growing parameter counts and context windows incur prohibitive compute, energy, and monetary costs. We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale.
arXiv Detail & Related papers (2025-05-20T02:27:08Z)
- CorBenchX: Large-Scale Chest X-Ray Error Dataset and Vision-Language Model Benchmark for Report Error Correction [11.731590131260424]
CorBenchX is a suite for automated error detection and correction in chest X-ray reports. We first synthesize a large-scale dataset of 26,326 chest X-ray error reports. We benchmark both open- and closed-source vision-language models.
arXiv Detail & Related papers (2025-05-17T15:39:39Z)
- Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning [79.43940012723539]
ADCLR is a self-supervised learning framework for learning accurate and dense vision representation.
Our approach achieves new state-of-the-art performance for contrastive methods.
arXiv Detail & Related papers (2023-06-23T07:38:09Z)
- Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification [52.77024349608834]
This study investigates the chest radiograph (CXR) classification performance of vision transformers (ViTs) and the interpretability of attention-based saliency.
ViTs were fine-tuned for lung disease classification using four public data sets: CheXpert, Chest X-Ray 14, MIMIC CXR, and VinBigData.
ViTs achieved CXR classification AUCs comparable to those of state-of-the-art CNNs.
arXiv Detail & Related papers (2023-03-03T12:05:41Z)
- Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing).
We propose a new parameter-efficient finetuning method termed SSF: researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to match the performance of full finetuning.
arXiv Detail & Related papers (2022-10-17T08:14:49Z)
- Sparse Structure Search for Parameter-Efficient Tuning [85.49094523664428]
We show that S$^3$PET surpasses manual and random structures with fewer trainable parameters.
The searched structures preserve more than 99% of fine-tuning performance with 0.01% trainable parameters.
arXiv Detail & Related papers (2022-06-15T08:45:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.