Related papers: Does Bigger Mean Better? Comparitive Analysis of CNNs and Biomedical Vision Language Modles in Medical Diagnosis

URL: http://arxiv.org/abs/2510.00411v2
Date: Thu, 02 Oct 2025 04:22:36 GMT
Title: Does Bigger Mean Better? Comparitive Analysis of CNNs and Biomedical Vision Language Modles in Medical Diagnosis
Authors: Ran Tong, Jiaqi Liu, Su Liu, Jiexi Xu, Lanruo Wang, Tong Wang
Abstract summary: This paper presents a comparative analysis between a supervised lightweight Convolutional Neural Network (CNN) and a zero-shot medical Vision-Language Model (VLM). Our experiments show that supervised CNNs serve as highly competitive baselines in both cases. By optimizing the classification threshold on a validation set, the performance of BiomedCLIP is significantly boosted across both datasets.
Score: 7.41395379449452
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: The accurate interpretation of chest radiographs using automated methods is a critical task in medical imaging. This paper presents a comparative analysis between a supervised lightweight Convolutional Neural Network (CNN) and a state-of-the-art, zero-shot medical Vision-Language Model (VLM), BiomedCLIP, across two distinct diagnostic tasks: pneumonia detection on the PneumoniaMNIST benchmark and tuberculosis detection on the Shenzhen TB dataset. Our experiments show that supervised CNNs serve as highly competitive baselines in both cases. While the default zero-shot performance of the VLM is lower, we demonstrate that its potential can be unlocked via a simple yet crucial remedy: decision threshold calibration. By optimizing the classification threshold on a validation set, the performance of BiomedCLIP is significantly boosted across both datasets. For pneumonia detection, calibration enables the zero-shot VLM to achieve a superior F1-score of 0.8841, surpassing the supervised CNN's 0.8803. For tuberculosis detection, calibration dramatically improves the F1-score from 0.4812 to 0.7684, bringing it close to the supervised baseline's 0.7834. This work highlights a key insight: proper calibration is essential for leveraging the full diagnostic power of zero-shot VLMs, enabling them to match or even outperform efficient, task-specific supervised models.
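
The threshold calibration described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes the zero-shot model yields one positive-class score per image, and it assumes an exhaustive search over the observed validation scores as the selection rule, which the abstract does not specify. The function names and the toy validation data are hypothetical.

```python
from typing import Sequence, Tuple


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 for the positive class (label 1); returns 0.0 when it is undefined."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def calibrate_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return the (threshold, F1) pair that maximizes F1 on a validation set.

    An image is predicted positive when its score is >= threshold.
    Candidate thresholds are the distinct validation scores (an assumption).
    """
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(labels, preds)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical validation scores: positives score higher than negatives,
# but every score sits below the default cut-off of 0.5.
val_scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45]
val_labels = [0, 0, 1, 1, 1, 1]

default_preds = [1 if s >= 0.5 else 0 for s in val_scores]
default_f1 = f1_score(val_labels, default_preds)
threshold, tuned_f1 = calibrate_threshold(val_scores, val_labels)
print(f"default F1={default_f1:.2f}, tuned F1={tuned_f1:.2f} at t={threshold}")
```

In this toy case the scores rank the classes correctly but are shifted low, so the default cut-off misses every positive while the tuned one separates them. The selected threshold would then be applied unchanged to the held-out test set.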

Related papers

Explainable Deep Learning for Pediatric Pneumonia Detection in Chest X-Ray Images [0.0]
Pneumonia remains a leading cause of morbidity and mortality among children worldwide. This study compares two state-of-the-art convolutional neural network (CNN) architectures for automated pediatric pneumonia detection.
arXiv Detail & Related papers (2026-01-14T19:21:32Z)
Transparent Early ICU Mortality Prediction with Clinical Transformer and Per-Case Modality Attribution [42.85462513661566]
We present a lightweight, transparent multimodal ensemble that fuses physiological time-series measurements with unstructured clinical notes from the first 48 hours of an ICU stay. A logistic regression model combines predictions from two modality-specific models: a bidirectional LSTM for vitals and a finetuned ClinicalModernBERT transformer for notes. On the MIMIC-III benchmark, our late-fusion ensemble improves discrimination over the best single model while maintaining well-calibrated predictions.
arXiv Detail & Related papers (2025-11-19T20:11:49Z)
Weakly Supervised Pneumonia Localization from Chest X-Rays Using Deep Neural Network and Grad-CAM Explanations [0.0]
This study proposes a weakly supervised deep learning framework for pneumonia classification and localization from chest X-rays. Instead of costly pixel-level annotations, our approach utilizes image-level labels to generate clinically meaningful heatmaps.
arXiv Detail & Related papers (2025-11-01T08:44:24Z)
LightPneumoNet: Lightweight Pneumonia Classifier [0.0]
This study introduces LightPneumoNet, an efficient, lightweight convolutional neural network (CNN) built from scratch. Our model was trained on a public dataset of 5,856 chest X-ray images. On an independent test set, our model delivered exceptional performance, achieving an overall accuracy of 0.942, precision of 0.92, and an F1-Score of 0.96.
arXiv Detail & Related papers (2025-10-13T10:14:17Z)
Automatic Cough Analysis for Non-Small Cell Lung Cancer Detection [33.37223681850477]
Early detection of non-small cell lung cancer (NSCLC) is critical for improving patient outcomes. We explore the use of automatic cough analysis as a pre-screening tool for distinguishing between NSCLC patients and healthy controls. Recordings were analyzed using machine learning techniques, such as support vector machine (SVM) and XGBoost.
arXiv Detail & Related papers (2025-07-25T11:30:22Z)
Predicting Length of Stay in Neurological ICU Patients Using Classical Machine Learning and Neural Network Models: A Benchmark Study on MIMIC-IV [49.1574468325115]
This study explores multiple ML approaches for predicting length of stay (LOS) in the ICU for patients with neurological diseases, based on the MIMIC-IV dataset. The evaluated models include classic ML algorithms (K-Nearest Neighbors, Random Forest, XGBoost and CatBoost) and neural networks (LSTM, BERT and Temporal Fusion Transformer).
arXiv Detail & Related papers (2025-05-23T14:06:42Z)
ThyroidEffi 1.0: A Cost-Effective System for High-Performance Multi-Class Thyroid Carcinoma Classification [0.0]
We develop and validate a deep learning system for multi-class thyroid FNAB image classification. Benign, Indeterminate/Suspicious, and Malignant are three key categories directly guiding post-biopsy treatment. The system processed 1000 cases in 30 seconds, demonstrating feasibility on widely accessible hardware.
arXiv Detail & Related papers (2025-04-19T02:13:07Z)
Advancing Chronic Tuberculosis Diagnostics Using Vision-Language Models: A Multi modal Framework for Precision Analysis [0.0]
This study proposes a Vision-Language Model (VLM) to enhance automated chronic tuberculosis (TB) screening. By integrating chest X-ray images with clinical data, the model addresses the challenges of manual interpretation. The model demonstrated high precision (94 percent) and recall (94 percent) for detecting key chronic TB pathologies.
arXiv Detail & Related papers (2025-03-17T13:49:29Z)
MOZART: Ensembling Approach for COVID-19 Detection using Chest X-Ray Imagery [0.0]
COVID-19 has led to a global pandemic that strained healthcare systems. Traditional convolutional neural networks (CNNs) achieve impressive accuracy. We introduce the MOZART framework, an ensemble learning approach that enhances virus detection.
arXiv Detail & Related papers (2024-10-11T21:02:58Z)
Lung-CADex: Fully automatic Zero-Shot Detection and Classification of Lung Nodules in Thoracic CT Images [45.29301790646322]
Computer-aided diagnosis can help with early lung nodule detection and facilitate subsequent nodule characterization. We propose CADe, for segmenting lung nodules in a zero-shot manner using a variant of the Segment Anything Model called MedSAM. We also propose CADx, a method for nodule characterization as benign/malignant by making a gallery of radiomic features and aligning image-feature pairs through contrastive learning.
arXiv Detail & Related papers (2024-07-02T19:30:25Z)
Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists to predict the histological score available on a small annex dataset. We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of cirrhosis. This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
Self-supervised contrastive learning of echocardiogram videos enables label-efficient cardiac disease diagnosis [48.64462717254158]
We developed a self-supervised contrastive learning approach, EchoCLR, catered to echocardiogram videos. When fine-tuned on small portions of labeled data, EchoCLR pretraining significantly improved classification performance for left ventricular hypertrophy (LVH) and aortic stenosis (AS). EchoCLR is unique in its ability to learn representations of medical videos and demonstrates that SSL can enable label-efficient disease classification from small, labeled datasets.
arXiv Detail & Related papers (2022-07-23T19:17:26Z)
Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.