Comparison of ConvNeXt and Vision-Language Models for Breast Density Assessment in Screening Mammography
- URL: http://arxiv.org/abs/2506.13964v1
- Date: Mon, 16 Jun 2025 20:14:37 GMT
- Title: Comparison of ConvNeXt and Vision-Language Models for Breast Density Assessment in Screening Mammography
- Authors: Yusdivia Molina-Román, David Gómez-Ortiz, Ernestina Menasalvas-Ruiz, José Gerardo Tamez-Peña, Alejandro Santos-Díaz
- Abstract summary: This study compares multimodal and CNN-based methods for automated classification using the BI-RADS system. Zero-shot classification achieved modest performance, while the fine-tuned ConvNeXt model outperformed the BioMedCLIP linear probe. These findings suggest that despite the promise of multimodal learning, CNN-based models with end-to-end fine-tuning provide stronger performance for specialized medical imaging.
- Score: 39.58317527488534
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Mammographic breast density classification is essential for cancer risk assessment but remains challenging due to subjective interpretation and inter-observer variability. This study compares multimodal and CNN-based methods for automated classification using the BI-RADS system, evaluating BioMedCLIP and ConvNeXt across three learning scenarios: zero-shot classification, linear probing with textual descriptions, and fine-tuning with numerical labels. Results show that zero-shot classification achieved modest performance, while the fine-tuned ConvNeXt model outperformed the BioMedCLIP linear probe. Although linear probing demonstrated potential with pretrained embeddings, it was less effective than full fine-tuning. These findings suggest that despite the promise of multimodal learning, CNN-based models with end-to-end fine-tuning provide stronger performance for specialized medical imaging. The study underscores the need for more detailed textual representations and domain-specific adaptations in future radiology applications.
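The three learning scenarios can be illustrated with a short sketch. The snippet below is not the authors' code: the BioMedCLIP checkpoint name follows the public open_clip release on Hugging Face, while the BI-RADS prompt wording, the ConvNeXt-Tiny backbone, and the four-class head are illustrative assumptions; the linear-probing scenario (not shown) would instead train a single linear layer on frozen BioMedCLIP image embeddings.

```python
# Hedged sketch of zero-shot BioMedCLIP classification and end-to-end ConvNeXt
# fine-tuning for BI-RADS density; prompts, model sizes, and hyperparameters
# are assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer
from torchvision.models import convnext_tiny, ConvNeXt_Tiny_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

# --- Scenario 1: zero-shot classification with BioMedCLIP ---
ckpt = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
clip_model, preprocess = create_model_from_pretrained(ckpt)
tokenizer = get_tokenizer(ckpt)
clip_model = clip_model.to(device).eval()

# Hypothetical textual descriptions of the four BI-RADS density categories (A-D).
prompts = [
    "a mammogram of an almost entirely fatty breast",
    "a mammogram with scattered areas of fibroglandular density",
    "a mammogram of a heterogeneously dense breast",
    "a mammogram of an extremely dense breast",
]

image = preprocess(Image.open("mammogram.png").convert("RGB")).unsqueeze(0).to(device)
texts = tokenizer(prompts, context_length=256).to(device)
with torch.no_grad():
    img = clip_model.encode_image(image)
    txt = clip_model.encode_text(texts)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    probs = (100.0 * img @ txt.T).softmax(dim=-1)   # fixed temperature for illustration
print("zero-shot BI-RADS probabilities:", probs.squeeze().tolist())

# --- Scenario 3: end-to-end ConvNeXt fine-tuning with numerical labels ---
model = convnext_tiny(weights=ConvNeXt_Tiny_Weights.IMAGENET1K_V1)
model.classifier[2] = nn.Linear(model.classifier[2].in_features, 4)  # 4 BI-RADS classes
model = model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# for images, labels in train_loader:               # labels are integer classes 0-3
#     loss = criterion(model(images.to(device)), labels.to(device))
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```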
Related papers
- Interpreting Biomedical VLMs on High-Imbalance Out-of-Distributions: An Insight into BiomedCLIP on Radiology [0.0]
We analyse the limitations of BiomedCLIP when applied to a highly imbalanced, out-of-distribution medical dataset. We show that the model under zero-shot settings over-predicts all labels, leading to poor precision and inter-class separability. We highlight the need for careful adaptations of the models to foster reliability and applicability in a real-world setting.
arXiv Detail & Related papers (2025-06-17T02:59:42Z)
- Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis [16.268045905735818]
We propose CMSwinKAN, a contrastive-learning-based multi-scale feature fusion model tailored for pathological image classification. By fusing multi-scale features and leveraging contrastive learning strategies, CMSwinKAN mimics clinicians' comprehensive approach. Results demonstrate that CMSwinKAN performs better than existing state-of-the-art pathology-specific models pre-trained on large datasets.
arXiv Detail & Related papers (2025-04-18T15:39:46Z)
- Interpretable Retinal Disease Prediction Using Biology-Informed Heterogeneous Graph Representations [40.8160960729546]
Interpretability is crucial to enhance trust in machine learning models for medical diagnostics. This work proposes a method that surpasses the performance of established machine learning models.
arXiv Detail & Related papers (2025-02-23T19:27:47Z)
- An analysis of data variation and bias in image-based dermatological datasets for machine learning classification [2.039829968340841]
In clinical dermatology, classification models can detect malignant lesions on patients' skin using only RGB images as input. Most learning-based methods are trained on data acquired from dermoscopic datasets, which are large and validated by a gold standard. This work aims to evaluate the gap between dermoscopic and clinical samples and understand how dataset variations impact training.
arXiv Detail & Related papers (2025-01-15T17:18:46Z)
- Graph-Ensemble Learning Model for Multi-label Skin Lesion Classification using Dermoscopy and Clinical Images [7.159532626507458]
This study introduces a Graph Convolution Network (GCN) that incorporates prior co-occurrence between categories, encoded as a correlation matrix, into the deep learning model for multi-label classification.
We propose a Graph-Ensemble Learning Model (GELN) that treats the GCN predictions as complementary information to the predictions of the fusion model.
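As a rough, generic illustration of the co-occurrence idea (not the GELN implementation; dimensions and matrix values below are placeholders), a label correlation matrix can serve as the adjacency of a single graph-convolution step over learnable label embeddings, whose output then acts as a set of per-label classifiers:

```python
# Generic sketch of using a label co-occurrence matrix as a graph for multi-label
# classification (ML-GCN-style); not the GELN code, and all numbers are toy values.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_labels, emb_dim, feat_dim = 5, 64, 512

# Co-occurrence counts estimated from training labels -> row-normalized adjacency.
cooc = torch.rand(num_labels, num_labels)          # placeholder co-occurrence counts
adj = cooc / cooc.sum(dim=1, keepdim=True)         # approx. P(label_j | label_i)
adj = adj + torch.eye(num_labels)                  # add self-loops

label_emb = nn.Parameter(torch.randn(num_labels, emb_dim))
gcn_proj = nn.Linear(emb_dim, feat_dim, bias=False)

def label_classifiers():
    # One graph-convolution step: aggregate neighboring label embeddings via the
    # adjacency, then project them to the image-feature dimension.
    h = F.relu(adj @ label_emb)                    # (num_labels, emb_dim)
    return gcn_proj(h)                             # (num_labels, feat_dim)

image_features = torch.randn(8, feat_dim)          # features from any CNN backbone
logits = image_features @ label_classifiers().T    # (batch, num_labels)
probs = torch.sigmoid(logits)                      # per-label probabilities
```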
arXiv Detail & Related papers (2023-07-04T13:19:57Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders [50.689585476660554]
We propose a new fine-tuning strategy that includes positive-pair loss relaxation and random sentence sampling.
Our approach consistently improves overall zero-shot pathology classification across four chest X-ray datasets and three pre-trained models.
arXiv Detail & Related papers (2022-12-14T06:04:18Z)
- Application of Transfer Learning and Ensemble Learning in Image-level Classification for Breast Histopathology [9.037868656840736]
In Computer-Aided Diagnosis (CAD), traditional classification models mostly use a single network to extract features.
This paper proposes a deep ensemble model based on image-level labels for the binary classification of benign and malignant lesions.
Result: With model accuracy used as the ensemble weight, the image-level binary classification achieves an accuracy of 98.90%.
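A generic accuracy-weighted ensemble (the member networks and the exact weighting scheme in the paper may differ) can be sketched as a weighted average of member probabilities:

```python
# Toy sketch of an accuracy-weighted ensemble for binary (benign/malignant)
# classification; member models and accuracy values are placeholders.
import torch

def accuracy_weighted_ensemble(member_probs, val_accuracies):
    """member_probs: list of (batch, 2) softmax outputs; val_accuracies: list of floats."""
    w = torch.tensor(val_accuracies)
    w = w / w.sum()                               # normalize accuracies into weights
    stacked = torch.stack(member_probs)           # (n_members, batch, 2)
    return (w.view(-1, 1, 1) * stacked).sum(0)    # weighted average of probabilities

probs_a = torch.softmax(torch.randn(4, 2), dim=-1)   # e.g. one fine-tuned backbone
probs_b = torch.softmax(torch.randn(4, 2), dim=-1)   # e.g. another backbone
ensemble_probs = accuracy_weighted_ensemble([probs_a, probs_b], [0.97, 0.95])
predictions = ensemble_probs.argmax(dim=-1)          # final benign/malignant decision
```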
arXiv Detail & Related papers (2022-04-18T13:31:53Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
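A generic BERT sequence-classification setup with the Hugging Face transformers API (the checkpoint and the number of diagnosis classes here are placeholders, not the paper's exact model trained on the Russian EHR data) looks roughly like:

```python
# Generic BERT sequence-classification sketch with Hugging Face transformers;
# the checkpoint and the number of diagnosis classes are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "bert-base-multilingual-cased"        # placeholder multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=10)

visit_text = "Complaints of chest pain and shortness of breath; history of hypertension."
inputs = tokenizer(visit_text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape (1, num_labels)
print("predicted diagnosis class:", logits.argmax(dim=-1).item())
```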
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits unlabeled data by encouraging prediction consistency for a given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
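A stripped-down sketch of the prediction-consistency idea (a supervised loss on labeled images plus a consistency penalty between two perturbed forward passes on unlabeled images; the paper's relation-driven term and backbone are not reproduced here) is shown below:

```python
# Minimal consistency-regularization sketch for semi-supervised image classification;
# the perturbation, loss weight, and backbone are placeholders, and the paper's
# relation-driven (sample-relation consistency) term is intentionally omitted.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(num_classes=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def perturb(x):
    # Simple additive-noise perturbation; real pipelines use data augmentations.
    return x + 0.05 * torch.randn_like(x)

labeled_x = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 4, (8,))
unlabeled_x = torch.randn(16, 3, 224, 224)

sup_loss = F.cross_entropy(model(labeled_x), labels)            # supervised term
p1 = F.softmax(model(perturb(unlabeled_x)), dim=-1)
p2 = F.softmax(model(perturb(unlabeled_x)), dim=-1)
consistency_loss = F.mse_loss(p1, p2)                           # unsupervised term

loss = sup_loss + 1.0 * consistency_loss                        # weight is a placeholder
optimizer.zero_grad()
loss.backward()
optimizer.step()
```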
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.