Related papers: Evaluating Self-Supervised Learning in Medical Imaging: A Benchmark for Robustness, Generalizability, and Multi-Domain Impact

Evaluating Self-Supervised Learning in Medical Imaging: A Benchmark for Robustness, Generalizability, and Multi-Domain Impact

URL: http://arxiv.org/abs/2412.19124v1
Date: Thu, 26 Dec 2024 08:51:56 GMT
Title: Evaluating Self-Supervised Learning in Medical Imaging: A Benchmark for Robustness, Generalizability, and Multi-Domain Impact
Authors: Valay Bundele, Oğuz Ata Çal, Bora Kargi, Karahan Sarıtaş, Kıvanç Tezören, Zohreh Ghaderi, Hendrik Lensch,
Abstract summary: We present a comprehensive evaluation of self-supervised learning (SSL) methods within the medical domain.<n>Using the MedMNIST dataset collection as a standardized benchmark, we evaluate 8 major SSL methods across 11 different medical datasets.
Score: 0.3141085922386211
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Self-supervised learning (SSL) has emerged as a promising paradigm in medical imaging, addressing the chronic challenge of limited labeled data in healthcare settings. While SSL has shown impressive results, existing studies in the medical domain are often limited in scope, focusing on specific datasets or modalities, or evaluating only isolated aspects of model performance. This fragmented evaluation approach poses a significant challenge, as models deployed in critical medical settings must not only achieve high accuracy but also demonstrate robust performance and generalizability across diverse datasets and varying conditions. To address this gap, we present a comprehensive evaluation of SSL methods within the medical domain, with a particular focus on robustness and generalizability. Using the MedMNIST dataset collection as a standardized benchmark, we evaluate 8 major SSL methods across 11 different medical datasets. Our study provides an in-depth analysis of model performance in both in-domain scenarios and the detection of out-of-distribution (OOD) samples, while exploring the effect of various initialization strategies, model architectures, and multi-domain pre-training. We further assess the generalizability of SSL methods through cross-dataset evaluations and the in-domain performance with varying label proportions (1%, 10%, and 100%) to simulate real-world scenarios with limited supervision. We hope this comprehensive benchmark helps practitioners and researchers make more informed decisions when applying SSL methods to medical applications.

Related papers

EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis [62.00431604976949]
EndoBench is the first comprehensive benchmark specifically designed to assess MLLMs across the full spectrum of endoscopic practice.<n>We benchmark 23 state-of-the-art models, including general-purpose, medical-specialized, and proprietary MLLMs.<n>Our experiments reveal: proprietary MLLMs outperform open-source and medical-specialized models overall, but still trail human experts.
arXiv Detail & Related papers (2025-05-29T16:14:34Z)
Weakly supervised deep learning model with size constraint for prostate cancer detection in multiparametric MRI and generalization to unseen domains [0.90668179713299]
We show that the model achieves on-par performance with strong fully supervised baseline models. We also observe a performance decrease for both fully supervised and weakly supervised models when tested on unseen data domains.
arXiv Detail & Related papers (2024-11-04T12:24:33Z)
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels. We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
Developing Healthcare Language Model Embedding Spaces [0.20971479389679337]
Pre-trained Large Language Models (LLMs) often struggle on out-of-domain datasets like healthcare focused text. Three methods are assessed: traditional masked language modeling, Deep Contrastive Learning for Unsupervised Textual Representations (DeCLUTR) and a novel pre-training objective utilizing metadata categories from the healthcare settings. Contrastively trained models outperform other approaches on the classification tasks, delivering strong performance from limited labeled data and with fewer model parameter updates required.
arXiv Detail & Related papers (2024-03-28T19:31:32Z)
Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area. We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions. We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets [10.868779327544688]
Self-supervised learning has shown to be an effective method for utilizing unlabeled data. We execute the largest-scale study of SSL pre-training on pathology image data. For the first time, we apply SSL to the challenging task of nuclei instance segmentation.
arXiv Detail & Related papers (2022-12-09T06:38:34Z)
Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community. The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
An Empirical Framework for Domain Generalization in Clinical Settings [14.363133217553715]
We benchmark the performance of eight domain generalization methods on multi-site clinical time series and medical imaging data. We find that current domain generalization methods do not achieve significant gains in out-of-distribution performance over empirical risk minimization.
arXiv Detail & Related papers (2021-03-20T11:48:57Z)
Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation. adversarially generated samples are used during domain adaptation. Results confirm the effectiveness of our method and the generality on different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification. It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations. Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
Multi-site fMRI Analysis Using Privacy-preserving Federated Learning and Domain Adaptation: ABIDE Results [13.615292855384729]
To train a high-quality deep learning model, the aggregation of a significant amount of patient information is required. Due to the need to protect the privacy of patient data, it is hard to assemble a central database from multiple institutions. Federated learning allows for population-level models to be trained without centralizing entities' data.
arXiv Detail & Related papers (2020-01-16T04:49:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.