Related papers: SpurBreast: A Curated Dataset for Investigating Spurious Correlations in Real-world Breast MRI Classification

SpurBreast: A Curated Dataset for Investigating Spurious Correlations in Real-world Breast MRI Classification

URL: http://arxiv.org/abs/2510.02109v1
Date: Thu, 02 Oct 2025 15:16:20 GMT
Title: SpurBreast: A Curated Dataset for Investigating Spurious Correlations in Real-world Breast MRI Classification
Authors: Jong Bum Won, Wesley De Neve, Joris Vankerschaver, Utku Ozbulak,
Abstract summary: We introduce SpurBreast, a curated breast MRI dataset that intentionally incorporates spurious correlations to evaluate their impact on model performance.<n>We analyze over 100 features involving patient, device, and imaging protocol, and identify two dominant spurious signals: magnetic field strength and image orientation.<n>Through controlled dataset splits, we demonstrate that DNNs can exploit these non-clinical signals, achieving high validation accuracy while failing to generalize to unbiased test data.
Score: 0.4999814847776096
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks (DNNs) have demonstrated remarkable success in medical imaging, yet their real-world deployment remains challenging due to spurious correlations, where models can learn non-clinical features instead of meaningful medical patterns. Existing medical imaging datasets are not designed to systematically study this issue, largely due to restrictive licensing and limited supplementary patient data. To address this gap, we introduce SpurBreast, a curated breast MRI dataset that intentionally incorporates spurious correlations to evaluate their impact on model performance. Analyzing over 100 features involving patient, device, and imaging protocol, we identify two dominant spurious signals: magnetic field strength (a global feature influencing the entire image) and image orientation (a local feature affecting spatial alignment). Through controlled dataset splits, we demonstrate that DNNs can exploit these non-clinical signals, achieving high validation accuracy while failing to generalize to unbiased test data. Alongside these two datasets containing spurious correlations, we also provide benchmark datasets without spurious correlations, allowing researchers to systematically investigate clinically relevant and irrelevant features, uncertainty estimation, adversarial robustness, and generalization strategies. Models and datasets are available at https://github.com/utkuozbulak/spurbreast.

Related papers

Towards Trustworthy Breast Tumor Segmentation in Ultrasound using Monte Carlo Dropout and Deep Ensembles for Epistemic Uncertainty Estimation [0.01066386648660129]
We evaluate the use of a modified Residual U-Net for breast ultrasound segmentation.<n>We identify and correct for data duplication in the BUSI dataset.<n>Epistemic uncertainty is quantified using Monte Carlo dropout, deep ensembles, and their combination.
arXiv Detail & Related papers (2025-08-25T08:06:07Z)
SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms [60.35639972035727]
The lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms. The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI. Dice scores reached up to 0.838 $pm$ 0.066 and 0.716 $pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $pm$ 0.15.
arXiv Detail & Related papers (2024-11-14T17:06:00Z)
Weakly supervised deep learning model with size constraint for prostate cancer detection in multiparametric MRI and generalization to unseen domains [0.90668179713299]
We show that the model achieves on-par performance with strong fully supervised baseline models. We also observe a performance decrease for both fully supervised and weakly supervised models when tested on unseen data domains.
arXiv Detail & Related papers (2024-11-04T12:24:33Z)
Detecting Unforeseen Data Properties with Diffusion Autoencoder Embeddings using Spine MRI data [7.757515290013924]
Deep learning has made significant strides in medical imaging, leveraging the use of large datasets to improve diagnostics and prognostics. Large datasets often come with inherent errors through subject selection and acquisition. We investigate the use of Diffusion Autoencoder embeddings for uncovering and understanding data characteristics and biases.
arXiv Detail & Related papers (2024-10-14T07:24:26Z)
Autoencoder based approach for the mitigation of spurious correlations [2.7624021966289605]
Spurious correlations refer to erroneous associations in data that do not reflect true underlying relationships. These correlations can lead deep neural networks (DNNs) to learn patterns that are not robust across diverse datasets or real-world scenarios. We propose an autoencoder-based approach to analyze the nature of spurious correlations that exist in the Global Wheat Head Detection (GWHD) 2021 dataset.
arXiv Detail & Related papers (2024-06-27T05:28:44Z)
ssVERDICT: Self-Supervised VERDICT-MRI for Enhanced Prostate Tumour Characterisation [2.755232740505053]
Self-supervised neural network for fitting VERDICT estimates parameter maps without training data. We compare the performance of ssVERDICT to two established baseline methods for fitting diffusion MRI models.
arXiv Detail & Related papers (2023-09-12T14:31:33Z)
Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state MRI functional (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis. Many methods have been proposed to reduce fMRI heterogeneity between source and target domains. But acquiring source data is challenging due to concerns and/or data storage burdens in multi-site studies. We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z)
Interpretable Additive Recurrent Neural Networks For Multivariate Clinical Time Series [4.125698836261585]
We present the Interpretable-RNN (I-RNN) that balances model complexity and accuracy by forcing the relationship between variables in the model to be additive. I-RNN specifically captures the unique characteristics of clinical time series, which are unevenly sampled in time, asynchronously acquired, and have missing data. We evaluate the I-RNN model on the Physionet 2012 Challenge dataset to predict in-hospital mortality, and on a real-world clinical decision support task: predicting hemodynamic interventions in the intensive care unit.
arXiv Detail & Related papers (2021-09-15T22:30:19Z)
OR-Net: Pointwise Relational Inference for Data Completion under Partial Observation [51.083573770706636]
This work uses relational inference to fill in the incomplete data. We propose Omni-Relational Network (OR-Net) to model the pointwise relativity in two aspects.
arXiv Detail & Related papers (2021-05-02T06:05:54Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries. We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split. The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv Detail & Related papers (2021-02-18T21:14:52Z)
Fader Networks for domain adaptation on fMRI: ABIDE-II study [68.5481471934606]
We use 3D convolutional autoencoders to build the domain irrelevant latent space image representation and demonstrate this method to outperform existing approaches on ABIDE data.
arXiv Detail & Related papers (2020-10-14T16:50:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.