The role of self-supervised pretraining in differentially private medical image analysis
- URL: http://arxiv.org/abs/2601.19618v1
- Date: Tue, 27 Jan 2026 13:50:43 GMT
- Title: The role of self-supervised pretraining in differentially private medical image analysis
- Authors: Soroosh Tayebi Arasteh, Mina Farajiamiri, Mahshad Lotfinia, Behrus Hinrichs-Puladi, Jonas Bienzeisler, Mohamed Alhaskir, Mirabela Rusu, Christiane Kuhl, Sven Nebelung, Daniel Truhn
- Abstract summary: We present a large-scale evaluation of initialization strategies for differentially private medical image analysis. We compare non-domain-specific supervised ImageNet, non-domain-specific self-supervised DINOv3, and domain-specific supervised pretraining. The results establish initialization strategy as a central determinant of utility, fairness, and generalization in differentially private medical imaging.
- Score: 1.43516305354745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differential privacy (DP) provides formal protection for sensitive data but typically incurs substantial losses in diagnostic performance. Model initialization has emerged as a critical factor in mitigating this degradation, yet the role of modern self-supervised learning under full-model DP remains poorly understood. Here, we present a large-scale evaluation of initialization strategies for differentially private medical image analysis, using chest radiograph classification as a representative benchmark with more than 800,000 images. Using state-of-the-art ConvNeXt models trained with DP-SGD across realistic privacy regimes, we compare non-domain-specific supervised ImageNet initialization, non-domain-specific self-supervised DINOv3 initialization, and domain-specific supervised pretraining on MIMIC-CXR, the largest publicly available chest radiograph dataset. Evaluations are conducted across five external datasets spanning diverse institutions and acquisition settings. We show that DINOv3 initialization consistently improves diagnostic utility relative to ImageNet initialization under DP, but remains inferior to domain-specific supervised pretraining, which achieves performance closest to non-private baselines. We further demonstrate that initialization choice strongly influences demographic fairness, cross-dataset generalization, and robustness to data scale and model capacity under privacy constraints. The results establish initialization strategy as a central determinant of utility, fairness, and generalization in differentially private medical imaging.
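The training procedure at the heart of the abstract, DP-SGD, clips each per-sample gradient to a fixed norm and adds calibrated Gaussian noise before averaging. The following is a minimal NumPy sketch of a single DP-SGD update under that definition; it is not the authors' code, and the function and parameter names (`dp_sgd_step`, `clip_norm`, `noise_multiplier`) are illustrative.

```python
import numpy as np

def dp_sgd_step(params, per_sample_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD update: clip each per-sample gradient to clip_norm,
    add Gaussian noise with std noise_multiplier * clip_norm, then average."""
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        # Scale down gradients whose norm exceeds the clipping threshold.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    # Noise is added once to the sum, then the batch average is taken.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / len(per_sample_grads)
    return params - lr * noisy_mean
```

With `noise_multiplier = 0` this reduces to ordinary SGD on clipped gradients, which makes the clipping behavior easy to check in isolation; the privacy guarantee itself comes from the noise scale together with an accounting method over training iterations, which this sketch does not implement.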
Related papers
- On the MIA Vulnerability Gap Between Private GANs and Diffusion Models [51.53790101362898]
Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis.
We present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models.
arXiv Detail & Related papers (2025-09-03T14:18:22Z) - Cross-Domain Distribution Alignment for Segmentation of Private Unannotated 3D Medical Images [20.206972068340843]
We introduce a new source-free Unsupervised Domain Adaptation (UDA) method to address this problem.
Our idea is based on estimating the internally learned distribution of a relevant source domain by a base model.
We demonstrate that our approach leads to SOTA performance on a real-world 3D medical dataset.
arXiv Detail & Related papers (2024-10-11T19:28:10Z) - Probabilistic 3D Correspondence Prediction from Sparse Unsegmented Images [1.2179682412409507]
We propose SPI-CorrNet, a unified model that predicts 3D correspondences from sparse imaging data.
Experiments on the LGE MRI left atrium dataset and Abdomen CT-1K liver datasets demonstrate that our technique enhances the accuracy and robustness of sparse image-driven SSM.
arXiv Detail & Related papers (2024-07-02T03:56:20Z) - Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets.
We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
arXiv Detail & Related papers (2023-10-31T16:13:22Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Domain Generalization with Adversarial Intensity Attack for Medical Image Segmentation [27.49427483473792]
In real-world scenarios, it is common for models to encounter data from new domains to which they were not exposed during training.
Domain generalization (DG) is a promising direction, as it enables models to handle data from previously unseen domains.
We introduce a novel DG method called Adversarial Intensity Attack (AdverIN), which leverages adversarial training to generate training data with an infinite number of styles.
arXiv Detail & Related papers (2023-04-05T19:40:51Z) - Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging [47.99192239793597]
We evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training.
Our study shows that -- under the challenging realistic circumstances of a real-life clinical dataset -- the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.
arXiv Detail & Related papers (2023-02-03T09:49:13Z) - Prior Knowledge-Guided Attention in Self-Supervised Vision Transformers [79.60022233109397]
We present spatial prior attention (SPAN), a framework that takes advantage of consistent spatial and semantic structure in unlabeled image datasets.
SPAN operates by regularizing attention masks from separate transformer heads to follow various priors over semantic regions.
We find that the resulting attention masks are more interpretable than those derived from domain-agnostic pretraining.
arXiv Detail & Related papers (2022-09-07T02:30:36Z) - In-Bed Human Pose Estimation from Unseen and Privacy-Preserving Image Domains [22.92165116962952]
In-bed human posture estimation provides important health-related metrics with potential value in medical condition assessments.
We propose a multi-modal conditional variational autoencoder (MC-VAE) capable of reconstructing features from missing modalities seen during training.
We demonstrate that body positions can be effectively recognized from the available modality, achieving on par results with baseline models.
arXiv Detail & Related papers (2021-11-30T04:56:16Z) - Differentially private federated deep learning for multi-site medical image segmentation [56.30543374146002]
Collaborative machine learning techniques such as federated learning (FL) enable the training of models on effectively larger datasets without data transfer.
Recent initiatives have demonstrated that segmentation models trained with FL can achieve performance similar to locally trained models.
However, FL is not a fully privacy-preserving technique and privacy-centred attacks can disclose confidential patient data.
arXiv Detail & Related papers (2021-07-06T12:57:32Z) - Self-Adaptive Transfer Learning for Multicenter Glaucoma Classification in Fundus Retina Images [9.826586293806837]
We propose a self-adaptive transfer learning (SATL) strategy to fill the domain gap between multicenter datasets.
Specifically, the encoder of a DL model that is pre-trained on the source domain is used to initialize the encoder of a reconstruction model.
Results demonstrate that the proposed SATL strategy is effective in the domain adaptation task between a private and two public glaucoma diagnosis datasets.
arXiv Detail & Related papers (2021-05-07T05:20:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.