Related papers: Privacy Preserving Federated Unsupervised Domain Adaptation with Application to Age Prediction from DNA Methylation Data

Privacy Preserving Federated Unsupervised Domain Adaptation with Application to Age Prediction from DNA Methylation Data

URL: http://arxiv.org/abs/2411.17287v1
Date: Tue, 26 Nov 2024 10:19:16 GMT
Title: Privacy Preserving Federated Unsupervised Domain Adaptation with Application to Age Prediction from DNA Methylation Data
Authors: Cem Ata Baykara, Ali Burak Ünal, Nico Pfeifer, Mete Akgün,
Abstract summary: We introduce a privacy-preserving framework for unsupervised domain adaptation in high-dimensional settings. Our framework is the first privacy-preserving solution for high-dimensional domain adaptation in federated environments.
Score: 2.699900017799093
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In computational biology, predictive models are widely used to address complex tasks, but their performance can suffer greatly when applied to data from different distributions. The current state-of-the-art domain adaptation method for high-dimensional data aims to mitigate these issues by aligning the input dependencies between training and test data. However, this approach requires centralized access to both source and target domain data, raising concerns about data privacy, especially when the data comes from multiple sources. In this paper, we introduce a privacy-preserving federated framework for unsupervised domain adaptation in high-dimensional settings. Our method employs federated training of Gaussian processes and weighted elastic nets to effectively address the problem of distribution shift between domains, while utilizing secure aggregation and randomized encoding to protect the local data of participating data owners. We evaluate our framework on the task of age prediction using DNA methylation data from multiple tissues, demonstrating that our approach performs comparably to existing centralized methods while maintaining data privacy, even in distributed environments where data is spread across multiple institutions. Our framework is the first privacy-preserving solution for high-dimensional domain adaptation in federated environments, offering a promising tool for fields like computational biology and medicine, where protecting sensitive data is essential.

Related papers

Towards Open-world Generalized Deepfake Detection: General Feature Extraction via Unsupervised Domain Adaptation [15.737902253508235]
Social platforms are flooded with vast amounts of unlabeled synthetic data and authentic data.<n>In open world scenarios, the amount of unlabeled data greatly exceeds that of labeled data.<n>We propose a novel Open-World Deepfake Detection Generalization Enhancement Training Strategy (OWG-DS) to improve the generalization ability of existing methods.
arXiv Detail & Related papers (2025-05-18T10:12:12Z)
A Dataset for Semantic Segmentation in the Presence of Unknowns [49.795683850385956]
Existing datasets allow evaluation of only knowns or unknowns - but not both. We propose a novel anomaly segmentation dataset, ISSU, that features a diverse set of anomaly inputs from cluttered real-world environments. The dataset is twice larger than existing anomaly segmentation datasets.
arXiv Detail & Related papers (2025-03-28T10:31:01Z)
Privacy-preserving datasets by capturing feature distributions with Conditional VAEs [0.11999555634662634]
Conditional Variational Autoencoders (CVAEs) trained on feature vectors extracted from large pre-trained vision foundation models. Our method notably outperforms traditional approaches in both medical and natural image domains. Results underscore the potential of generative models to significantly impact deep learning applications in data-scarce and privacy-sensitive environments.
arXiv Detail & Related papers (2024-08-01T15:26:24Z)
PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
We propose a. Federated Anomaly Detection framework named PeFAD with the increasing privacy concerns. We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z)
DiffClass: Diffusion-Based Class Incremental Learning [30.514281721324853]
Class Incremental Learning (CIL) is challenging due to catastrophic forgetting. Recent exemplar-free CIL methods attempt to mitigate catastrophic forgetting by synthesizing previous task data. We propose a novel exemplar-free CIL method to overcome these issues.
arXiv Detail & Related papers (2024-03-08T03:34:18Z)
A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing. Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data. Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
Trust your Good Friends: Source-free Domain Adaptation by Reciprocal Neighborhood Clustering [50.46892302138662]
We address the source-free domain adaptation problem, where the source pretrained model is adapted to the target domain in the absence of source data. Our method is based on the observation that target data, which might not align with the source domain classifier, still forms clear clusters. We demonstrate that this local structure can be efficiently captured by considering the local neighbors, the reciprocal neighbors, and the expanded neighborhood.
arXiv Detail & Related papers (2023-09-01T15:31:18Z)
Fed-MIWAE: Federated Imputation of Incomplete Data via Deep Generative Models [5.373862368597948]
Federated learning allows for the training of machine learning models on multiple local datasets without requiring explicit data exchange. Data pre-processing, including strategies for handling missing data, remains a major bottleneck in real-world federated learning deployment. We propose Fed-MIWAE, a deep latent variable model for missing data imputation based on variational autoencoders.
arXiv Detail & Related papers (2023-04-17T08:14:08Z)
One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data. Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation. We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)
Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER) Our method exploits self-supervised pretraining to learn good feature representations from the target data. We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
Deep Unsupervised Domain Adaptation: A Review of Recent Advances and Perspectives [16.68091981866261]
Unsupervised domain adaptation (UDA) is proposed to counter the performance drop on data in a target domain. UDA has yielded promising results on natural image processing, video analysis, natural language processing, time-series data analysis, medical image analysis, etc.
arXiv Detail & Related papers (2022-08-15T20:05:07Z)
Mitigating Data Heterogeneity in Federated Learning with Data Augmentation [26.226057709504733]
Federated Learning (FL) is a framework that enables training a centralized model while securing user privacy by fusing local, decentralized models. One major obstacle is data heterogeneity, i.e., each client having non-identically and independently distributed (non-IID) data. Recent evidence suggests that data augmentation can induce equal or greater performance.
arXiv Detail & Related papers (2022-06-20T19:47:43Z)
Source-Free Domain Adaptation via Distribution Estimation [106.48277721860036]
Domain Adaptation aims to transfer the knowledge learned from a labeled source domain to an unlabeled target domain whose data distributions are different. Recently, Source-Free Domain Adaptation (SFDA) has drawn much attention, which tries to tackle domain adaptation problem without using source data. In this work, we propose a novel framework called SFDA-DE to address SFDA task via source Distribution Estimation.
arXiv Detail & Related papers (2022-04-24T12:22:19Z)
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
Secure Domain Adaptation with Multiple Sources [13.693640425403636]
Multi-source unsupervised domain adaptation (MUDA) is a recently explored learning framework. The goal is to address the challenge of labeled data scarcity in a target domain via transferring knowledge from multiple source domains with annotated data. Since the source data is distributed, the privacy of source domains' data can be a natural concern. We benefit from the idea of domain alignment in an embedding space to address the privacy concern for MUDA.
arXiv Detail & Related papers (2021-06-23T02:26:36Z)
TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation [73.75784418508033]
In few-shot domain adaptation (FDA), classifiers for the target domain are trained with labeled data in the source domain (SD) and few labeled data in the target domain (TD) Data usually contain private information in the current era, e.g., data distributed on personal phones. We propose a target orientated hypothesis adaptation network (TOHAN) to solve the problem.
arXiv Detail & Related papers (2021-06-11T11:46:20Z)
Data-driven Regularized Inference Privacy [33.71757542373714]
We propose a data-driven inference privacy preserving framework to sanitize data. We develop an inference privacy framework based on the variational method. We present empirical methods to estimate the privacy metric.
arXiv Detail & Related papers (2020-10-10T08:42:59Z)
Open-Set Hypothesis Transfer with Semantic Consistency [99.83813484934177]
We introduce a method that focuses on the semantic consistency under transformation of target data. Our model first discovers confident predictions and performs classification with pseudo-labels. As a result, unlabeled data can be classified into discriminative classes coincided with either source classes or unknown classes.
arXiv Detail & Related papers (2020-10-01T10:44:31Z)
Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis [76.46004354572956]
We introduce an unsupervised domain adaptation approach for person re-identification. Experimental results show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance.
arXiv Detail & Related papers (2020-01-14T17:43:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.