Federated Learning on Transcriptomic Data: Model Quality and Performance Trade-Offs
- URL: http://arxiv.org/abs/2402.14527v1
- Date: Thu, 22 Feb 2024 13:21:26 GMT
- Title: Federated Learning on Transcriptomic Data: Model Quality and Performance Trade-Offs
- Authors: Anika Hannemann, Jan Ewald, Leo Seeger, Erik Buchmann
- Abstract summary: Machine learning on large-scale genomic or transcriptomic data is important for many novel health applications.
Due to privacy and regulatory reasons, it is also problematic to aggregate all data at a trusted third party.
Federated learning is a promising solution because it enables decentralized, collaborative machine learning without exchanging raw data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning on large-scale genomic or transcriptomic data is important
for many novel health applications. For example, precision medicine tailors
medical treatments to patients on the basis of individual biomarkers, cellular
and molecular states, etc. However, the data required is sensitive, voluminous,
heterogeneous, and typically distributed across locations where dedicated
machine learning hardware is not available. Due to privacy and regulatory
reasons, it is also problematic to aggregate all data at a trusted third
party. Federated learning is a promising solution to this dilemma, because it
enables decentralized, collaborative machine learning without exchanging raw
data. In this paper, we perform comparative experiments with the federated
learning frameworks TensorFlow Federated and Flower. Our test case is the
training of disease prognosis and cell type classification models. We train the
models with distributed transcriptomic data, considering both data
heterogeneity and architectural heterogeneity. We measure model quality,
robustness against privacy-enhancing noise, computational performance and
resource overhead. Each of the federated learning frameworks has different
strengths. However, our experiments confirm that both frameworks can readily
build models on transcriptomic data, without transferring personal raw data to
a third party with abundant computational resources.
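To make the experimental setup more concrete, below is a minimal sketch of a Flower client for cell type classification on local transcriptomic data. It is an illustration under assumptions, not the authors' code: the synthetic data loader, the logistic-regression model, the noise scale used to mimic privacy-enhancing perturbation, and the server address are hypothetical placeholders.
```python
# Minimal, illustrative Flower client for cell type classification on
# local transcriptomic data. Data, model, and noise scale are assumptions.
import flwr as fl
import numpy as np
from sklearn.linear_model import LogisticRegression


def load_local_data():
    # Placeholder for a site's local expression matrix (samples x genes)
    # and cell type labels; a real client would read its own data here.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 500))      # 200 cells, 500 hypothetical genes
    y = rng.integers(0, 3, size=200)     # 3 hypothetical cell type labels
    return X, y


class TranscriptomicClient(fl.client.NumPyClient):
    def __init__(self):
        self.X, self.y = load_local_data()
        self.model = LogisticRegression(max_iter=50, warm_start=True)
        self.model.fit(self.X, self.y)   # initialize coef_ / intercept_

    def get_parameters(self, config):
        return [self.model.coef_, self.model.intercept_]

    def fit(self, parameters, config):
        # Load the global model, train locally, and return the update.
        self.model.coef_, self.model.intercept_ = parameters
        self.model.fit(self.X, self.y)
        # Assumed noise scale, mimicking privacy-enhancing perturbation.
        noisy = [p + np.random.normal(0.0, 0.01, p.shape)
                 for p in (self.model.coef_, self.model.intercept_)]
        return noisy, len(self.X), {}

    def evaluate(self, parameters, config):
        self.model.coef_, self.model.intercept_ = parameters
        accuracy = self.model.score(self.X, self.y)
        return 1.0 - accuracy, len(self.X), {"accuracy": accuracy}


if __name__ == "__main__":
    # Each participating site runs one client against a shared Flower server.
    fl.client.start_numpy_client(server_address="127.0.0.1:8080",
                                 client=TranscriptomicClient())
```
A Flower server (for example, fl.server.start_server with a FedAvg strategy) would aggregate these updates over several rounds; the paper's TensorFlow Federated experiments follow the same federated averaging pattern through that framework's own APIs.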
Related papers
- Federated Impression for Learning with Distributed Heterogeneous Data [19.50235109938016]
Federated learning (FL) provides a paradigm that can learn from distributed datasets across clients without requiring them to share data.
In FL, sub-optimal convergence is common among data from different health centers due to the variety in data collection protocols and patient demographics across centers.
We propose FedImpres, which alleviates catastrophic forgetting by restoring synthetic data that represents the global information as a federated impression.
arXiv Detail & Related papers (2024-09-11T15:37:52Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease
detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - Federated Learning for Data and Model Heterogeneity in Medical Imaging [19.0931609571649]
Federated Learning (FL) is an evolving machine learning method in which multiple clients participate in collaborative learning without sharing their data with each other or with the central server.
In real-world applications such as hospitals and industry, FL counters the challenges of data heterogeneity and model heterogeneity.
We propose a method, MDH-FL (Exploiting Model and Data Heterogeneity in FL), to solve such problems.
arXiv Detail & Related papers (2023-07-31T21:08:45Z) - Application of Federated Learning in Building a Robust COVID-19 Chest
X-ray Classification Model [0.0]
Federated Learning (FL) helps AI models to generalize better without moving all the data to a central server.
We trained a deep learning model to solve a binary classification problem of predicting the presence or absence of COVID-19.
arXiv Detail & Related papers (2022-04-22T05:21:50Z) - Practical Challenges in Differentially-Private Federated Survival
Analysis of Medical Data [57.19441629270029]
In this paper, we take advantage of the inherent properties of neural networks to federate the training of survival analysis models.
In the realistic setting of small medical datasets and only a few data centers, the noise added for differential privacy makes it harder for the models to converge.
We propose DPFed-post which adds a post-processing stage to the private federated learning scheme.
arXiv Detail & Related papers (2022-02-08T10:03:24Z) - Federated Learning of Molecular Properties in a Heterogeneous Setting [79.00211946597845]
We introduce federated heterogeneous molecular learning to address these challenges.
Federated learning allows end-users to build a global model collaboratively while preserving the training data distributed over isolated clients.
FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.
arXiv Detail & Related papers (2021-09-15T12:49:13Z) - Handling Data Heterogeneity with Generative Replay in Collaborative
Learning for Medical Imaging [21.53220262343254]
We present a novel generative replay strategy to address the challenge of data heterogeneity in collaborative learning methods.
A primary model learns the desired task, and an auxiliary "generative replay model" either synthesizes images that closely resemble the input images or helps extract latent variables.
The generative replay strategy is flexible to use: it can either be incorporated into existing collaborative learning methods to improve their ability to handle data heterogeneity across institutions, or be used as a standalone collaborative learning framework (termed FedReplay) to reduce communication cost.
arXiv Detail & Related papers (2021-06-24T17:39:55Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, the Prototypical Network, a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z) - Self-Training with Improved Regularization for Sample-Efficient Chest
X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that, using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z) - Multi-site fMRI Analysis Using Privacy-preserving Federated Learning and
Domain Adaptation: ABIDE Results [13.615292855384729]
To train a high-quality deep learning model, the aggregation of a significant amount of patient information is required.
Due to the need to protect the privacy of patient data, it is hard to assemble a central database from multiple institutions.
Federated learning allows for population-level models to be trained without centralizing entities' data.
arXiv Detail & Related papers (2020-01-16T04:49:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.