Targeting Underrepresented Populations in Precision Medicine: A
Federated Transfer Learning Approach
- URL: http://arxiv.org/abs/2108.12112v1
- Date: Fri, 27 Aug 2021 04:04:34 GMT
- Title: Targeting Underrepresented Populations in Precision Medicine: A
Federated Transfer Learning Approach
- Authors: Sai Li, Tianxi Cai, Rui Duan
- Abstract summary: We propose a two-way data integration strategy that integrates heterogeneous data from diverse populations and from multiple healthcare institutions.
We show that the proposed method improves the estimation and prediction accuracy in underrepresented populations, and reduces the gap of model performance across populations.
- Score: 7.467496975496821
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The limited representation of minorities and disadvantaged populations in
large-scale clinical and genomics research has become a barrier to translating
precision medicine research into practice. Due to heterogeneity across
populations, risk prediction models often underperform in
these underrepresented populations, and therefore may further exacerbate known
health disparities. In this paper, we propose a two-way data integration
strategy that integrates heterogeneous data from diverse populations and from
multiple healthcare institutions via a federated transfer learning approach.
The proposed method can handle the challenging setting where sample sizes from
different populations are highly unbalanced. With only a small number of
communications across participating sites, the proposed method can achieve
performance comparable to the pooled analysis where individual-level data are
directly pooled together. We show that the proposed method improves the
estimation and prediction accuracy in underrepresented populations, and reduces
the gap of model performance across populations. Our theoretical analysis
reveals how estimation accuracy is influenced by communication budgets, privacy
restrictions, and heterogeneity across populations. We demonstrate the
feasibility and validity of our methods through numerical experiments and a
real application to a multi-center study, in which we construct polygenic risk
prediction models for Type II diabetes in the AA population.
Related papers
- Sample Selection Bias in Machine Learning for Healthcare [17.549969100454803]
Sample selection bias (SSB) refers to the study population being less representative of the target population, leading to biased and potentially harmful decisions.
Despite being well-known in the literature, SSB remains scarcely studied in machine learning for healthcare.
We propose a new research direction for addressing SSB, based on the target population identification rather than the bias correction.
arXiv Detail & Related papers (2024-05-13T15:30:35Z)
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but individuals of non-European descent are under-represented in them.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via
Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Multiply Robust Federated Estimation of Targeted Average Treatment
Effects [0.0]
We propose a novel approach to derive valid causal inferences for a target population using multi-site data.
Our methodology incorporates transfer learning to estimate ensemble weights to combine information from source sites.
arXiv Detail & Related papers (2023-09-22T03:15:08Z)
- Multi-dimensional domain generalization with low-rank structures [18.565189720128856]
In statistical and machine learning methods, it is typically assumed that the test data are identically distributed with the training data.
This assumption does not always hold, especially in applications where the target population is not well represented in the training data.
We present a novel approach to addressing this challenge in linear regression models.
arXiv Detail & Related papers (2023-09-18T08:07:58Z)
- Improving genetic risk prediction across diverse population by
disentangling ancestry representations [10.803542340843368]
We propose a novel deep-learning framework that disentangles ancestry from the phenotype-relevant information in its representation.
The ancestry disentangled representation can be used to build risk predictors that perform better across minority populations.
arXiv Detail & Related papers (2022-05-10T05:05:37Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on
Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and its generality across different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Variational Disentanglement for Rare Event Modeling [21.269897066024306]
We propose a variational disentanglement approach to learn from rare events in heavily imbalanced classification problems.
Specifically, we leverage the imposed extreme-distribution behavior on a latent space to extract information from low-prevalence events.
arXiv Detail & Related papers (2020-09-17T21:35:36Z)
- Predictive Modeling of ICU Healthcare-Associated Infections from
Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling
Approach [55.41644538483948]
This work focuses on both identifying risk factors for and predicting healthcare-associated infections in intensive-care units.
The aim is to support decision making aimed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)
- Survival Cluster Analysis [93.50540270973927]
There is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles.
An approach that addresses this need is likely to improve characterization of individual outcomes.
arXiv Detail & Related papers (2020-02-29T22:41:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.