Targeting Underrepresented Populations in Precision Medicine: A Federated Transfer Learning Approach
- URL: http://arxiv.org/abs/2108.12112v1
- Date: Fri, 27 Aug 2021 04:04:34 GMT
- Authors: Sai Li, Tianxi Cai, Rui Duan
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The limited representation of minorities and disadvantaged populations in
large-scale clinical and genomics research has become a barrier to translating
precision medicine research into practice. Due to heterogeneity across
populations, risk prediction models often underperform in these
underrepresented populations and may therefore further exacerbate known
health disparities. In this paper, we propose a two-way data integration
strategy that integrates heterogeneous data from diverse populations and from
multiple healthcare institutions via a federated transfer learning approach.
The proposed method can handle the challenging setting where sample sizes from
different populations are highly unbalanced. With only a small number of
communication rounds across participating sites, the proposed method can achieve
performance comparable to the pooled analysis where individual-level data are
directly pooled together. We show that the proposed method improves the
estimation and prediction accuracy in underrepresented populations and narrows
the performance gap across populations. Our theoretical analysis
reveals how estimation accuracy is influenced by communication budgets, privacy
restrictions, and heterogeneity across populations. We demonstrate the
feasibility and validity of our methods through numerical experiments and a
real application to a multi-center study, in which we construct polygenic risk
prediction models for Type II diabetes in the African American (AA) population.
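The abstract does not spell out the estimator, so the following is only a minimal sketch of the two ideas it combines: aggregating across institutions without pooling individual-level rows, and transferring from a large source population to a small underrepresented target population. It rests on strong simplifying assumptions that are not from the paper: a linear outcome model (so sites can share the sufficient statistics X^T X and X^T y), and a ridge-penalized target-specific offset as the transfer step. The function names `local_summaries`, `aggregate`, `federated_transfer_fit`, `simulate_sites`, the penalty `lam`, and the simulated data are all hypothetical illustrations, not the authors' method.

```python
import numpy as np


def local_summaries(X_src, y_src, X_tgt, y_tgt):
    """Statistics one site sends to the coordinator in a single round.

    Assumption: a linear outcome model, for which X^T X and X^T y are
    sufficient statistics, so no individual-level rows leave the site.
    """
    return {
        "src_xtx": X_src.T @ X_src,
        "src_xty": X_src.T @ y_src,
        "tgt_xtx": X_tgt.T @ X_tgt,
        "tgt_xty": X_tgt.T @ y_tgt,
    }


def aggregate(summaries):
    """Sum each statistic over sites; the sums equal the pooled-data statistics."""
    return {key: sum(s[key] for s in summaries) for key in summaries[0]}


def federated_transfer_fit(summaries, lam):
    """Hypothetical two-step transfer estimator (not the paper's algorithm).

    Step 1: source-population OLS, reconstructed from the summed summaries.
    Step 2: ridge-penalized offset for the target population,
            delta = argmin ||y_t - X_t (beta_src + delta)||^2 + lam * ||delta||^2.
    """
    agg = aggregate(summaries)
    beta_src = np.linalg.solve(agg["src_xtx"], agg["src_xty"])
    p = beta_src.shape[0]
    # X_t^T (y_t - X_t beta_src), computed from the shared target statistics only.
    resid_xty = agg["tgt_xty"] - agg["tgt_xtx"] @ beta_src
    delta = np.linalg.solve(agg["tgt_xtx"] + lam * np.eye(p), resid_xty)
    return beta_src, beta_src + delta


def simulate_sites(rng, p, sizes, shift_scale):
    """Toy data: each site holds a large source sample and a small target sample."""
    beta_src_true = rng.normal(size=p)
    # Assumed population heterogeneity: a small shift in the target coefficients.
    beta_tgt_true = beta_src_true + shift_scale * rng.normal(size=p)
    raw = []
    for n_src, n_tgt in sizes:
        X_s = rng.normal(size=(n_src, p))
        X_t = rng.normal(size=(n_tgt, p))
        y_s = X_s @ beta_src_true + rng.normal(size=n_src)
        y_t = X_t @ beta_tgt_true + rng.normal(size=n_tgt)
        raw.append((X_s, y_s, X_t, y_t))
    return beta_tgt_true, raw


rng = np.random.default_rng(0)
# Highly unbalanced sizes: (source n, target n) at each of three sites.
beta_tgt_true, raw = simulate_sites(
    rng, p=10, sizes=[(1000, 20), (800, 10), (1200, 15)], shift_scale=0.1
)
summaries = [local_summaries(*site) for site in raw]
beta_src_hat, beta_tgt_hat = federated_transfer_fit(summaries, lam=5.0)

# Comparison baseline that ignores the source population: OLS on the target
# rows only. The demo stacks raw rows just to build this baseline.
X_t_all = np.vstack([site[2] for site in raw])
y_t_all = np.concatenate([site[3] for site in raw])
beta_tgt_only = np.linalg.lstsq(X_t_all, y_t_all, rcond=None)[0]

print("target-only error:", np.linalg.norm(beta_tgt_only - beta_tgt_true))
print("transfer error:   ", np.linalg.norm(beta_tgt_hat - beta_tgt_true))
```

In this toy, `lam=0` recovers the target-only fit and a very large `lam` returns the source estimate, so `lam` trades off borrowing from the large population against fitting the small one. For a linear model, one round of summaries reproduces the pooled fit exactly; for a binary outcome such as Type II diabetes status (e.g., logistic regression) that shortcut is no longer exact, so iterative federated schemes need additional communication rounds.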
Related papers
- Targeted Data Fusion for Causal Survival Analysis Under Distribution Shift
Causal inference has the potential to improve the generalizability, transportability, and replicability of scientific findings.
Existing data fusion methods focus on binary or continuous outcomes.
We propose two novel approaches for multi-source causal survival analysis.
(arXiv, 2025-01-30)
- U-aggregation: Unsupervised Aggregation of Multiple Learning Algorithms
We propose an unsupervised model aggregation method, U-aggregation, for enhanced and robust performance in new populations.
Unlike existing supervised model aggregation or super learner approaches, U-aggregation assumes no observed labels or outcomes in the target population.
We demonstrate its potential real-world application by using U-aggregation to enhance genetic risk prediction of complex traits.
(arXiv, 2025-01-30)
- Sample Selection Bias in Machine Learning for Healthcare
We focus on sample selection bias (SSB), a specific type of bias in which the study population is less representative of the target population.
Existing machine learning techniques try to correct the bias mostly by balancing distributions between the study and the target populations.
We propose a new research direction for addressing SSB, based on the target population identification rather than the bias correction.
(arXiv, 2024-05-13)
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but individuals of non-European descent are underrepresented in them.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
(arXiv, 2024-04-26)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
(arXiv, 2024-03-02)
- Multiply Robust Federated Estimation of Targeted Average Treatment Effects
We propose a novel approach to derive valid causal inferences for a target population using multi-site data.
Our methodology incorporates transfer learning to estimate ensemble weights to combine information from source sites.
(arXiv, 2023-09-22)
- Multi-dimensional domain generalization with low-rank structures
In statistical and machine learning methods, it is typically assumed that the test data follow the same distribution as the training data.
This assumption does not always hold, especially in applications where the target population is not well represented in the training data.
We present a novel approach to addressing this challenge in linear regression models.
(arXiv, 2023-09-18)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves an F1 score of up to 0.841 for AD detection and a PR-AUC of up to 0.609 for NASH detection, and outperforms the best of various state-of-the-art baselines by up to 19%.
(arXiv, 2020-10-22)
- Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making aimed at reducing the incidence rate of infections.
(arXiv, 2020-05-07)
- Survival Cluster Analysis
There is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles.
An approach that addresses this need is likely to improve characterization of individual outcomes.
(arXiv, 2020-02-29)