Related papers: Targeting Underrepresented Populations in Precision Medicine: A Federated Transfer Learning Approach

Targeting Underrepresented Populations in Precision Medicine: A Federated Transfer Learning Approach

URL: http://arxiv.org/abs/2108.12112v1
Date: Fri, 27 Aug 2021 04:04:34 GMT
Title: Targeting Underrepresented Populations in Precision Medicine: A Federated Transfer Learning Approach
Authors: Sai Li, Tianxi Cai, Rui Duan
Abstract summary: We propose a two-way data integration strategy that integrates heterogeneous data from diverse populations and from multiple healthcare institutions. We show that the proposed method improves the estimation and prediction accuracy in underrepresented populations, and reduces the gap of model performance across populations.
Score: 7.467496975496821
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The limited representation of minorities and disadvantaged populations in large-scale clinical and genomics research has become a barrier to translating precision medicine research into practice. Due to heterogeneity across populations, risk prediction models are often found to be underperformed in these underrepresented populations, and therefore may further exacerbate known health disparities. In this paper, we propose a two-way data integration strategy that integrates heterogeneous data from diverse populations and from multiple healthcare institutions via a federated transfer learning approach. The proposed method can handle the challenging setting where sample sizes from different populations are highly unbalanced. With only a small number of communications across participating sites, the proposed method can achieve performance comparable to the pooled analysis where individual-level data are directly pooled together. We show that the proposed method improves the estimation and prediction accuracy in underrepresented populations, and reduces the gap of model performance across populations. Our theoretical analysis reveals how estimation accuracy is influenced by communication budgets, privacy restrictions, and heterogeneity across populations. We demonstrate the feasibility and validity of our methods through numerical experiments and a real application to a multi-center study, in which we construct polygenic risk prediction models for Type II diabetes in AA population.

Related papers

Targeted Data Fusion for Causal Survival Analysis Under Distribution Shift [46.84912148188679]
Causal inference has the potential to improve the generalizability, transportability, and replicability of scientific findings. Existing data fusion methods focus on binary or continuous outcomes. We propose two novel approaches for multi-source causal survival analysis.
arXiv Detail & Related papers (2025-01-30T23:21:25Z)
U-aggregation: Unsupervised Aggregation of Multiple Learning Algorithms [4.871473117968554]
We propose an unsupervised model aggregation method, U-aggregation, for enhanced and robust performance in new populations. Unlike existing supervised model aggregation or super learner approaches, U-aggregation assumes no observed labels or outcomes in the target population. We demonstrate its potential real-world application by using U-aggregation to enhance genetic risk prediction of complex traits.
arXiv Detail & Related papers (2025-01-30T01:42:51Z)
Sample Selection Bias in Machine Learning for Healthcare [17.549969100454803]
sample selection bias ( SSB) refers to the study population being less representative of the target population, leading to biased and potentially harmful decisions. Despite being well-known in the literature, SSB remains scarcely studied in machine learning for healthcare. We propose a new research direction for addressing SSB, based on the target population identification rather than the bias correction.
arXiv Detail & Related papers (2024-05-13T15:30:35Z)
Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals. Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues. We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space. A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
Multiply Robust Federated Estimation of Targeted Average Treatment Effects [0.0]
We propose a novel approach to derive valid causal inferences for a target population using multi-site data. Our methodology incorporates transfer learning to estimate ensemble weights to combine information from source sites.
arXiv Detail & Related papers (2023-09-22T03:15:08Z)
Multi-dimensional domain generalization with low-rank structures [18.565189720128856]
In statistical and machine learning methods, it is typically assumed that the test data are identically distributed with the training data. This assumption does not always hold, especially in applications where the target population are not well-represented in the training data. We present a novel approach to addressing this challenge in linear regression models.
arXiv Detail & Related papers (2023-09-18T08:07:58Z)
Improving genetic risk prediction across diverse population by disentangling ancestry representations [10.803542340843368]
We propose a novel deep-learning framework that disentangles ancestry from the phenotype-relevant information in its representation. The ancestry disentangled representation can be used to build risk predictors that perform better across minority populations.
arXiv Detail & Related papers (2022-05-10T05:05:37Z)
Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation. adversarially generated samples are used during domain adaptation. Results confirm the effectiveness of our method and the generality on different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD) UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
Variational Disentanglement for Rare Event Modeling [21.269897066024306]
We propose a variational disentanglement approach to learn from rare events in heavily imbalanced classification problems. Specifically, we leverage the imposed extreme-distribution behavior on a latent space to extract information from low-prevalence events.
arXiv Detail & Related papers (2020-09-17T21:35:36Z)
Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units. The aim is to support decision making addressed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)
Survival Cluster Analysis [93.50540270973927]
There is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles. An approach that addresses this need is likely to improve characterization of individual outcomes.
arXiv Detail & Related papers (2020-02-29T22:41:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.