LEARNER: A Transfer Learning Method for Low-Rank Matrix Estimation
- URL: http://arxiv.org/abs/2412.20605v1
- Date: Sun, 29 Dec 2024 22:26:16 GMT
- Title: LEARNER: A Transfer Learning Method for Low-Rank Matrix Estimation
- Authors: Sean McGrath, Cenhao Zhu, Min Guo, Rui Duan,
- Abstract summary: We propose an approach that leverages similarity in the latent row and column spaces between the source and target populations to improve estimation in the target population.
LEARNER is based on performing a low-rank approximation of the target population data which penalizes differences between the latent row and column spaces between the source and target populations.
- Score: 5.117628843022369
- License:
- Abstract: Low-rank matrix estimation is a fundamental problem in statistics and machine learning. In the context of heterogeneous data generated from diverse sources, a key challenge lies in leveraging data from a source population to enhance the estimation of a low-rank matrix in a target population of interest. One such example is estimating associations between genetic variants and diseases in non-European ancestry groups. We propose an approach that leverages similarity in the latent row and column spaces between the source and target populations to improve estimation in the target population, which we refer to as LatEnt spAce-based tRaNsfer lEaRning (LEARNER). LEARNER is based on performing a low-rank approximation of the target population data which penalizes differences between the latent row and column spaces between the source and target populations. We present a cross-validation approach that allows the method to adapt to the degree of heterogeneity across populations. We conducted extensive simulations which found that LEARNER often outperforms the benchmark approach that only uses the target population data, especially as the signal-to-noise ratio in the source population increases. We also performed an illustrative application and empirical comparison of LEARNER and benchmark approaches in a re-analysis of a genome-wide association study in the BioBank Japan cohort. LEARNER is implemented in the R package learner.
Related papers
- Transfer Learning of CATE with Kernel Ridge Regression [4.588222946914528]
We propose a novel method for overlap-adaptive transfer learning of conditional average treatment effect (CATE) using kernel ridge regression (KRR)
We provide a theoretical justification for our method through sharp non-asymptotic MSE bounds, highlighting its adaptivity to both weak overlaps and the complexity of CATE function.
arXiv Detail & Related papers (2025-02-17T01:07:45Z) - Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z) - MaxMin-RLHF: Alignment with Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z) - Copula-based transferable models for synthetic population generation [1.370096215615823]
Population synthesis involves generating synthetic yet realistic representations of a target population of micro-agents.
Traditional methods, often reliant on target population samples, face limitations due to high costs and small sample sizes.
We propose a novel framework based on copulas to generate synthetic data for target populations where only empirical marginal distributions are known.
arXiv Detail & Related papers (2023-02-17T23:58:14Z) - Divide and Contrast: Source-free Domain Adaptation via Adaptive
Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to connect the good ends of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
arXiv Detail & Related papers (2022-11-12T09:21:49Z) - Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z) - Targeted Optimal Treatment Regime Learning Using Summary Statistics [12.767669486030352]
We consider an ITR estimation problem where the source and target populations may be heterogeneous.
We develop a weighting framework that tailors an ITR for a given target population by leveraging the available summary statistics.
Specifically, we propose a calibrated augmented inverse probability weighted estimator of the value function for the target population and estimate an optimal ITR.
arXiv Detail & Related papers (2022-01-17T06:11:31Z) - Targeting Underrepresented Populations in Precision Medicine: A
Federated Transfer Learning Approach [7.467496975496821]
We propose a two-way data integration strategy that integrates heterogeneous data from diverse populations and from multiple healthcare institutions.
We show that the proposed method improves the estimation and prediction accuracy in underrepresented populations, and reduces the gap of model performance across populations.
arXiv Detail & Related papers (2021-08-27T04:04:34Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z) - DCMD: Distance-based Classification Using Mixture Distributions on
Microbiome Data [10.171660468645603]
We present an innovative approach for distance-based classification using mixture distributions (DCMD)
This approach models the inherent uncertainty in sparse counts by estimating a mixture distribution for the sample data.
Results are compared against a number of existing machine learning and distance-based approaches.
arXiv Detail & Related papers (2020-03-29T23:30:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.