Heterogeneous Transfer Learning for Building High-Dimensional
Generalized Linear Models with Disparate Datasets
- URL: http://arxiv.org/abs/2312.12786v1
- Date: Wed, 20 Dec 2023 06:11:59 GMT
- Authors: Ruzhang Zhao, Prosenjit Kundu, Arkajyoti Saha, Nilanjan Chatterjee
- Abstract summary: We describe a transfer learning approach for building high-dimensional generalized linear models.
We show that the use of adaptive-Lasso penalty leads to the oracle property of underlying parameter estimates.
We illustrate a timely application of the proposed method for the development of risk prediction models for five common diseases.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Development of comprehensive prediction models is often of great interest in
many disciplines of science, but datasets with information on all desired
features typically have small sample sizes. In this article, we describe a
transfer learning approach for building high-dimensional generalized linear
models using data from a main study that has detailed information on all
predictors, and from one or more external studies that have ascertained a more
limited set of predictors. We propose using the external dataset(s) to build
reduced model(s) and then transfer the information on underlying parameters for
the analysis of the main study through a set of calibration equations, while
accounting for the study-specific effects of certain design variables. We then
use a generalized method of moments (GMM) approach with penalization for
parameter estimation and develop highly scalable model-fitting algorithms taking
advantage of the popular glmnet package. We further show that the use of
adaptive-Lasso penalty leads to the oracle property of underlying parameter
estimates and thus leads to convenient post-selection inference procedures. We
conduct extensive simulation studies to investigate both predictive performance
and post-selection inference properties of the proposed method. Finally, we
illustrate a timely application of the proposed method for the development of
risk prediction models for five common diseases using the UK Biobank study,
combining baseline information from all study participants (500K) and recently
released high-throughput proteomic data (~1,500 proteins) on a subset (50K) of
the participants.
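The adaptive-Lasso penalty mentioned in the abstract can be sketched as a weighted Lasso in which a pilot estimate supplies per-coefficient penalty weights; rescaling the design columns by those weights reduces it to an ordinary Lasso fit. The sketch below is a minimal Gaussian-linear illustration using scikit-learn, with an OLS pilot standing in for the external-study information; it is not the authors' calibration-equation GMM procedure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.array([3.0, -2.0, 1.5] + [0.0] * 7)  # three true signals
y = X @ beta + rng.standard_normal(n)

# Step 1: pilot estimate (plain OLS here, purely for illustration)
pilot = LinearRegression().fit(X, y).coef_
weights = np.abs(pilot)  # adaptive weight for coefficient j is 1 / |pilot_j|

# Step 2: adaptive Lasso via rescaling: run an ordinary Lasso on X * diag(|pilot|)
lasso = Lasso(alpha=0.1).fit(X * weights, y)
coef = lasso.coef_ * weights  # map coefficients back to the original scale

print(np.flatnonzero(coef))  # indices of the selected predictors
```

Because each truly-zero coefficient receives a large penalty weight while strong signals are penalized lightly, the noise coefficients are shrunk exactly to zero with high probability, which is the intuition behind the oracle property the paper establishes.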
Related papers
- Few-Shot Load Forecasting Under Data Scarcity in Smart Grids: A Meta-Learning Approach [0.18641315013048293]
This paper proposes adapting an established model-agnostic meta-learning algorithm for short-term load forecasting.
The proposed method can rapidly adapt and generalize within any unknown load time series of arbitrary length.
The proposed model is evaluated using a dataset of historical load consumption data from real-world consumers.
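The model-agnostic meta-learning (MAML) idea this paper adapts can be illustrated with a first-order sketch on one-parameter linear tasks: an inner gradient step adapts a shared initialization to each sampled task, and the outer loop moves the initialization so that a single adaptation step performs well on fresh data. Everything below (task distribution, learning rates) is an illustrative assumption, not the paper's load-forecasting setup.

```python
import numpy as np

rng = np.random.default_rng(3)
meta_w, inner_lr, outer_lr = 0.0, 0.1, 0.01  # shared initialization and step sizes

for step in range(500):
    grad_sum = 0.0
    for _ in range(4):                       # sample a small batch of tasks
        w_task = rng.normal(2.0, 0.5)        # each task is y = w_task * x
        x = rng.standard_normal(20)
        y = w_task * x
        # inner loop: one gradient step on squared error from the meta-initialization
        g = np.mean(2 * (meta_w * x - y) * x)
        w_adapted = meta_w - inner_lr * g
        # outer gradient (first-order approximation) evaluated on fresh query data
        xq = rng.standard_normal(20)
        yq = w_task * xq
        grad_sum += np.mean(2 * (w_adapted * xq - yq) * xq)
    meta_w -= outer_lr * grad_sum / 4

# meta_w should land near the mean task slope (2.0), an initialization
# from which one inner step reaches any individual task quickly
```

The meta-learned initialization is what lets the method "rapidly adapt" to an unseen series with only a few gradient steps of task-specific data.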
arXiv Detail & Related papers (2024-06-09T18:59:08Z) - Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - IGANN Sparse: Bridging Sparsity and Interpretability with Non-linear Insight [4.010646933005848]
IGANN Sparse is a novel machine learning model from the family of generalized additive models.
It promotes sparsity through a non-linear feature selection process during training.
This ensures interpretability through improved model sparsity without sacrificing predictive performance.
arXiv Detail & Related papers (2024-03-17T22:44:36Z) - Toward the Identifiability of Comparative Deep Generative Models [7.5479347719819865]
We propose a theory of identifiability for comparative Deep Generative Models (DGMs).
We show that, while these models lack identifiability across a general class of mixing functions, they surprisingly become identifiable when the mixing function is piece-wise affine.
We also investigate the impact of model misspecification, and empirically show that previously proposed regularization techniques for fitting comparative DGMs help with identifiability when the number of latent variables is not known in advance.
arXiv Detail & Related papers (2024-01-29T06:10:54Z) - An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration [11.102950630209879]
In out-of-distribution (OOD) generalization tasks, fine-tuning pre-trained models has become a prevalent strategy.
We examined how pre-trained model size, pre-training dataset size, and training strategies impact generalization and uncertainty calibration.
arXiv Detail & Related papers (2023-07-17T01:27:10Z) - Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
arXiv Detail & Related papers (2023-04-17T10:59:57Z) - SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in
Fine-tuned Source Code Models [58.78043959556283]
We study the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning methods.
Our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios.
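Low-Rank Adaptation (LoRA), which this paper compares against full fine-tuning, freezes a pretrained weight matrix and learns only a low-rank additive update. A minimal sketch of the forward pass, with toy dimensions and numpy standing in for a deep-learning framework:

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 8, 2                         # model width and LoRA rank (r << d)
W = rng.standard_normal((d, d))     # frozen pretrained weight (never updated)
A = rng.standard_normal((d, r)) * 0.01
B = np.zeros((r, d))                # B starts at zero, so the update is a no-op initially

x = rng.standard_normal(d)
h = x @ (W + A @ B)                 # adapted forward pass: W plus rank-r update
```

Only A and B would receive gradient updates during fine-tuning (2*d*r parameters instead of d*d), and the zero initialization of B guarantees the adapted model matches the pretrained one exactly at the start of training.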
arXiv Detail & Related papers (2022-10-10T16:07:24Z) - MRCLens: an MRC Dataset Bias Detection Toolkit [82.44296974850639]
We introduce MRCLens, a toolkit that detects whether biases exist before users train the full model.
To make the toolkit easier to adopt, we also provide a categorization of common biases in MRC.
arXiv Detail & Related papers (2022-07-18T21:05:39Z) - Selecting the suitable resampling strategy for imbalanced data
classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
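Random oversampling, the simplest of the strategies this paper considers, resamples minority-class rows with replacement until the classes are balanced. A toy numpy sketch (the class sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# toy imbalanced dataset: 90 majority-class rows (0), 10 minority-class rows (1)
X = rng.standard_normal((100, 2))
y = np.array([0] * 90 + [1] * 10)

# random oversampling: draw minority rows with replacement until classes match
minority = np.flatnonzero(y == 1)
extra = rng.choice(minority, size=80, replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])

print(np.bincount(y_bal))  # [90 90]
```

Undersampling works in the opposite direction, discarding majority rows, which trades information loss for a smaller balanced training set.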
arXiv Detail & Related papers (2021-12-15T18:56:39Z) - A Clustering-aided Ensemble Method for Predicting Ridesourcing Demand in
Chicago [0.0]
This study proposes a Clustering-aided Ensemble Method (CEM) to forecast the zone-to-zone travel demand for ridesourcing services.
We implement and test the proposed methodology by using the ridesourcing-trip data in Chicago.
arXiv Detail & Related papers (2021-09-08T04:58:29Z) - Generalized Matrix Factorization: efficient algorithms for fitting
generalized linear latent variable models to large data arrays [62.997667081978825]
Generalized Linear Latent Variable Models (GLLVMs) generalize Gaussian factor models to non-Gaussian responses.
Current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets.
We propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood.
arXiv Detail & Related papers (2020-10-06T04:28:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.