Extending Models Via Gradient Boosting: An Application to Mendelian Models
- URL: http://arxiv.org/abs/2105.06559v1
- Date: Thu, 13 May 2021 21:21:05 GMT
- Title: Extending Models Via Gradient Boosting: An Application to Mendelian Models
- Authors: Theodore Huang, Gregory Idos, Christine Hong, Stephen Gruber, Giovanni Parmigiani, Danielle Braun
- Abstract summary: We propose a general approach to model improvement: we combine gradient boosting with any previously developed model to improve model performance.
We show that integration of gradient boosting with an existing Mendelian model can produce an improved model that outperforms both that model and the model built using gradient boosting alone.
- Score: 1.9573380763700712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improving existing widely-adopted prediction models is often a more efficient
and robust way towards progress than training new models from scratch. Existing
models may (a) incorporate complex mechanistic knowledge, (b) leverage
proprietary information, and (c) have surmounted barriers to adoption. Compared
to model training, model improvement and modification receive little attention.
In this paper we propose a general approach to model improvement: we combine
gradient boosting with any previously developed model to improve model
performance while retaining important existing characteristics. To exemplify,
we consider the context of Mendelian models, which estimate the probability of
carrying genetic mutations that confer susceptibility to disease by using
family pedigrees and health histories of family members. Via simulations we
show that integration of gradient boosting with an existing Mendelian model can
produce an improved model that outperforms both that model and the model built
using gradient boosting alone. We illustrate the approach on genetic testing
data from the USC-Stanford Cancer Genetics Hereditary Cancer Panel (HCP) study.
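One way to read the core recipe above, as a hedged sketch rather than the authors' exact algorithm, is to start gradient boosting from the existing model's log-odds so the boosted trees only learn a correction to it. In the toy Python sketch below, the previously developed (e.g., Mendelian) model is stubbed out by a hypothetical `existing_model_proba` function, and all helper names, data, and settings are illustrative assumptions.

```python
# Minimal, hypothetical sketch of the idea in the abstract: initialize gradient
# boosting at an existing model's predictions so the trees only learn a
# correction to that model. Illustration only, not the paper's algorithm;
# the Mendelian model is replaced by a toy stand-in.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def existing_model_proba(X):
    """Stand-in for a previously developed model (e.g., a Mendelian carrier
    model) that returns event probabilities; here just a toy logistic score."""
    return 1.0 / (1.0 + np.exp(-X[:, 0]))

def log_odds(p, eps=1e-6):
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def boost_from_existing(X, y, n_rounds=100, lr=0.1, max_depth=2):
    """Fit small regression trees to the log-loss residuals, starting from
    the existing model's log-odds instead of a constant."""
    F = log_odds(existing_model_proba(X))
    trees = []
    for _ in range(n_rounds):
        p = 1.0 / (1.0 + np.exp(-F))
        residual = y - p                      # negative gradient of the log-loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        F += lr * tree.predict(X)
        trees.append(tree)
    return trees

def predict_proba(trees, X, lr=0.1):
    F = log_odds(existing_model_proba(X))
    for tree in trees:
        F += lr * tree.predict(X)
    return 1.0 / (1.0 + np.exp(-F))

# Toy usage: features might encode pedigree/health-history summaries;
# labels are observed carrier status (all simulated here).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))).astype(float)
trees = boost_from_existing(X, y)
print(predict_proba(trees, X)[:5])  # corrected carrier-probability estimates
```

Most off-the-shelf boosting libraries expose the same hook as a per-observation initial score or offset, so an existing model can usually be plugged in without re-implementing the loop by hand.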
Related papers
- Unified Molecule Generation and Property Prediction [6.865957689890204]
Hyformer is a transformer-based joint model that blends the generative and predictive functionalities.
We show that Hyformer rivals other joint models, as well as state-of-the-art molecule generation and property prediction models.
arXiv Detail & Related papers (2025-04-23T09:36:46Z)
- Teaching pathology foundation models to accurately predict gene expression with parameter efficient knowledge transfer [1.5416321520529301]
Parameter Efficient Knowledge Adaptation (PEKA) is a novel framework that integrates knowledge distillation and structure alignment losses for cross-modal knowledge transfer.
We evaluated PEKA for gene expression prediction using multiple spatial transcriptomics datasets.
arXiv Detail & Related papers (2025-04-09T17:24:41Z)
- Guiding Time-Varying Generative Models with Natural Gradients on Exponential Family Manifold [5.000311680307273]
We show that the evolution of time-varying generative models can be projected onto an exponential family manifold.
We then train the generative model by moving its projection on the manifold according to the natural gradient descent scheme.
We propose particle versions of the algorithm, which feature closed-form update rules for any parametric model within the exponential family.
arXiv Detail & Related papers (2025-02-11T15:39:47Z)
- Integrating Large Language Models for Genetic Variant Classification [12.244115429231888]
Large Language Models (LLMs) have emerged as transformative tools in genetics.
This study investigates the integration of state-of-the-art LLMs, including GPN-MSA, ESM1b, and AlphaMissense.
Our approach evaluates these integrated models using the well-annotated ProteinGym and ClinVar datasets.
arXiv Detail & Related papers (2024-11-07T13:45:56Z)
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm that incorporates score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z)
- Exploring Model Kinship for Merging Large Language Models [52.01652098827454]
We introduce model kinship, the degree of similarity or relatedness between Large Language Models.
We find that there is a certain relationship between model kinship and the performance gains after model merging.
We propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can yield better performance on benchmark datasets.
arXiv Detail & Related papers (2024-10-16T14:29:29Z)
- Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL.
We demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z)
- MGE: A Training-Free and Efficient Model Generation and Enhancement Scheme [10.48591131837771]
This paper proposes a Training-Free and Efficient Model Generation and Enhancement Scheme (MGE).
It considers two aspects during the model generation process: the distribution of model parameters and model performance.
Experimental results show that the generated models are comparable to models obtained through normal training, and even superior in some cases.
arXiv Detail & Related papers (2024-02-27T13:12:00Z)
- Improved prediction of ligand-protein binding affinities by meta-modeling [1.3859669037499769]
We develop a framework to integrate published force-field-based empirical docking and sequence-based deep learning models.
We show that many of our meta-models significantly improve affinity predictions over base models.
Our best meta-models achieve comparable performance to state-of-the-art deep learning tools exclusively based on 3D structures.
arXiv Detail & Related papers (2023-10-05T23:46:45Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Hybrid modeling: Applications in real-time diagnosis [64.5040763067757]
We outline a novel hybrid modeling approach that combines machine learning inspired models and physics-based models.
We are using such models for real-time diagnosis applications.
arXiv Detail & Related papers (2020-03-04T00:44:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.