Survey and Improvement Strategies for Gene Prioritization with Large Language Models
- URL: http://arxiv.org/abs/2501.18794v1
- Date: Thu, 30 Jan 2025 23:03:03 GMT
- Title: Survey and Improvement Strategies for Gene Prioritization with Large Language Models
- Authors: Matthew Neeley, Guantong Qi, Guanchu Wang, Ruixiang Tang, Dongxue Mao, Chaozhong Liu, Sasidhar Pasupuleti, Bo Yuan, Fan Xia, Pengfei Liu, Zhandong Liu, Xia Hu,
- Abstract summary: Large language models (LLMs) have performed well in medical exams, but their effectiveness in diagnosing rare genetic diseases has not been assessed.
We used multi-agent and Human Phenotype Ontology (HPO) classification to categorized patients based on phenotypes and solvability levels.
At baseline, GPT-4 outperformed other LLMs, achieving near 30% accuracy in ranking causal genes correctly.
- Score: 61.24568051916653
- License:
- Abstract: Rare diseases are challenging to diagnose due to limited patient data and genetic diversity. Despite advances in variant prioritization, many cases remain undiagnosed. While large language models (LLMs) have performed well in medical exams, their effectiveness in diagnosing rare genetic diseases has not been assessed. To identify causal genes, we benchmarked various LLMs for gene prioritization. Using multi-agent and Human Phenotype Ontology (HPO) classification, we categorized patients based on phenotypes and solvability levels. As gene set size increased, LLM performance deteriorated, so we used a divide-and-conquer strategy to break the task into smaller subsets. At baseline, GPT-4 outperformed other LLMs, achieving near 30% accuracy in ranking causal genes correctly. The multi-agent and HPO approaches helped distinguish confidently solved cases from challenging ones, highlighting the importance of known gene-phenotype associations and phenotype specificity. We found that cases with specific phenotypes or clear associations were more accurately solved. However, we observed biases toward well-studied genes and input order sensitivity, which hindered gene prioritization. Our divide-and-conquer strategy improved accuracy by overcoming these biases. By utilizing HPO classification, novel multi-agent techniques, and our LLM strategy, we improved causal gene identification accuracy compared to our baseline evaluation. This approach streamlines rare disease diagnosis, facilitates reanalysis of unsolved cases, and accelerates gene discovery, supporting the development of targeted diagnostics and therapies.
Related papers
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z) - Advancing Gene Selection in Oncology: A Fusion of Deep Learning and
Sparsity for Precision Gene Selection [4.093503153499691]
This paper introduces two gene selection strategies for deep learning-based survival prediction models.
The first strategy uses a sparsity-inducing method while the second one uses importance based gene selection for identifying relevant genes.
arXiv Detail & Related papers (2024-03-04T10:44:57Z) - Unsupervised ensemble-based phenotyping helps enhance the
discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Un Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z) - Few-Shot Meta Learning for Recognizing Facial Phenotypes of Genetic
Disorders [55.41644538483948]
Automated classification and similarity retrieval aid physicians in decision-making to diagnose possible genetic conditions as early as possible.
Previous work has addressed the problem as a classification problem and used deep learning methods.
In this study, we used a facial recognition model trained on a large corpus of healthy individuals as a pre-task and transferred it to facial phenotype recognition.
arXiv Detail & Related papers (2022-10-23T11:52:57Z) - rfPhen2Gen: A machine learning based association study of brain imaging
phenotypes to genotypes [71.1144397510333]
We learned machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z) - Expectile Neural Networks for Genetic Data Analysis of Complex Diseases [3.0088453915399747]
We develop an expectile neural network (ENN) method for genetic data analyses of complex diseases.
Similar to expectile regression, ENN provides a comprehensive view of relationships between genetic variants and disease phenotypes.
We show that the proposed method outperformed an existing expectile regression when there exist complex relationships between genetic variants and disease phenotypes.
arXiv Detail & Related papers (2020-10-26T21:07:40Z) - A Cross-Level Information Transmission Network for Predicting Phenotype
from New Genotype: Application to Cancer Precision Medicine [37.442717660492384]
We propose a novel Cross-LEvel Information Transmission network (CLEIT) framework.
Inspired by domain adaptation, CLEIT first learns the latent representation of high-level domain then uses it as ground-truth embedding.
We demonstrate the effectiveness and performance boost of CLEIT in predicting anti-cancer drug sensitivity from somatic mutations.
arXiv Detail & Related papers (2020-10-09T22:01:00Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.