Related papers: ALPHAGMUT: A Rationale-Guided Alpha Shape Graph Neural Network to Evaluate Mutation Effects

ALPHAGMUT: A Rationale-Guided Alpha Shape Graph Neural Network to Evaluate Mutation Effects

URL: http://arxiv.org/abs/2406.09159v1
Date: Thu, 13 Jun 2024 14:22:12 GMT
Title: ALPHAGMUT: A Rationale-Guided Alpha Shape Graph Neural Network to Evaluate Mutation Effects
Authors: Boshen Wang, Bowei Ye, Lin Xu, Jie Liang,
Abstract summary: In this study, we introduce a novel rationale-guided graph neural network AlphaGMut to evaluate mutation effects. We compute structural-, topological-, biophysical-, and sequence properties of the mutation sites, which are assigned as node attributes in the graph. We demonstrate that AlphaGMut outperforms state-of-the-art methods, including DeepMind's AlphaMissense, in many performance metrics.
Score: 8.331322657310292
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In silico methods evaluating the mutation effects of missense mutations are providing an important approach for understanding mutations in personal genomes and identifying disease-relevant biomarkers. However, existing methods, including deep learning methods, heavily rely on sequence-aware information, and do not fully leverage the potential of available 3D structural information. In addition, these methods may exhibit an inability to predict mutations in domains difficult to formulate sequence-based embeddings. In this study, we introduce a novel rationale-guided graph neural network AlphaGMut to evaluate mutation effects and to distinguish pathogenic mutations from neutral mutations. We compute the alpha shapes of protein structures to obtain atomic-resolution edge connectivities and map them to an accurate residue-level graph representation. We then compute structural-, topological-, biophysical-, and sequence properties of the mutation sites, which are assigned as node attributes in the graph. These node attributes could effectively guide the graph neural network to learn the difference between pathogenic and neutral mutations using k-hop message passing with a short training period. We demonstrate that AlphaGMut outperforms state-of-the-art methods, including DeepMind's AlphaMissense, in many performance metrics. In addition, AlphaGMut has the advantage of performing well in alignment-free settings, which provides broader prediction coverage and better generalization compared to current methods requiring deep sequence-aware information.

Related papers

EnTao-GPM: DNA Foundation Model for Predicting the Germline Pathogenic Mutations [16.32431932781823]
Cross-species targeted pre-training on disease-relevant mammalian genomes (human, pig, mouse)<n> Germline mutation specialization via fine-tuning on ClinVar and HGMD.<n>Interpretable clinical framework integrating DNA sequence embeddings with LLM-based statistical explanations.
arXiv Detail & Related papers (2025-07-29T11:34:41Z)
GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype [51.58774936662233]
Building gene regulatory networks (GRN) is essential to understand and predict the effects of genetic perturbations.<n>In this work, we leverage pre-trained large language model and DNA sequence model to extract features from gene descriptions and DNA sequence data.<n>We introduce gene biotype information for the first time in genetic perturbation, simulating the distinct roles of genes with different biotypes in regulating cellular processes.
arXiv Detail & Related papers (2025-05-06T03:35:24Z)
GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
Stacked ensemble\-based mutagenicity prediction model using multiple modalities with graph attention network [0.9736758288065405]
Mutagenicity is a concern due to its association with genetic mutations which can result in a variety of negative consequences. In this work, we introduce a novel stacked ensemble based mutagenicity prediction model.
arXiv Detail & Related papers (2024-09-03T09:14:21Z)
Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances. BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules. BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning. By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z)
Efficient and Scalable Fine-Tune of Language Models for Genome Understanding [49.606093223945734]
We present textscLingo: textscLanguage prefix ftextscIne-tuning for textscGentextscOmes. Unlike DNA foundation models, textscLingo strategically leverages natural language foundation models' contextual cues. textscLingo further accommodates numerous downstream fine-tune tasks by an adaptive rank sampling method.
arXiv Detail & Related papers (2024-02-12T21:40:45Z)
Predicting loss-of-function impact of genetic mutations: a machine learning approach [0.0]
This paper aims to train machine learning models on the attributes of a genetic mutation to predict LoFtool scores. These attributes included, but were not limited to, the position of a mutation on a chromosome, changes in amino acids, and changes in codons caused by the mutation. Models were evaluated using five-fold cross-validated averages of r-squared, mean squared error, root mean squared error, mean absolute error, and explained variance.
arXiv Detail & Related papers (2024-01-26T19:27:38Z)
Accurate and Definite Mutational Effect Prediction with Lightweight Equivariant Graph Neural Networks [2.381587712372268]
This research introduces a lightweight graph representation learning scheme that efficiently analyzes the microenvironment of wild-type proteins. Our solution offers a wide range of benefits that make it an ideal choice for the community.
arXiv Detail & Related papers (2023-04-13T09:51:49Z)
Domain Invariant Model with Graph Convolutional Network for Mammogram Classification [49.691629817104925]
We propose a novel framework, namely Domain Invariant Model with Graph Convolutional Network (DIM-GCN) We first propose a Bayesian network, which explicitly decomposes the latent variables into disease-related and other disease-irrelevant parts that are provable to be disentangled from each other. To better capture the macroscopic features, we leverage the observed clinical attributes as a goal for reconstruction, via Graph Convolutional Network (GCN)
arXiv Detail & Related papers (2022-04-21T08:23:44Z)
Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs. By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
Self-Supervised Graph Representation Learning for Neuronal Morphologies [75.38832711445421]
We present GraphDINO, a data-driven approach to learn low-dimensional representations of 3D neuronal morphologies from unlabeled datasets. We show, in two different species and across multiple brain areas, that this method yields morphological cell type clusterings on par with manual feature-based classification by experts. Our method could potentially enable data-driven discovery of novel morphological features and cell types in large-scale datasets.
arXiv Detail & Related papers (2021-12-23T12:17:47Z)
Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
A Cross-Level Information Transmission Network for Predicting Phenotype from New Genotype: Application to Cancer Precision Medicine [37.442717660492384]
We propose a novel Cross-LEvel Information Transmission network (CLEIT) framework. Inspired by domain adaptation, CLEIT first learns the latent representation of high-level domain then uses it as ground-truth embedding. We demonstrate the effectiveness and performance boost of CLEIT in predicting anti-cancer drug sensitivity from somatic mutations.
arXiv Detail & Related papers (2020-10-09T22:01:00Z)
Pre-training of Graph Neural Network for Modeling Effects of Mutations on Protein-Protein Binding Affinity [13.293231874102641]
We develop a novel deep learning based framework, named GraphPPI, to predict the binding affinity changes upon mutations based on the features provided by a graph neural network (GNN) Experiments showed that, without any annotated signals, GraphPPI can capture meaningful patterns of the protein structures. In-depth analyses also showed GraphPPI can accurately estimate the effects of mutations on the binding affinity between SARS-CoV-2 and its antibodies.
arXiv Detail & Related papers (2020-08-28T04:07:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.