Protein language model rescue mutations highlight variant effects and
structure in clinically relevant genes
- URL: http://arxiv.org/abs/2211.10000v1
- Date: Fri, 18 Nov 2022 03:00:52 GMT
- Title: Protein language model rescue mutations highlight variant effects and
structure in clinically relevant genes
- Authors: Onuralp Soylemez and Pablo Cordero
- Abstract summary: We interrogate the use of protein language models in characterizing known pathogenic mutations in curated, medically actionable genes.
Systematic analysis of the predicted effects of these compensatory mutations reveals unappreciated structural features of proteins.
We encourage the community to generate and curate rescue mutation experiments to inform the design of more sophisticated co-masking strategies.
- Score: 1.7970523486905976
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite being self-supervised, protein language models have shown remarkable
performance in fundamental biological tasks such as predicting the impact of
genetic variation on protein structure and function. The effectiveness of these
models on a diverse set of tasks suggests that they learn meaningful
representations of the fitness landscape that can be useful for downstream clinical
applications. Here, we interrogate the use of these language models in
characterizing known pathogenic mutations in curated, medically actionable
genes through an exhaustive search of putative compensatory mutations on each
variant's genetic background. Systematic analysis of the predicted effects of
these compensatory mutations reveals unappreciated structural features of
proteins that are missed by other structure predictors like AlphaFold. While
deep mutational scan experiments provide an unbiased estimate of the mutational
landscape, we encourage the community to generate and curate rescue mutation
experiments to inform the design of more sophisticated co-masking strategies
and leverage large language models more effectively for downstream clinical
prediction tasks.
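The exhaustive search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`substitute`, `rescue_scan`, `toy_charge_score`), the example sequence, and the scoring rule are all hypothetical. In particular, `toy_charge_score` is a stand-in that only penalizes net charge; a real analysis would plug in a protein language model score (for example a masked-token log-likelihood) as `score_fn`.

```python
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def substitute(seq: str, pos: int, aa: str) -> str:
    """Return seq with the residue at 0-based index pos replaced by aa."""
    return seq[:pos] + aa + seq[pos + 1:]


def rescue_scan(
    wild_type: str,
    variant_pos: int,
    variant_aa: str,
    score_fn: Callable[[str], float],
    top_k: int = 5,
) -> List[Tuple[int, str, float]]:
    """Rank second-site substitutions on the variant's background.

    Each candidate is scored by how much it raises score_fn relative to
    the sequence carrying only the pathogenic variant.
    """
    background = substitute(wild_type, variant_pos, variant_aa)
    baseline = score_fn(background)
    candidates = []
    for pos, current in enumerate(background):
        if pos == variant_pos:
            continue  # skip the variant site itself (reversions are trivial)
        for aa in AMINO_ACIDS:
            if aa == current:
                continue  # not a substitution
            gain = score_fn(substitute(background, pos, aa)) - baseline
            candidates.append((pos, aa, gain))
    # Stable sort: ties keep their enumeration order (position, then residue).
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:top_k]


def toy_charge_score(seq: str) -> float:
    """Hypothetical stand-in for a language model score: 0 at neutral charge."""
    charge = {"K": 1, "R": 1, "D": -1, "E": -1}
    return float(-abs(sum(charge.get(aa, 0) for aa in seq)))


wild_type = "MKTAEDLLKA"  # made-up sequence with zero net charge
hits = rescue_scan(wild_type, variant_pos=4, variant_aa="K",
                   score_fn=toy_charge_score, top_k=4)
for pos, aa, gain in hits:
    print(pos, aa, gain)
```

Under this toy score, the charge-reversing variant at index 4 is best compensated by replacing one of the remaining lysines with an acidic residue. With a real model, `score_fn` would be the expensive step, since the scan evaluates 19 substitutions at every other position.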
Related papers
- Integrating Large Language Models for Genetic Variant Classification [12.244115429231888]
Large Language Models (LLMs) have emerged as transformative tools in genetics.
This study investigates the integration of state-of-the-art LLMs, including GPN-MSA, ESM1b, and AlphaMissense.
Our approach evaluates these integrated models using the well-annotated ProteinGym and ClinVar datasets.
arXiv Detail & Related papers (2024-11-07T13:45:56Z)
- Retrieval-Enhanced Mutation Mastery: Augmenting Zero-Shot Prediction of Protein Language Model [3.4494754789770186]
Deep learning methods for protein modeling have demonstrated superior results at lower costs compared to traditional approaches.
In mutation effect prediction, the key to pre-training deep learning models lies in accurately interpreting the complex relationships among protein sequence, structure, and function.
This study introduces a retrieval-enhanced protein language model for comprehensive analysis of native properties from sequence and local structural interactions.
arXiv Detail & Related papers (2024-10-28T15:28:51Z)
- HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction [0.0]
HERMES is a 3D rotationally equivariant structure-based neural network model for mutational effect and stability prediction.
We present a suite of HERMES models, pre-trained with different strategies, and fine-tuned to predict the stability effect of mutations.
arXiv Detail & Related papers (2024-07-09T09:31:05Z)
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer (BPGT) to improve genetic mutation prediction performance.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
- Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning [78.38442423223832]
We develop a novel codebook pre-training task, namely masked microenvironment modeling.
We demonstrate superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction.
arXiv Detail & Related papers (2024-05-16T03:53:21Z)
- VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z)
- Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z)
- Multi-level Protein Representation Learning for Blind Mutational Effect Prediction [5.207307163958806]
This paper introduces a novel pre-training framework that cascades sequential and geometric analyzers for protein structures.
It guides mutational directions toward desired traits by simulating natural selection on wild-type proteins.
We assess the proposed approach using a public database and two new databases for a variety of variant effect prediction tasks.
arXiv Detail & Related papers (2023-06-08T03:00:50Z)
- Accurate and Definite Mutational Effect Prediction with Lightweight Equivariant Graph Neural Networks [2.381587712372268]
This research introduces a lightweight graph representation learning scheme that efficiently analyzes the microenvironment of wild-type proteins.
Our solution offers a wide range of benefits that make it an ideal choice for the community.
arXiv Detail & Related papers (2023-04-13T09:51:49Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, which aims to identify subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)