MutFormer: A context-dependent transformer-based model to predict
pathogenic missense mutations
- URL: http://arxiv.org/abs/2110.14746v1
- Date: Wed, 27 Oct 2021 20:17:35 GMT
- Title: MutFormer: A context-dependent transformer-based model to predict
pathogenic missense mutations
- Authors: Theodore Jiang, Li Fang, Kai Wang
- Abstract summary: Missense mutations account for approximately half of the known variants responsible for human inherited diseases.
Recent advances in deep learning show that transformer models are particularly powerful at modeling sequences.
We introduce MutFormer, a transformer-based model for prediction of pathogenic missense mutations.
- Score: 5.153619184788929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A missense mutation is a point mutation that results in a substitution of an
amino acid in a protein sequence. Currently, missense mutations account for
approximately half of the known variants responsible for human inherited
diseases, but accurate prediction of the pathogenicity of missense variants is
still challenging. Recent advances in deep learning show that transformer
models are particularly powerful at modeling sequences. In this study, we
introduce MutFormer, a transformer-based model for prediction of pathogenic
missense mutations. We pre-trained MutFormer on reference protein sequences and
alternative protein sequences resulting from common genetic variants. We tested
different fine-tuning methods for pathogenicity prediction. Our results show
that MutFormer outperforms a variety of existing tools. MutFormer and
pre-computed variant scores are publicly available on GitHub at
https://github.com/WGLab/mutformer.
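The abstract describes scoring a substitution by how a sequence model trained on reference and alternative protein sequences judges each amino acid in context. A common way to turn such a model into a pathogenicity-style score is a log-likelihood ratio between the reference and alternate residues at the mutated position. The sketch below is a minimal illustration of that idea with made-up probabilities; it is not MutFormer's actual API or scoring function.

```python
import math

def missense_score(position_probs, ref_aa, alt_aa):
    """Log-likelihood ratio of reference vs. alternate amino acid.

    position_probs is a hypothetical per-position distribution over amino
    acids, as a protein language model might assign given the surrounding
    sequence context. Higher scores mean the model finds the substitution
    less plausible, which is often used as a proxy for pathogenicity.
    """
    return math.log(position_probs[ref_aa] / position_probs[alt_aa])

# Toy distribution at one protein position (illustrative values only).
probs = {"L": 0.60, "V": 0.25, "P": 0.01}

benign_like = missense_score(probs, "L", "V")    # conservative change
damaging_like = missense_score(probs, "L", "P")  # disruptive change
print(round(benign_like, 3), round(damaging_like, 3))  # -> 0.875 4.094
```

Real models emit a distribution over all 20 amino acids at every position, and published tools typically calibrate or fine-tune such raw scores on labeled pathogenic/benign variants rather than using the ratio directly.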
Related papers
- MutaPLM: Protein Language Modeling for Mutation Explanation and Engineering [12.738902517872509]
MutaPLM is a unified framework for interpreting and navigating protein mutations with protein language models.
MutaPLM introduces a protein delta network that captures explicit protein mutation representations within a unified feature space.
MutaPLM excels at providing human-understandable explanations for mutational effects and prioritizing novel mutations with desirable properties.
arXiv Detail & Related papers (2024-10-30T12:05:51Z) - Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z) - Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning [78.38442423223832]
We develop a novel codebook pre-training task, namely masked microenvironment modeling.
We demonstrate superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction.
arXiv Detail & Related papers (2024-05-16T03:53:21Z) - An Empirical Evaluation of Manually Created Equivalent Mutants [54.02049952279685]
Fewer than 10% of manually created mutants are equivalent.
Surprisingly, our findings indicate that a significant portion of developers struggle to accurately identify equivalent mutants.
arXiv Detail & Related papers (2024-04-14T13:04:10Z) - Predicting loss-of-function impact of genetic mutations: a machine
learning approach [0.0]
This paper aims to train machine learning models on the attributes of a genetic mutation to predict LoFtool scores.
These attributes included, but were not limited to, the position of a mutation on a chromosome, changes in amino acids, and changes in codons caused by the mutation.
Models were evaluated using five-fold cross-validated averages of r-squared, mean squared error, root mean squared error, mean absolute error, and explained variance.
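The evaluation protocol above (per-fold regression metrics averaged over five folds) can be sketched in plain Python. The fold data below is invented for illustration, and the metric definitions follow the standard formulas; this is not the paper's code.

```python
import math
from statistics import mean, pvariance

def regression_metrics(y_true, y_pred):
    """The five metrics the paper averages across folds."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(r) for r in residuals) / n
    var_y = pvariance(y_true)
    r2 = 1 - mse / var_y                          # coefficient of determination
    ev = 1 - pvariance(residuals) / var_y         # explained variance
    return {"r2": r2, "mse": mse, "rmse": rmse, "mae": mae, "ev": ev}

# Two toy folds of (true LoFtool scores, model predictions); a real run
# would have five folds produced by cross-validation splitting.
folds = [
    ([0.1, 0.4, 0.9], [0.2, 0.5, 0.7]),
    ([0.3, 0.6, 0.8], [0.3, 0.5, 0.9]),
]
avg_rmse = mean(regression_metrics(t, p)["rmse"] for t, p in folds)
print(round(avg_rmse, 4))  # -> 0.1115
```

Averaging per-fold metrics (rather than pooling all predictions first) is the usual reading of "cross-validated averages" and keeps each fold's contribution equal regardless of fold size.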
arXiv Detail & Related papers (2024-01-26T19:27:38Z) - Efficiently Predicting Protein Stability Changes Upon Single-point
Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict thermostability changes in proteins upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z) - Diversity-Measurable Anomaly Detection [106.07413438216416]
We propose Diversity-Measurable Anomaly Detection (DMAD) framework to enhance reconstruction diversity.
The Pyramid Deformation Module (PDM) decouples deformation from embedding and makes the final anomaly score more reliable.
arXiv Detail & Related papers (2023-03-09T05:52:42Z) - InForecaster: Forecasting Influenza Hemagglutinin Mutations Through the
Lens of Anomaly Detection [3.5213888068272197]
Anomaly detection (AD) is a well-established field in machine learning (ML).
We propose to tackle this challenge through AD.
We conduct a large number of experiments on four publicly available datasets.
arXiv Detail & Related papers (2022-10-25T02:08:09Z) - rfPhen2Gen: A machine learning based association study of brain imaging
phenotypes to genotypes [71.1144397510333]
We trained machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z) - PhyloTransformer: A Discriminative Model for Mutation Prediction Based
on a Multi-head Self-attention Mechanism [10.468453827172477]
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused an ongoing pandemic infecting 219 million people as of 10/19/21, with a 3.6% mortality rate.
Here we developed PhyloTransformer, a Transformer-based discriminative model that engages a multi-head self-attention mechanism to model genetic mutations that may lead to viral reproductive advantage.
arXiv Detail & Related papers (2021-11-03T01:30:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.