MutaGAN: A Seq2seq GAN Framework to Predict Mutations of Evolving
Protein Populations
- URL: http://arxiv.org/abs/2008.11790v1
- Date: Wed, 26 Aug 2020 20:20:30 GMT
- Authors: Daniel S. Berman (1), Craig Howser (1), Thomas Mehoke (1), Jared D.
Evans (1) ((1) Johns Hopkins Applied Physics Laboratory, Laurel, United
States)
- Abstract summary: Influenza virus sequences were identified as an ideal test case for this deep learning framework.
MutaGAN generated "child" sequences from a given "parent" protein sequence with a median Levenshtein distance of 2.00 amino acids.
Results demonstrate the power of the MutaGAN framework to aid in pathogen forecasting with implications for broad utility in evolutionary prediction for any protein population.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The ability to predict the evolution of a pathogen would significantly
improve our capacity to control, prevent, and treat disease. Despite significant
progress in other problem spaces, deep learning has yet to contribute to the
issue of predicting mutations of evolving populations. To address this gap, we
developed a novel machine learning framework using generative adversarial
networks (GANs) with recurrent neural networks (RNNs) to accurately predict
genetic mutations and evolution of future biological populations. Using a
generalized time-reversible phylogenetic model of protein evolution with
bootstrapped maximum likelihood tree estimation, we trained a
sequence-to-sequence generator within an adversarial framework, named MutaGAN,
to generate complete protein sequences augmented with possible mutations of
future virus populations. Influenza virus sequences were identified as an ideal
test case for this deep learning framework because influenza is a significant
human pathogen with new strains emerging annually, and global surveillance
efforts have generated a large amount of publicly available data through the
National Center for Biotechnology Information's (NCBI) Influenza Virus Resource (IVR).
MutaGAN generated "child" sequences from a given "parent" protein sequence with
a median Levenshtein distance of 2.00 amino acids. Additionally, the generator
was able to augment the majority of parent proteins with at least one mutation
identified within the global influenza virus population. These results
demonstrate the power of the MutaGAN framework to aid in pathogen forecasting
with implications for broad utility in evolutionary prediction for any protein
population.
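The evaluation metric above, Levenshtein distance, is the minimum number of single-character insertions, deletions, and substitutions needed to turn one sequence into another; a median of 2.00 means a typical generated "child" differs from its "parent" by about two amino acids. A minimal pure-Python sketch of the metric (the example sequences are illustrative, not taken from the paper):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions turning string a into string b."""
    # Keep a as the shorter string so the DP row stays small.
    if len(a) > len(b):
        a, b = b, a
    prev = list(range(len(a) + 1))
    for i, cb in enumerate(b, start=1):
        curr = [i]
        for j, ca in enumerate(a, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# A hypothetical child protein carrying two amino-acid substitutions
# relative to its parent has distance 2, matching the reported median.
parent = "MKTIIALSYILCLVFA"
child  = "MKTVIALSYILCLIFA"
print(levenshtein(parent, child))  # prints 2
```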
Related papers
- GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.
The model adheres to the central dogma of molecular biology, accurately generating protein-coding sequences.
It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of promoter sequences.
arXiv Detail & Related papers (2025-02-11T05:39:49Z) - A Simple yet Effective DDG Predictor is An Unsupervised Antibody Optimizer and Explainer [53.85265022754878]
We propose a lightweight DDG predictor (Light-DDG) for fast mutation screening.
We also release a large-scale dataset containing millions of mutation data for pre-training Light-DDG.
For the target antibody, we propose a novel Mutation Explainer to learn mutation preferences.
arXiv Detail & Related papers (2025-02-10T09:26:57Z) - Multi-megabase scale genome interpretation with genetic language models [45.97370115519009]
Phenformer is a multi-scale genetic language model that learns to generate mechanistic hypotheses.
Using whole genome sequencing data from more than 150,000 individuals, we show that Phenformer generates mechanistic hypotheses better than existing methods.
arXiv Detail & Related papers (2025-01-13T23:00:40Z) - Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning [78.38442423223832]
We develop a novel codebook pre-training task, namely masked microenvironment modeling.
We demonstrate superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction.
arXiv Detail & Related papers (2024-05-16T03:53:21Z) - VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z) - Unsupervised language models for disease variant prediction [3.6942566104432886]
We find that a single protein LM trained on broad sequence datasets can score pathogenicity for any gene variant zero-shot.
We show that it achieves scoring performance comparable to the state of the art when evaluated on clinically labeled variants of disease-related genes.
arXiv Detail & Related papers (2022-12-07T22:28:13Z) - Scalable Pathogen Detection from Next Generation DNA Sequencing with
Deep Learning [3.8175773487333857]
We propose MG2Vec, a deep learning-based solution that uses the transformer network as its backbone.
We show that the proposed approach can help detect pathogens from uncurated, real-world clinical samples.
We provide a comprehensive evaluation of a novel representation learning framework for metagenome-based disease diagnostics with deep learning.
arXiv Detail & Related papers (2022-11-30T00:13:59Z) - PhyloTransformer: A Discriminative Model for Mutation Prediction Based
on a Multi-head Self-attention Mechanism [10.468453827172477]
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused an ongoing pandemic infecting 219 million people as of 10/19/21, with a 3.6% mortality rate.
Here we developed PhyloTransformer, a Transformer-based discriminative model that engages a multi-head self-attention mechanism to model genetic mutations that may lead to viral reproductive advantage.
arXiv Detail & Related papers (2021-11-03T01:30:57Z) - Classification of Influenza Hemagglutinin Protein Sequences using
Convolutional Neural Networks [8.397189036839956]
This paper focuses on accurately predicting if an Influenza type A virus can infect specific hosts, and more specifically, Human, Avian and Swine hosts, using only the protein sequence of the HA gene.
We propose encoding the protein sequences into numerical signals using the Hydrophobicity Index and subsequently utilising a Convolutional Neural Network-based predictive model.
As the results show, the proposed model can distinguish HA protein sequences with high accuracy and determine whether the virus under investigation can infect Human, Avian, or Swine hosts.
arXiv Detail & Related papers (2021-08-09T10:42:26Z) - Epigenetic evolution of deep convolutional models [81.21462458089142]
We build upon a previously proposed neuroevolution framework to evolve deep convolutional models.
We propose a convolutional layer layout which allows kernels of different shapes and sizes to coexist within the same layer.
The proposed layout enables the size and shape of individual kernels within a convolutional layer to be evolved with a corresponding new mutation operator.
arXiv Detail & Related papers (2021-04-12T12:45:16Z) - Modelling SARS-CoV-2 coevolution with genetic algorithms [0.0]
The SARS-CoV-2 outbreak has strained policy responses to the emergence of virus variants.
We propose coevolution with genetic algorithms (GAs) as a credible approach to model this relationship.
We present a dual GA model in which viruses, aiming for survival, and policy measures, aiming to minimise infection rates, evolve competitively.
arXiv Detail & Related papers (2021-02-24T15:49:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.