Related papers: Generating Tertiary Protein Structures via an Interpretative Variational Autoencoder

Generating Tertiary Protein Structures via an Interpretative Variational Autoencoder

URL: http://arxiv.org/abs/2004.07119v2
Date: Wed, 16 Jun 2021 06:02:16 GMT
Title: Generating Tertiary Protein Structures via an Interpretative Variational Autoencoder
Authors: Xiaojie Guo, Yuanqi Du, Sivani Tadepalli, Liang Zhao, and Amarda Shehu
Abstract summary: This paper proposes and evaluates an alternative approach to generating functionally-relevant three-dimensional structures of a protein. A comprehensive evaluation of several deep architectures shows the promise of generative models in directly revealing the latent space for sampling novel tertiary structures.
Score: 16.554053012204182
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Much scientific enquiry across disciplines is founded upon a mechanistic treatment of dynamic systems that ties form to function. A highly visible instance of this is in molecular biology, where an important goal is to determine functionally-relevant forms/structures that a protein molecule employs to interact with molecular partners in the living cell. This goal is typically pursued under the umbrella of stochastic optimization with algorithms that optimize a scoring function. Research repeatedly shows that current scoring function, though steadily improving, correlate weakly with molecular activity. Inspired by recent momentum in generative deep learning, this paper proposes and evaluates an alternative approach to generating functionally-relevant three-dimensional structures of a protein. Though typically deep generative models struggle with highly-structured data, the work presented here circumvents this challenge via graph-generative models. A comprehensive evaluation of several deep architectures shows the promise of generative models in directly revealing the latent space for sampling novel tertiary structures, as well as in highlighting axes/factors that carry structural meaning and open the black box often associated with deep models. The work presented here is a first step towards interpretative, deep generative models becoming viable and informative complementary approaches to protein structure prediction.

Related papers

UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion [61.690978792873196]
Existing approaches rely on either autoregressive sequence models or diffusion models. We propose UniGenX, a unified framework that combines autoregressive next-token prediction with conditional diffusion models. We validate the effectiveness of UniGenX on material and small molecule generation tasks.
arXiv Detail & Related papers (2025-03-09T16:43:07Z)
GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
Multi-Scale Representation Learning for Protein Fitness Prediction [31.735234482320283]
Previous methods have primarily relied on self-supervised models trained on vast, unlabeled protein sequence or structure datasets. We introduce the Sequence-Structure-Surface Fitness (S3F) model - a novel multimodal representation learning framework that integrates protein features across several scales. Our approach combines sequence representations from a protein language model with Geometric Vector Perceptron networks encoding protein backbone and detailed surface topology.
arXiv Detail & Related papers (2024-12-02T04:28:10Z)
Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based underlineem Molecular underlineem Language underlineem Model, which randomly masking SMILES subsequences corresponding to specific molecular atoms. This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models. It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features. Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
AUTODIFF: Autoregressive Diffusion Modeling for Structure-based Drug Design [16.946648071157618]
We propose a diffusion-based fragment-wise autoregressive generation model for structure-based drug design (SBDD) We design a novel molecule assembly strategy named conformal motif that preserves the conformation of local structures of molecules first. We then encode the interaction of the protein-ligand complex with an SE(3)-equivariant convolutional network and generate molecules motif-by-motif with diffusion modeling.
arXiv Detail & Related papers (2024-04-02T14:44:02Z)
Accelerating Inference in Molecular Diffusion Models with Latent Representations of Protein Structure [0.0]
Diffusion generative models operate directly on 3D molecular structures. We present a novel GNN-based architecture for learning latent representations of molecular structure. Our model achieves comparable performance to one with an all-atom protein representation while exhibiting a 3-fold reduction in inference time.
arXiv Detail & Related papers (2023-11-22T15:32:31Z)
A Systematic Study of Joint Representation Learning on Protein Sequences and Structures [38.94729758958265]
Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein functions. Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge. Our study undertakes a comprehensive exploration of joint protein representation learning by integrating a state-of-the-art PLM with distinct structure encoders.
arXiv Detail & Related papers (2023-03-11T01:24:10Z)
Modeling Molecular Structures with Intrinsic Diffusion Models [2.487445341407889]
This thesis proposes Intrinsic Diffusion Modeling. It combines diffusion generative models with scientific knowledge about the flexibility of biological complexes. We demonstrate the effectiveness of this approach on two fundamental tasks at the basis of computational chemistry and biology.
arXiv Detail & Related papers (2023-02-23T03:26:48Z)
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures. Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation. We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria. Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein. Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules. Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
Protein model quality assessment using rotation-equivariant, hierarchical neural networks [8.373439916313018]
We present a novel deep learning approach to assess the quality of a protein model. Our method achieves state-of-the-art results in scoring protein models submitted to recent rounds of CASP.
arXiv Detail & Related papers (2020-11-27T05:03:53Z)
BERTology Meets Biology: Interpreting Attention in Protein Language Models [124.8966298974842]
We demonstrate methods for analyzing protein Transformer models through the lens of attention. We show that attention captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure. We also present a three-dimensional visualization of the interaction between attention and protein structure.
arXiv Detail & Related papers (2020-06-26T21:50:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.