ForceGen: End-to-end de novo protein generation based on nonlinear
mechanical unfolding responses using a protein language diffusion model
- URL: http://arxiv.org/abs/2310.10605v3
- Date: Sat, 16 Dec 2023 01:39:22 GMT
- Authors: Bo Ni, David L. Kaplan, Markus J. Buehler
- Abstract summary: We report a generative model that predicts protein designs to meet complex nonlinear mechanical property-design objectives.
Our model leverages deep knowledge on protein sequences from a pre-trained protein language model and maps mechanical unfolding responses to create novel proteins.
Our model offers rapid pathways to explore the enormous mechanobiological protein sequence space unconstrained by biological synthesis.
- Score: 0.5678271181959529
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Through evolution, nature has presented a set of remarkable protein
materials, including elastins, silks, keratins and collagens with superior
mechanical performances that play crucial roles in mechanobiology. However,
going beyond natural designs to discover proteins that meet specified
mechanical properties remains challenging. Here we report a generative model
that predicts protein designs to meet complex nonlinear mechanical
property-design objectives. Our model leverages deep knowledge on protein
sequences from a pre-trained protein language model and maps mechanical
unfolding responses to create novel proteins. Via full-atom molecular
simulations for direct validation, we demonstrate that the designed proteins
are novel, and fulfill the targeted mechanical properties, including unfolding
energy and mechanical strength, as well as the detailed unfolding
force-separation curves. Our model offers rapid pathways to explore the
enormous mechanobiological protein sequence space unconstrained by biological
synthesis, using mechanical features as the target to enable the discovery of
protein materials with superior mechanical properties.
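As a rough illustration of the quantities named in the abstract, the sketch below computes an unfolding energy and a mechanical strength from a force-separation curve. It assumes, without confirmation from the paper, that unfolding energy is taken as the area under the curve (the work done by the pulling force) and mechanical strength as the peak force. The function names, the toy data, and the units (nm for separation, pN for force) are hypothetical and chosen only for this example.

```python
from typing import Sequence


def unfolding_energy(separation: Sequence[float], force: Sequence[float]) -> float:
    """Area under a force-separation curve via the trapezoidal rule.

    Assumption: energy is the work integral of force over separation.
    """
    if len(separation) != len(force):
        raise ValueError("separation and force must have the same length")
    if len(separation) < 2:
        raise ValueError("need at least two points to integrate")
    energy = 0.0
    for i in range(1, len(separation)):
        dx = separation[i] - separation[i - 1]
        # Mean force over the interval times the interval width.
        energy += 0.5 * (force[i] + force[i - 1]) * dx
    return energy


def mechanical_strength(force: Sequence[float]) -> float:
    """Peak force along the curve (assumed definition of strength)."""
    return max(force)


# Toy curve, not data from the paper: separation in nm, force in pN.
separation = [0.0, 1.0, 2.0, 3.0, 4.0]
force = [0.0, 100.0, 250.0, 150.0, 50.0]

energy = unfolding_energy(separation, force)   # in pN*nm
strength = mechanical_strength(force)          # in pN
print(f"unfolding energy = {energy} pN*nm, strength = {strength} pN")
```

A design target of this kind could be stated either as the two scalars or as the full curve; the scalars are simply summaries of the curve under the assumptions above.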
Related papers
- Agentic End-to-End De Novo Protein Design for Tailored Dynamics Using a Language Diffusion Model [0.5678271181959529]
VibeGen is a generative AI framework that enables end-to-end de novo protein design conditioned on normal mode vibrations.
Our work integrates protein dynamics into generative protein design, and establishes a direct, bidirectional link between sequence and vibrational behavior.
arXiv Detail & Related papers (2025-02-14T14:07:54Z)
- Computational Protein Science in the Era of Large Language Models (LLMs) [54.35488233989787]
Computational protein science is dedicated to revealing knowledge and developing applications within the protein sequence-structure-function paradigm.
Recently, protein Language Models (pLMs) have emerged as a milestone in AI due to their unprecedented language processing and generalization capability.
arXiv Detail & Related papers (2025-01-17T16:21:18Z)
- Long-context Protein Language Model [76.95505296417866]
Self-supervised training of language models (LMs) has seen great success for protein sequences in learning meaningful representations and for generative drug design.
Most protein LMs are based on the Transformer architecture trained on individual proteins with short context lengths.
We propose LC-PLM based on an alternative protein LM architecture, BiMamba-S, built off selective structured state-space models.
We also introduce its graph-contextual variant, LC-PLM-G, which contextualizes protein-protein interaction graphs for a second stage of training.
arXiv Detail & Related papers (2024-10-29T16:43:28Z)
- NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks [60.48306899271866]
We propose novel semantic data augmentation methods to incorporate backbone chemical and side-chain biophysical information into protein classification tasks.
Specifically, we leverage molecular biophysical, secondary structure, chemical bond, and ionic features of proteins to facilitate classification tasks.
arXiv Detail & Related papers (2024-03-21T13:27:57Z)
- A Latent Diffusion Model for Protein Structure Generation [50.74232632854264]
We propose a latent diffusion model that can reduce the complexity of protein modeling.
We show that our method can effectively generate novel protein backbone structures with high designability and efficiency.
arXiv Detail & Related papers (2023-05-06T19:10:19Z)
- Learning the shape of protein micro-environments with a holographic convolutional neural network [0.0]
We introduce Holographic Convolutional Neural Network (H-CNN) for proteins.
H-CNN is a physically motivated machine learning approach to model amino acid preferences in protein structures.
It accurately predicts the impact of mutations on protein function, including stability and binding of protein complexes.
arXiv Detail & Related papers (2022-11-05T16:29:15Z)
- Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models [3.5450828190071646]
An important task in bioengineering is designing proteins with specific 3D structures and chemical properties which enable targeted functions.
We introduce a generative model of both protein structure and sequence that can operate at significantly larger scales than previous molecular generative modeling approaches.
arXiv Detail & Related papers (2022-05-26T16:10:09Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- Deep Generative Modeling for Protein Design [0.0]
Deep learning approaches have produced breakthroughs in fields such as image classification and natural language processing.
Generative models of proteins have been developed that encompass all known protein sequences, model specific protein families, or extrapolate the dynamics of individual proteins.
We discuss five classes of generative models that have been most successful at modeling proteins and provide a framework for model guided protein design.
arXiv Detail & Related papers (2021-08-31T14:38:26Z)
- Energy-based models for atomic-resolution protein conformations [88.68597850243138]
We propose an energy-based model (EBM) of protein conformations that operates at atomic scale.
The model is trained solely on crystallized protein data.
An investigation of the model's outputs and hidden representations finds that it captures physicochemical properties relevant to protein energy.
arXiv Detail & Related papers (2020-04-27T20:45:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.