The Dance of Atoms-De Novo Protein Design with Diffusion Model
- URL: http://arxiv.org/abs/2504.16479v1
- Date: Wed, 23 Apr 2025 07:45:00 GMT
- Title: The Dance of Atoms-De Novo Protein Design with Diffusion Model
- Authors: Yujie Qin, Ming He, Changyong Yu, Ming Ni, Xian Liu, Xiaochen Bo,
- Abstract summary: De novo protein design refers to creating proteins with specific structures and functions that do not naturally exist.<n>In recent years, the accumulation of high-quality protein structure and sequence data and technological advancements have paved the way for the successful application of generative artificial intelligence (AI) models in protein design.<n>Among various generative AI models, diffusion models have yielded the most promising results in protein design.
- Score: 9.521115574728915
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The de novo design of proteins refers to creating proteins with specific structures and functions that do not naturally exist. In recent years, the accumulation of high-quality protein structure and sequence data and technological advancements have paved the way for the successful application of generative artificial intelligence (AI) models in protein design. These models have surpassed traditional approaches that rely on fragments and bioinformatics. They have significantly enhanced the success rate of de novo protein design, and reduced experimental costs, leading to breakthroughs in the field. Among various generative AI models, diffusion models have yielded the most promising results in protein design. In the past two to three years, more than ten protein design models based on diffusion models have emerged. Among them, the representative model, RFDiffusion, has demonstrated success rates in 25 protein design tasks that far exceed those of traditional methods, and other AI-based approaches like RFjoint and hallucination. This review will systematically examine the application of diffusion models in generating protein backbones and sequences. We will explore the strengths and limitations of different models, summarize successful cases of protein design using diffusion models, and discuss future development directions.
Related papers
- An All-Atom Generative Model for Designing Protein Complexes [49.09672038729524]
APM (All-Atom Protein Generative Model) is a model specifically designed for modeling multi-chain proteins.<n>By integrating atom-level information and leveraging data on multi-chain proteins, APM is capable of precisely modeling inter-chain interactions and designing protein complexes with binding capabilities from scratch.
arXiv Detail & Related papers (2025-04-17T16:37:41Z) - UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion [61.690978792873196]
Existing approaches rely on either autoregressive sequence models or diffusion models.
We propose UniGenX, a unified framework that combines autoregressive next-token prediction with conditional diffusion models.
We validate the effectiveness of UniGenX on material and small molecule generation tasks.
arXiv Detail & Related papers (2025-03-09T16:43:07Z) - A Model-Centric Review of Deep Learning for Protein Design [0.0]
Deep learning has transformed protein design, enabling accurate structure prediction, sequence optimization, and de novo protein generation.<n>Generative models such as ProtGPT2, ProteinMPNN, and RFdiffusion have enabled sequence and backbone design beyond natural evolution-based limitations.<n>More recently, joint sequence-structure co-design models, including ESM3, have integrated both modalities into a unified framework, resulting in improved designability.
arXiv Detail & Related papers (2025-02-26T14:31:21Z) - Computational Protein Science in the Era of Large Language Models (LLMs) [54.35488233989787]
Computational protein science is dedicated to revealing knowledge and developing applications within the protein sequence-structure-function paradigm.
Recently, Language Models (pLMs) have emerged as a milestone in AI due to their unprecedented language processing & generalization capability.
arXiv Detail & Related papers (2025-01-17T16:21:18Z) - From thermodynamics to protein design: Diffusion models for biomolecule generation towards autonomous protein engineering [8.173909751137888]
We first give the definition and characteristics of diffusion models and then focus on two strategies: Denoising Diffusion Probabilistic Models and Score-based Generative Models.<n>We discuss their applications in protein design, peptide generation, drug discovery, and protein-ligand interaction.
arXiv Detail & Related papers (2025-01-05T22:36:43Z) - SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z) - Protein Conformation Generation via Force-Guided SE(3) Diffusion Models [48.48934625235448]
Deep generative modeling techniques have been employed to generate novel protein conformations.
We propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation.
arXiv Detail & Related papers (2024-03-21T02:44:08Z) - Progressive Multi-Modality Learning for Inverse Protein Folding [47.095862120116976]
We propose a novel protein design paradigm called MMDesign, which leverages multi-modality transfer learning.
MMDesign is the first framework that combines a pretrained structural module with a pretrained contextual module, using an auto-encoder (AE) based language model to incorporate prior protein semantic knowledge.
Experimental results, only training with the small dataset, demonstrate that MMDesign consistently outperforms baselines on various public benchmarks.
arXiv Detail & Related papers (2023-12-11T10:59:23Z) - Generative artificial intelligence for de novo protein design [1.2021565114959365]
Generative architectures seem adept at generating novel, yet realistic proteins.
Design protocols now achieve experimental success rates nearing 20%.
Despite extensive progress, there are clear field-wide challenges.
arXiv Detail & Related papers (2023-10-15T00:02:22Z) - A Latent Diffusion Model for Protein Structure Generation [50.74232632854264]
We propose a latent diffusion model that can reduce the complexity of protein modeling.
We show that our method can effectively generate novel protein backbone structures with high designability and efficiency.
arXiv Detail & Related papers (2023-05-06T19:10:19Z) - ProGen2: Exploring the Boundaries of Protein Language Models [15.82416400246896]
We introduce a suite of protein language models, named ProGen2, that are scaled up to 6.4B parameters.
ProGen2 models show state-of-the-art performance in capturing the distribution of observed evolutionary sequences.
As large model sizes and raw numbers of protein sequences continue to become more widely accessible, our results suggest that a growing emphasis needs to be placed on the data distribution provided to a protein sequence model.
arXiv Detail & Related papers (2022-06-27T17:55:02Z) - Deep Generative Modeling for Protein Design [0.0]
Deep learning approaches have produced breakthroughs in fields such as image classification and natural language processing.
generative models of proteins have been developed that encompass all known protein sequences, model specific protein families, or extrapolate the dynamics of individual proteins.
We discuss five classes of generative models that have been most successful at modeling proteins and provide a framework for model guided protein design.
arXiv Detail & Related papers (2021-08-31T14:38:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.