An All-Atom Generative Model for Designing Protein Complexes
- URL: http://arxiv.org/abs/2504.13075v1
- Date: Thu, 17 Apr 2025 16:37:41 GMT
- Title: An All-Atom Generative Model for Designing Protein Complexes
- Authors: Ruizhe Chen, Dongyu Xue, Xiangxin Zhou, Zaixiang Zheng, Xiangxiang Zeng, Quanquan Gu
- Abstract summary: APM (All-Atom Protein Generative Model) is a model specifically designed for modeling multi-chain proteins. By integrating atom-level information and leveraging data on multi-chain proteins, APM is capable of precisely modeling inter-chain interactions and designing protein complexes with binding capabilities from scratch.
- Score: 49.09672038729524
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Proteins typically exist in complexes, interacting with other proteins or biomolecules to perform their specific biological roles. Single-chain protein modeling has been explored extensively and deeply, with advances seen in models like the ESM series and AlphaFold. Despite these developments, the study and modeling of multi-chain proteins remain largely uncharted, though they are vital for understanding biological functions. Recognizing the importance of these interactions, we introduce APM (All-Atom Protein Generative Model), a model specifically designed for modeling multi-chain proteins. By integrating atom-level information and leveraging data on multi-chain proteins, APM is capable of precisely modeling inter-chain interactions and designing protein complexes with binding capabilities from scratch. It also performs folding and inverse-folding tasks for multi-chain proteins. Moreover, APM demonstrates versatility in downstream applications: it achieves enhanced performance through supervised fine-tuning (SFT) and supports zero-shot sampling in certain tasks, achieving state-of-the-art results. Code will be released at https://github.com/bytedance/apm.
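As a concrete illustration of the setting the abstract describes, the sketch below shows one plausible way to represent an all-atom, multi-chain complex and to request a binder designed from scratch against a fixed target chain. All class and function names here are assumptions for exposition, not the released APM API.

```python
# Illustrative only: a minimal all-atom, multi-chain representation and a
# hypothetical generation call. Names are assumptions, not the APM codebase.
from dataclasses import dataclass
import numpy as np

@dataclass
class ChainAllAtom:
    sequence: str            # one-letter amino acid codes, length L
    atom_coords: np.ndarray  # (L, 37, 3) atom37 convention, padded
    atom_mask: np.ndarray    # (L, 37) which atom slots exist per residue

@dataclass
class Complex:
    chains: list[ChainAllAtom]  # e.g. [target, designed binder]

def design_binder(model, target: ChainAllAtom, binder_length: int) -> Complex:
    """Sample a new chain (sequence + all-atom structure) conditioned on a
    fixed target chain: the 'design from scratch' setting in the abstract."""
    return model.sample(context=[target], new_chain_lengths=[binder_length])
```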
Related papers
- Computational Protein Science in the Era of Large Language Models (LLMs) [54.35488233989787]
Computational protein science is dedicated to revealing knowledge and developing applications within the protein sequence-structure-function paradigm. Recently, protein Language Models (pLMs) have emerged as a milestone in AI due to their unprecedented language processing and generalization capability.
arXiv Detail & Related papers (2025-01-17T16:21:18Z)
- OneProt: Towards Multi-Modal Protein Foundation Models [5.440531199006399]
We introduce OneProt, a multi-modal AI for proteins that integrates structural, sequence, alignment, and binding site data.
It surpasses state-of-the-art methods in various downstream tasks, including metal ion binding classification, gene-ontology annotation, and enzyme function prediction.
This work expands multi-modal capabilities in protein models, paving the way for applications in drug discovery, biocatalytic reaction planning, and protein engineering.
arXiv Detail & Related papers (2024-11-07T16:54:54Z)
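The OneProt abstract does not specify the training objective, but a common recipe for integrating modalities like structure, sequence, alignment, and binding-site data is contrastive alignment in a shared embedding space. The sketch below illustrates that generic idea, not OneProt's confirmed method.

```python
# A generic InfoNCE-style contrastive loss that pulls matched
# (sequence, structure) embedding pairs together in a shared space.
# This is an assumed recipe for exposition, not OneProt's objective.
import torch
import torch.nn.functional as F

def contrastive_align(seq_emb: torch.Tensor, struct_emb: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    seq = F.normalize(seq_emb, dim=-1)        # (B, D) unit vectors
    struct = F.normalize(struct_emb, dim=-1)  # (B, D)
    logits = seq @ struct.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(seq.shape[0])      # positives on the diagonal
    # Symmetric loss: sequence-to-structure and structure-to-sequence.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```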
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
- Long-context Protein Language Modeling Using Bidirectional Mamba with Shared Projection Layers [76.95505296417866]
Self-supervised training of language models (LMs) has seen great success for protein sequences in learning meaningful representations and for generative drug design. Most protein LMs are based on the Transformer architecture and are trained on individual proteins with short context lengths. In this work, we propose LC-PLM, based on an alternative protein LM architecture, BiMamba-S, built upon selective structured state-space models.
arXiv Detail & Related papers (2024-10-29T16:43:28Z)
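The sketch below illustrates the bidirectional, shared-projection idea named in the abstract: both scan directions reuse one set of projection weights. The selective SSM kernel is replaced by a simple gated running average, so this is a conceptual stand-in rather than the LC-PLM implementation.

```python
# Conceptual stand-in for a BiMamba-S-style block: one set of projection
# weights serves both scan directions, halving projection parameters
# relative to two independent directional blocks. The gated running
# average below replaces the real hardware-aware selective scan.
import torch
import torch.nn as nn

class BiSSMBlock(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)   # shared across directions
        self.gate = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)  # shared across directions

    def _scan(self, x: torch.Tensor) -> torch.Tensor:
        # Simplified recurrence: gated exponential moving average over time.
        a = torch.sigmoid(self.gate(x))              # (B, L, D) decay gates
        h = torch.zeros_like(x[:, 0])
        outs = []
        for t in range(x.shape[1]):
            h = a[:, t] * h + (1 - a[:, t]) * x[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = self.in_proj(x)
        fwd = self._scan(u)                          # left-to-right context
        bwd = self._scan(u.flip(1)).flip(1)          # right-to-left, same weights
        return x + self.out_proj(fwd + bwd)          # residual combination
```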
- Large-Scale Multi-omic Biosequence Transformers for Modeling Protein-Nucleic Acid Interactions [4.36852565205713]
We present our work training the largest open-source multi-omic foundation model to date. We show that these multi-omic models (MOMs) can learn joint representations between various single-omic distributions. We also demonstrate that MOMs can be fine-tuned to achieve state-of-the-art results on protein-nucleic acid interaction tasks.
arXiv Detail & Related papers (2024-08-29T03:56:40Z)
- ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction [54.132290875513405]
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose ProLLM, a novel framework that, for the first time, employs an LLM tailored for PPI prediction.
arXiv Detail & Related papers (2024-03-30T05:32:42Z)
- ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling [32.656601823957345]
ESM-AA (ESM All-Atom) is a novel approach that enables atom-scale and residue-scale unified molecular modeling.
Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks.
arXiv Detail & Related papers (2024-03-05T13:35:41Z)
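The sketch below illustrates what unified atom-scale and residue-scale modeling can look like at the tokenization level: selected residues are expanded ("unzipped") into atom tokens inside an otherwise residue-level sequence. The vocabulary and expansion rule are illustrative assumptions, not ESM-AA's exact scheme.

```python
# Mixed-granularity tokenization sketch: most residues stay residue-level,
# selected residues become per-atom tokens. Illustrative assumption only.
RESIDUE_ATOMS = {  # heavy atoms per residue type (truncated for brevity)
    "A": ["N", "CA", "C", "O", "CB"],
    "G": ["N", "CA", "C", "O"],
}

def mixed_scale_tokens(sequence: str, unzip_positions: set[int]) -> list[str]:
    tokens = []
    for i, res in enumerate(sequence):
        if i in unzip_positions and res in RESIDUE_ATOMS:
            # Atom scale: emit one token per heavy atom of this residue.
            tokens.extend(f"{res}:{atom}" for atom in RESIDUE_ATOMS[res])
        else:
            tokens.append(res)  # residue-scale token
    return tokens

print(mixed_scale_tokens("GAG", unzip_positions={1}))
# ['G', 'A:N', 'A:CA', 'A:C', 'A:O', 'A:CB', 'G']
```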
- ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning [0.0]
ProtAgents is a platform for de novo protein design based on Large Language Models (LLMs).
Multiple AI agents with distinct capabilities collaboratively address complex tasks within a dynamic environment.
The flexibility in designing the agents, combined with their capacity for autonomous collaboration in the dynamic LLM-based multi-agent environment, unlocks great potential.
arXiv Detail & Related papers (2024-01-27T20:19:49Z)
- Functional Geometry Guided Protein Sequence and Backbone Structure Co-Design [12.585697288315846]
We propose NAEPro, a model that jointly designs protein sequence and structure based on automatically detected functional sites. NAEPro is powered by an interleaving network of attention and equivariant layers, which captures global correlations across the whole sequence.
Experimental results show that our model consistently achieves the highest amino acid recovery rate, TM-score, and the lowest RMSD among all competitors.
arXiv Detail & Related papers (2023-10-06T16:08:41Z)
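The sketch below illustrates the interleaving pattern the abstract names: a standard self-attention layer captures global sequence correlations, then an EGNN-style layer updates coordinates in an E(3)-equivariant way. Layer sizes and the message function are assumptions, not NAEPro's exact architecture.

```python
# Interleaved attention + equivariant update, sketched under assumptions.
import torch
import torch.nn as nn

class EquivariantUpdate(nn.Module):
    """EGNN-style coordinate update: moves each residue along pairwise
    difference vectors, weighted by a learned function of distances."""
    def __init__(self, d_model: int):
        super().__init__()
        self.weight_mlp = nn.Sequential(
            nn.Linear(2 * d_model + 1, d_model), nn.SiLU(), nn.Linear(d_model, 1)
        )

    def forward(self, h, x):                       # h: (B,L,D), x: (B,L,3)
        diff = x[:, :, None] - x[:, None, :]       # (B,L,L,3) pairwise vectors
        dist2 = (diff ** 2).sum(-1, keepdim=True)  # (B,L,L,1) invariant input
        pair = torch.cat(
            [h[:, :, None].expand(-1, -1, h.shape[1], -1),
             h[:, None].expand(-1, h.shape[1], -1, -1), dist2], dim=-1)
        w = self.weight_mlp(pair)                  # (B,L,L,1) scalar weights
        return x + (w * diff).mean(dim=2)          # only difference vectors move x

class InterleavedBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.equi = EquivariantUpdate(d_model)

    def forward(self, h, x):
        h = self.attn(h)   # global sequence correlations
        x = self.equi(h, x)  # geometry refinement
        return h, x
```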
- A Latent Diffusion Model for Protein Structure Generation [50.74232632854264]
We propose a latent diffusion model that can reduce the complexity of protein modeling.
We show that our method can effectively generate novel protein backbone structures with high designability and efficiency.
arXiv Detail & Related papers (2023-05-06T19:10:19Z)
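The sketch below shows the generic latent diffusion recipe the abstract names: compress protein backbones into a lower-dimensional latent space with an autoencoder, then train a noise-prediction denoiser in that space. The noise schedule and shapes are illustrative assumptions, not the paper's exact configuration.

```python
# Latent diffusion training step, sketched under assumptions: a standard
# DDPM epsilon-prediction objective applied to encoded backbone latents.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # DDPM cumulative schedule

def diffusion_loss(encoder: nn.Module, denoiser: nn.Module,
                   backbone: torch.Tensor) -> torch.Tensor:
    """Encode structure to latent z, add noise at a random timestep,
    and regress the added noise (epsilon-prediction)."""
    z = encoder(backbone)                        # (B, d_latent)
    t = torch.randint(0, T, (z.shape[0],))
    eps = torch.randn_like(z)
    ab = alpha_bar[t].unsqueeze(-1)              # (B, 1)
    z_t = ab.sqrt() * z + (1.0 - ab).sqrt() * eps
    eps_hat = denoiser(z_t, t)                   # predict the added noise
    return ((eps_hat - eps) ** 2).mean()
# At sampling time, iteratively denoise z_T ~ N(0, I), then decode the
# final latent back to a backbone with the autoencoder's decoder.
```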