Protein-SE(3): Benchmarking SE(3)-based Generative Models for Protein Structure Design
- URL: http://arxiv.org/abs/2507.20243v1
- Date: Sun, 27 Jul 2025 11:53:05 GMT
- Title: Protein-SE(3): Benchmarking SE(3)-based Generative Models for Protein Structure Design
- Authors: Lang Yu, Zhangyang Gao, Cheng Tan, Qin Chen, Jie Zhou, Liang He
- Abstract summary: SE(3)-based generative models have shown great promise in protein geometry modeling and effective structure design. Protein-SE(3), a new benchmark based on a unified training framework, comprises protein scaffolding tasks, integrated generative models, high-level mathematical abstraction, and diverse evaluation metrics.
- Score: 35.87227562237066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: SE(3)-based generative models have shown great promise in protein geometry modeling and effective structure design. However, the field currently lacks a modularized benchmark that enables comprehensive investigation and fair comparison of different methods. In this paper, we propose Protein-SE(3), a new benchmark based on a unified training framework, which comprises protein scaffolding tasks, integrated generative models, high-level mathematical abstraction, and diverse evaluation metrics. Recent advanced generative models designed for protein scaffolding, spanning formulations such as DDPM (Genie1 and Genie2), score matching (FrameDiff and RFdiffusion), and flow matching (FoldFlow and FrameFlow), are integrated into our framework. All integrated methods are investigated fairly with the same training dataset and evaluation metrics. Furthermore, we provide a high-level abstraction of the mathematical foundations behind the generative models, enabling fast prototyping of future algorithms without reliance on explicit protein structures. Accordingly, we release the first comprehensive benchmark built upon a unified training framework for SE(3)-based protein structure design, which is publicly accessible at https://github.com/BruthYU/protein-se3.
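As a rough illustration of the kind of high-level mathematical abstraction the abstract describes, the sketch below represents a protein backbone as a sequence of SE(3) rigid frames (one rotation and one translation per residue) and defines a shared interface that DDPM-, score-matching-, and flow-matching-style models could all implement. All class and method names here are hypothetical illustrations, not the actual protein-se3 API.

```python
# Minimal sketch of an SE(3)-frame abstraction for protein backbones.
# All names are hypothetical illustrations, not the protein-se3 API.
from abc import ABC, abstractmethod
from dataclasses import dataclass
import numpy as np

@dataclass
class RigidFrames:
    """A backbone as N rigid frames: one rotation and translation per residue."""
    rotations: np.ndarray     # (N, 3, 3) rotation matrices in SO(3)
    translations: np.ndarray  # (N, 3) C-alpha positions in Angstroms

    def apply(self, local_coords: np.ndarray) -> np.ndarray:
        """Map local atom coordinates (N, A, 3) into global coordinates."""
        return np.einsum("nij,naj->nai", self.rotations, local_coords) \
            + self.translations[:, None, :]

class SE3Generative(ABC):
    """Shared interface a DDPM-, score-, or flow-based model could implement."""

    @abstractmethod
    def perturb(self, frames: RigidFrames, t: float) -> RigidFrames:
        """Noise clean frames to time t (forward diffusion / interpolation)."""

    @abstractmethod
    def denoise_step(self, frames: RigidFrames, t: float, dt: float) -> RigidFrames:
        """One reverse step: integrate the learned score or vector field."""

    def sample(self, n_residues: int, n_steps: int = 100) -> RigidFrames:
        """Generate a backbone by integrating from noise at t=1 to t=0."""
        # Identity rotations for brevity; a real model would draw rotations
        # from the uniform distribution on SO(3).
        frames = RigidFrames(
            rotations=np.tile(np.eye(3), (n_residues, 1, 1)),
            translations=np.random.randn(n_residues, 3),
        )
        dt = 1.0 / n_steps
        for step in range(n_steps):
            frames = self.denoise_step(frames, t=1.0 - step * dt, dt=dt)
        return frames
```

Under such an interface, the integrated methods (Genie, FrameDiff, FoldFlow, and so on) would differ only in how they realize the noising and denoising steps, which is what makes a fair comparison under a single training loop possible.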
Related papers
- Protein Structure Tokenization: Benchmarking and New Recipe [16.842453216446987]
StructTokenBench is a framework that comprehensively evaluates the quality and efficiency of structure tokenizers. AminoAseed is a strategy that enhances codebook updates and optimally balances codebook size and dimension for improved tokenizer utilization and quality. Our method achieves an average of 6.31% performance improvement across 24 supervised tasks, with sensitivity and utilization rates increased by 12.83% and 124.03%, respectively.
arXiv Detail & Related papers (2025-02-28T15:14:33Z)
- ProteinWeaver: A Divide-and-Assembly Approach for Protein Backbone Design [61.19456204667385]
We introduce ProteinWeaver, a two-stage framework for protein backbone design. ProteinWeaver generates high-quality, novel protein backbones through versatile domain assembly. By introducing a 'divide-and-assembly' paradigm, ProteinWeaver advances protein engineering and opens new avenues for functional protein design.
arXiv Detail & Related papers (2024-11-08T08:10:49Z)
- ProteinBench: A Holistic Evaluation of Protein Foundation Models [53.59325047872512]
We introduce ProteinBench, a holistic evaluation framework for protein foundation models.
Our approach consists of three key components: (i) A taxonomic classification of tasks that broadly encompass the main challenges in the protein domain, based on the relationships between different protein modalities; (ii) A multi-metric evaluation approach that assesses performance across four key dimensions: quality, novelty, diversity, and robustness; and (iii) In-depth analyses from various user objectives, providing a holistic view of model performance.
arXiv Detail & Related papers (2024-09-10T06:52:33Z)
- Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation [55.93511121486321]
We introduce FoldFlow-2, a novel sequence-conditioned flow matching model for protein structure generation. We train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than the PDB datasets of prior works. We empirically observe that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models.
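For context, the conditional flow matching objective underlying this family of models regresses a learned vector field onto a simple interpolation path between noise and data; FoldFlow-style models replace the straight line with geodesics on SE(3). A minimal Euclidean sketch of the standard objective (notation assumed for illustration, not taken from the paper):

```latex
% x_0 ~ noise prior, x_1 ~ data, t ~ Uniform[0, 1].
% Euclidean sketch; SE(3) variants swap the linear interpolant
% for a geodesic on the group.
x_t = (1 - t)\,x_0 + t\,x_1,
\qquad
\mathcal{L}_{\mathrm{CFM}}(\theta)
  = \mathbb{E}_{t,\,x_0,\,x_1}
    \bigl\| v_\theta(x_t, t) - (x_1 - x_0) \bigr\|^2 .
```

At sampling time, integrating the learned field $\dot{x} = v_\theta(x, t)$ from $t = 0$ to $t = 1$ transports noise samples to structure-like samples.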
arXiv Detail & Related papers (2024-05-30T17:53:50Z)
- xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein [74.64101864289572]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously. xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories. It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z)
- Progressive Multi-Modality Learning for Inverse Protein Folding [47.095862120116976]
We propose a novel protein design paradigm called MMDesign, which leverages multi-modality transfer learning.
MMDesign is the first framework that combines a pretrained structural module with a pretrained contextual module, using an auto-encoder (AE) based language model to incorporate prior protein semantic knowledge.
Experimental results, with training on only a small dataset, demonstrate that MMDesign consistently outperforms baselines on various public benchmarks.
arXiv Detail & Related papers (2023-12-11T10:59:23Z)
- Neural Embeddings for Protein Graphs [0.8258451067861933]
We propose a novel framework for embedding protein graphs in geometric vector spaces.
We learn an encoder function that preserves the structural distance between protein graphs.
Our framework achieves remarkable results in the task of protein structure classification.
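A minimal sketch of the kind of distance-preserving objective such a framework might use (the exact loss in the paper may differ): given an encoder $f_\theta$ and a structural distance $d_{\mathrm{struct}}$ between protein graphs,

```latex
% Hypothetical metric-embedding loss; notation assumed, not from the paper.
\mathcal{L}(\theta)
  = \mathbb{E}_{(G_i,\,G_j)}
    \Bigl( \bigl\| f_\theta(G_i) - f_\theta(G_j) \bigr\|
           - d_{\mathrm{struct}}(G_i, G_j) \Bigr)^{2},
```

so that Euclidean distances in the embedding space approximate structural distances between the original graphs.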
arXiv Detail & Related papers (2023-06-07T14:50:34Z)
- A Systematic Study of Joint Representation Learning on Protein Sequences and Structures [38.94729758958265]
Learning effective protein representations is critical for a variety of biological tasks, such as predicting protein function.
Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge.
Our study undertakes a comprehensive exploration of joint protein representation learning by integrating a state-of-the-art PLM with distinct structure encoders.
arXiv Detail & Related papers (2023-03-11T01:24:10Z)