Related papers: PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design

URL: http://arxiv.org/abs/2312.00080v1
Date: Thu, 30 Nov 2023 02:37:55 GMT
Title: PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design
Authors: Chuanrui Wang, Bozitao Zhong, Zuobai Zhang, Narendra Chaudhary, Sanchit Misra, Jian Tang
Abstract summary: We introduce two novel metrics: a refoldability-based metric and a stability-based metric. ByProt, ProteinMPNN, and ESM-IF perform exceptionally well on our benchmark, while ESM-Design and AF-Design fall short on the refoldability metric. Our proposed benchmark paves the way for a fair and comprehensive evaluation of protein design methods.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Structure-based protein design has attracted increasing interest, with numerous methods being introduced in recent years. However, a universally accepted method for evaluation has not been established, since the wet-lab validation can be overly time-consuming for the development of new algorithms, and the in silico validation with recovery and perplexity metrics is efficient but may not precisely reflect true foldability. To address this gap, we introduce two novel metrics: refoldability-based metric, which leverages high-accuracy protein structure prediction models as a proxy for wet lab experiments, and stability-based metric, which assesses whether models can assign high likelihoods to experimentally stable proteins. We curate datasets from high-quality CATH protein data, high-throughput de novo designed proteins, and mega-scale experimental mutagenesis experiments, and in doing so, present the PDB-Struct benchmark that evaluates both recent and previously uncompared protein design methods. Experimental results indicate that ByProt, ProteinMPNN, and ESM-IF perform exceptionally well on our benchmark, while ESM-Design and AF-Design fall short on the refoldability metric. We also show that while some methods exhibit high sequence recovery, they do not perform as well on our new benchmark. Our proposed benchmark paves the way for a fair and comprehensive evaluation of protein design methods in the future. Code is available at https://github.com/WANG-CR/PDB-Struct.

Related papers

Protein-SE(3): Benchmarking SE(3)-based Generative Models for Protein Structure Design
SE(3)-based generative models have shown great promise in protein geometry modeling and effective structure design. Protein-SE(3), a new benchmark based on a unified training framework, comprises protein scaffolding tasks, integrated generative models, high-level mathematical abstraction, and diverse evaluation metrics.
arXiv (2025-07-27)
DISPROTBENCH: A Disorder-Aware, Task-Rich Benchmark for Evaluating Protein Structure Prediction in Realistic Biological Contexts
DisProtBench is a benchmark for evaluating protein structure prediction models (PSPMs) under structural disorder and complex biological conditions. DisProtBench spans three key axes: data complexity, task diversity, and interpretability. Results reveal significant variability in model robustness under disorder, with low-confidence regions linked to functional prediction failures.
arXiv (2025-06-18)
AlphaFold Database Debiasing for Robust Inverse Folding
We introduce a Debiasing Structure AutoEncoder (DeSAE) that learns to reconstruct native-like conformations from intentionally corrupted backbone geometries. At inference, applying DeSAE to AFDB structures produces debiased structures that significantly improve inverse folding performance.
arXiv (2025-06-10)
Protein Structure Tokenization: Benchmarking and New Recipe
We introduce StructTokenBench, a framework that comprehensively evaluates the quality and efficiency of structure tokenizers. We also develop AminoAseed, a strategy that enhances codebook updates and optimally balances codebook size and dimension for improved tokenizer utilization and quality.
arXiv (2025-02-28)
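The StructTokenBench summary mentions codebook utilization. Two diagnostics commonly used for vector-quantized tokenizers are the fraction of codebook entries that are ever assigned and the perplexity (exponentiated entropy) of code usage. The sketch below assumes a flat list of assigned code indices; StructTokenBench's exact definitions may differ, and the function name `codebook_usage` is hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def codebook_usage(token_ids: Sequence[int], codebook_size: int) -> Tuple[float, float]:
    """Return (utilization, usage perplexity) for a batch of assigned code indices.

    utilization: fraction of the codebook entries assigned at least once.
    usage perplexity: exp(entropy) of the empirical code distribution; it equals
    the number of distinct codes when they are used equally often, and 1 when
    only a single code is ever used.
    """
    if not token_ids or codebook_size <= 0:
        raise ValueError("need tokens and a positive codebook size")
    counts = Counter(token_ids)
    if any(t < 0 or t >= codebook_size for t in counts):
        raise ValueError("token id outside the codebook")
    n = len(token_ids)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return len(counts) / codebook_size, math.exp(entropy)


# Hypothetical assignments: 4 of 8 codes used, each equally often.
util, ppl = codebook_usage([0, 1, 2, 3, 0, 1, 2, 3], codebook_size=8)
print(util, round(ppl, 6))  # 0.5 4.0
```

A low utilization with a usage perplexity far below the codebook size signals that many codes are dead, which is the kind of imbalance a better codebook update rule aims to reduce.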
Hashing for Protein Structure Similarity Search
We propose POSH (protein structure hashing), a novel method for protein structure similarity search (PSSS). POSH learns a binary vector representation for each protein structure, which can dramatically reduce the time and memory cost for PSSS.
arXiv (2024-11-13)
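The POSH summary says binary codes cut the time and memory cost of similarity search. A minimal sketch of why, assuming the codes have already been produced by a learned model (not shown): each code is packed into an integer, and comparing two structures reduces to an XOR followed by a bit count, i.e. a Hamming distance. The helper names and codes are hypothetical, and the brute-force scan stands in for whatever indexing POSH actually uses.

```python
from typing import Dict, List, Sequence, Tuple


def pack_bits(bits: Sequence[int]) -> int:
    """Pack a 0/1 vector into an int; bits[i] becomes bit i of the result."""
    code = 0
    for i, b in enumerate(bits):
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        code |= b << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two packed codes differ (XOR, then popcount)."""
    return bin(a ^ b).count("1")


def top_k(query: int, database: Dict[str, int], k: int) -> List[Tuple[str, int]]:
    """Brute-force scan: the k entries nearest to the query in Hamming distance."""
    scored = [(name, hamming(query, code)) for name, code in database.items()]
    scored.sort(key=lambda item: (item[1], item[0]))  # distance first, then name
    return scored[:k]


# Hypothetical 8-bit codes; a learned model would produce much longer ones.
db = {
    "struct_a": pack_bits([1, 0, 1, 1, 0, 0, 1, 0]),
    "struct_b": pack_bits([0, 0, 0, 0, 1, 1, 1, 1]),
    "struct_c": pack_bits([1, 0, 1, 0, 0, 0, 1, 0]),
}
query = pack_bits([1, 0, 1, 1, 0, 0, 1, 1])
print(top_k(query, db, k=2))  # [('struct_a', 1), ('struct_c', 2)]
```

Sorting on (distance, name) keeps ties deterministic; compared with floating-point embeddings, each comparison here touches only a few machine words.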
CPE-Pro: A Structure-Sensitive Deep Learning Method for Protein Representation and Origin Evaluation
We develop a structure-sensitive supervised deep learning model, Crystal vs Predicted Evaluator for Protein Structure (CPE-Pro). CPE-Pro learns the structural information of proteins and captures inter-structural differences to achieve accurate traceability on four data classes. We utilize Foldseek to encode protein structures into "structure-sequences" and train a protein Structural Sequence Language Model, SSLM.
arXiv (2024-10-21)
ProteinBench: A Holistic Evaluation of Protein Foundation Models
We introduce ProteinBench, a holistic evaluation framework for protein foundation models. Our approach consists of three key components: (i) A taxonomic classification of tasks that broadly encompass the main challenges in the protein domain, based on the relationships between different protein modalities; (ii) A multi-metric evaluation approach that assesses performance across four key dimensions: quality, novelty, diversity, and robustness; and (iii) In-depth analyses from various user objectives, providing a holistic view of model performance.
arXiv (2024-09-10)
NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics
We present NovoBench, the first unified benchmark for de novo peptide sequencing. It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics. Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo and π-HelixNovo, are integrated into our framework.
arXiv (2024-06-16)
Endowing Protein Language Models with Structural Knowledge
We introduce a novel framework that enhances protein language models by integrating protein structural data. The refined model, termed Protein Structure Transformer (PST), is further pretrained on a small protein structure database. PST consistently outperforms the state-of-the-art foundation model for protein sequences, ESM-2, setting a new benchmark in protein function prediction.
arXiv (2024-01-26)
Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction
Protein structure-based property prediction has emerged as a promising approach for various biological tasks. Current practices, which simply employ accurately predicted structures during inference, suffer from notable degradation in prediction accuracy. Our framework is model-agnostic and effective in improving the property prediction of both predicted structures and experimental structures.
arXiv (2023-10-14)
Structure-informed Language Models Are Protein Designers
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs). We conduct structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness. Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv (2023-02-03)
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures. Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv (2022-09-30)
PDBench: Evaluating Computational Methods for Protein Sequence Design
We present a benchmark set of proteins and propose tests to assess the performance of deep learning based methods. Our robust benchmark provides biological insight into the behaviour of design methods, which is essential for evaluating their performance and utility.
arXiv (2021-09-16)
EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network. Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv (2021-05-11)
Transfer Learning for Protein Structure Classification at Low Resolution
We show that it is possible to make accurate (≥80%) predictions of protein class and architecture from structures determined at low (≤3 Å) resolution. We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv (2020-08-11)

This list is automatically generated from the titles and abstracts of the papers on this site.