PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design
- URL: http://arxiv.org/abs/2312.00080v1
- Date: Thu, 30 Nov 2023 02:37:55 GMT
- Title: PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design
- Authors: Chuanrui Wang, Bozitao Zhong, Zuobai Zhang, Narendra Chaudhary,
Sanchit Misra, Jian Tang
- Abstract summary: We introduce two novel metrics: a refoldability-based metric and a stability-based metric.
ByProt, ProteinMPNN, and ESM-IF perform exceptionally well on our benchmark, while ESM-Design and AF-Design fall short on the refoldability metric.
Our proposed benchmark paves the way for a fair and comprehensive evaluation of protein design methods.
- Score: 19.324059406159325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structure-based protein design has attracted increasing interest, with
numerous methods being introduced in recent years. However, a universally
accepted method for evaluation has not been established, since the wet-lab
validation can be overly time-consuming for the development of new algorithms,
and the $\textit{in silico}$ validation with recovery and perplexity metrics is
efficient but may not precisely reflect true foldability. To address this gap,
we introduce two novel metrics: a refoldability-based metric, which leverages
high-accuracy protein structure prediction models as a proxy for wet lab
experiments, and a stability-based metric, which assesses whether models can
assign high likelihoods to experimentally stable proteins. We curate datasets
from high-quality CATH protein data, high-throughput $\textit{de novo}$
designed proteins, and mega-scale experimental mutagenesis experiments, and in
doing so, present the $\textbf{PDB-Struct}$ benchmark that evaluates both
recent and previously uncompared protein design methods. Experimental results
indicate that ByProt, ProteinMPNN, and ESM-IF perform exceptionally well on our
benchmark, while ESM-Design and AF-Design fall short on the refoldability
metric. We also show that while some methods exhibit high sequence recovery,
they do not perform as well on our new benchmark. Our proposed benchmark paves
the way for a fair and comprehensive evaluation of protein design methods in
the future. Code is available at https://github.com/WANG-CR/PDB-Struct.
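The abstract contrasts its new metrics with the conventional recovery and perplexity metrics. As a rough illustration of those two baseline quantities, here is a minimal sketch using their standard textbook definitions. It is not taken from the PDB-Struct codebase; the function names `sequence_recovery` and `perplexity` and the toy sequences are hypothetical and chosen only for this example.

```python
import math


def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Assumes both sequences are aligned position by position and equally long.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def perplexity(log_probs: list[float]) -> float:
    """Exponential of the mean negative log-likelihood per residue.

    `log_probs` holds the natural-log probability that a (hypothetical)
    design model assigns to each native residue given the backbone.
    """
    if not log_probs:
        raise ValueError("log_probs must be non-empty")
    return math.exp(-sum(log_probs) / len(log_probs))


# Toy example: two of the eight positions differ, so recovery is 6/8.
native = "MKTAYIAK"
designed = "MKTGYLAK"
print(sequence_recovery(native, designed))  # 0.75

# A model that is uniform over the 20 amino acids has perplexity of about 20.
uniform_log_probs = [math.log(1 / 20)] * len(native)
print(perplexity(uniform_log_probs))
```

Both quantities compare a design only against the native sequence, which is why, as the abstract argues, they may not reflect whether a designed sequence actually folds into the target structure.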
Related papers
- Hashing for Protein Structure Similarity Search [19.352125515561287]
We propose a novel method, protein structure hashing (POSH), for protein structure similarity search (PSSS).
POSH learns a binary vector representation for each protein structure, which can dramatically reduce the time and memory cost for PSSS.
arXiv Detail & Related papers (2024-11-13T02:02:52Z) - CPE-Pro: A Structure-Sensitive Deep Learning Method for Protein Representation and Origin Evaluation [7.161099050722313]
We develop a structure-sensitive supervised deep learning model, Crystal vs Predicted Evaluator for Protein Structure (CPE-Pro).
CPE-Pro learns the structural information of proteins and captures inter-structural differences to achieve accurate traceability on four data classes.
We utilize Foldseek to encode protein structures into "structure-sequences" and train a protein Structural Sequence Language Model, SSLM.
arXiv Detail & Related papers (2024-10-21T02:21:56Z) - ProteinBench: A Holistic Evaluation of Protein Foundation Models [53.59325047872512]
We introduce ProteinBench, a holistic evaluation framework for protein foundation models.
Our approach consists of three key components: (i) A taxonomic classification of tasks that broadly encompass the main challenges in the protein domain, based on the relationships between different protein modalities; (ii) A multi-metric evaluation approach that assesses performance across four key dimensions: quality, novelty, diversity, and robustness; and (iii) In-depth analyses from various user objectives, providing a holistic view of model performance.
arXiv Detail & Related papers (2024-09-10T06:52:33Z) - NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present the first unified benchmark, NovoBench, for de novo peptide sequencing.
It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics.
Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo, and π-HelixNovo, are integrated into our framework.
arXiv Detail & Related papers (2024-06-16T08:23:21Z) - Endowing Protein Language Models with Structural Knowledge [5.587293092389789]
We introduce a novel framework that enhances protein language models by integrating protein structural data.
The refined model, termed Protein Structure Transformer (PST), is further pretrained on a small protein structure database.
PST consistently outperforms the state-of-the-art foundation model for protein sequences, ESM-2, setting a new benchmark in protein function prediction.
arXiv Detail & Related papers (2024-01-26T12:47:54Z) - Protein 3D Graph Structure Learning for Robust Structure-based Protein
Property Prediction [43.46012602267272]
Protein structure-based property prediction has emerged as a promising approach for various biological tasks.
Current practices, which simply employ accurately predicted structures during inference, suffer from notable degradation in prediction accuracy.
Our framework is model-agnostic and effective in improving the property prediction of both predicted structures and experimental structures.
arXiv Detail & Related papers (2023-10-14T08:43:42Z) - Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z) - State-specific protein-ligand complex structure prediction with a
multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - PDBench: Evaluating Computational Methods for Protein Sequence Design [2.0187324832551385]
We present a benchmark set of proteins and propose tests to assess the performance of deep-learning-based methods.
Our robust benchmark provides biological insight into the behaviour of design methods, which is essential for evaluating their performance and utility.
arXiv Detail & Related papers (2021-09-16T12:20:03Z) - EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based
Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z) - Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate (≥80%) predictions of protein class and architecture from structures determined at low (≤3Å) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)