MotifBench: A standardized protein design benchmark for motif-scaffolding problems
- URL: http://arxiv.org/abs/2502.12479v2
- Date: Wed, 19 Feb 2025 17:51:50 GMT
- Title: MotifBench: A standardized protein design benchmark for motif-scaffolding problems
- Authors: Zhuoqi Zheng, Bo Zhang, Kieran Didi, Kevin K. Yang, Jason Yim, Joseph L. Watson, Hai-Feng Chen, Brian L. Trippe,
- Abstract summary: The motif-scaffolding problem is a central task in computational protein design.
MotifBench is a collection of 30 benchmark problems and an implementation of this benchmark and leaderboard at evaluation.com/blt2114/MotifBench.
- Score: 24.496222822010346
- License:
- Abstract: The motif-scaffolding problem is a central task in computational protein design: Given the coordinates of atoms in a geometry chosen to confer a desired biochemical function (a motif), the task is to identify diverse protein structures (scaffolds) that include the motif and maintain its geometry. Significant recent progress on motif-scaffolding has been made due to computational evaluation with reliable protein structure prediction and fixed-backbone sequence design methods. However, significant variability in evaluation strategies across publications has hindered comparability of results, challenged reproducibility, and impeded robust progress. In response we introduce MotifBench, comprising (1) a precisely specified pipeline and evaluation metrics, (2) a collection of 30 benchmark problems, and (3) an implementation of this benchmark and leaderboard at github.com/blt2114/MotifBench. The MotifBench test cases are more difficult compared to earlier benchmarks, and include protein design problems for which solutions are known but on which, to the best of our knowledge, state-of-the-art methods fail to identify any solution.
Related papers
- Multi-Scale Representation Learning for Protein Fitness Prediction [31.735234482320283]
Previous methods have primarily relied on self-supervised models trained on vast, unlabeled protein sequence or structure datasets.
We introduce the Sequence-Structure-Surface Fitness (S3F) model - a novel multimodal representation learning framework that integrates protein features across several scales.
Our approach combines sequence representations from a protein language model with Geometric Vector Perceptron networks encoding protein backbone and detailed surface topology.
arXiv Detail & Related papers (2024-12-02T04:28:10Z) - Decomposable Transformer Point Processes [2.1756081703276]
We propose a framework where the advantages of the attention-based architecture are maintained and the limitation of the thinning algorithm is circumvented.
The proposed method attains state-of-the-art performance in predicting the next event of a sequence given its history.
arXiv Detail & Related papers (2024-09-26T13:22:58Z) - ProteinBench: A Holistic Evaluation of Protein Foundation Models [53.59325047872512]
We introduce ProteinBench, a holistic evaluation framework for protein foundation models.
Our approach consists of three key components: (i) A taxonomic classification of tasks that broadly encompass the main challenges in the protein domain, based on the relationships between different protein modalities; (ii) A multi-metric evaluation approach that assesses performance across four key dimensions: quality, novelty, diversity, and robustness; and (iii) In-depth analyses from various user objectives, providing a holistic view of model performance.
arXiv Detail & Related papers (2024-09-10T06:52:33Z) - CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph [66.11279161533619]
CBGBench is a benchmark for structure-based drug design (SBDD)
By categorizing existing methods based on their attributes, CBGBench implements various cutting-edge methods.
We have adapted these models to a range of tasks essential in drug design, which are considered sub-tasks within the graph fill-in-the-blank tasks.
arXiv Detail & Related papers (2024-06-16T08:20:24Z) - GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z) - Solvent: A Framework for Protein Folding [0.39373541926236766]
After AlphaFold2, the protein folding task has entered a new phase, and many methods are proposed based on the component of AlphaFold2.
The importance of a unified research framework in protein folding contains implementations and benchmarks to consistently and fairly compare various approaches.
We present solvent, a protein folding framework that supports significant components of state-of-the-art models in the manner of an off-the-shelf interface.
arXiv Detail & Related papers (2023-07-07T09:01:42Z) - Geometric Deep Learning for Structure-Based Drug Design: A Survey [83.87489798671155]
Structure-based drug design (SBDD) leverages the three-dimensional geometry of proteins to identify potential drug candidates.
Recent advancements in geometric deep learning, which effectively integrate and process 3D geometric data, have significantly propelled the field forward.
arXiv Detail & Related papers (2023-06-20T14:21:58Z) - Design-Bench: Benchmarks for Data-Driven Offline Model-Based
Optimization [82.02008764719896]
Black-box model-based optimization problems are ubiquitous in a wide range of domains, such as the design of proteins, DNA sequences, aircraft, and robots.
We present Design-Bench, a benchmark for offline MBO with a unified evaluation protocol and reference implementations of recent methods.
Our benchmark includes a suite of diverse and realistic tasks derived from real-world optimization problems in biology, materials science, and robotics.
arXiv Detail & Related papers (2022-02-17T05:33:27Z) - PDBench: Evaluating Computational Methods for Protein Sequence Design [2.0187324832551385]
We present a benchmark set of proteins and propose tests to assess the performance of deep learning based methods.
Our robust benchmark provides biological insight into the behaviour of design methods, which is essential for evaluating their performance and utility.
arXiv Detail & Related papers (2021-09-16T12:20:03Z) - Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model
for Protein Design [70.27706384570723]
We propose Fold2Seq, a novel framework for designing protein sequences conditioned on a specific target fold.
We show improved or comparable performance of Fold2Seq in terms of speed, coverage, and reliability for sequence design.
The unique advantages of fold-based Fold2Seq, in comparison to a structure-based deep model and RosettaDesign, become more evident on three additional real-world challenges.
arXiv Detail & Related papers (2021-06-24T14:34:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.