PDFBench: A Benchmark for De novo Protein Design from Function
- URL: http://arxiv.org/abs/2505.20346v2
- Date: Sun, 28 Sep 2025 03:52:13 GMT
- Title: PDFBench: A Benchmark for De novo Protein Design from Function
- Authors: Jiahao Kuang, Nuowei Liu, Jie Wang, Changzhi Sun, Tao Ji, Yuanbin Wu,
- Abstract summary: PDFBench is the first comprehensive benchmark for function-guided de novo protein design. Our benchmark systematically evaluates eight state-of-the-art models on 16 metrics across two key settings.
- Score: 18.373430158468874
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Function-guided protein design is a crucial task with significant applications in drug discovery and enzyme engineering. However, the field lacks a unified and comprehensive evaluation framework. Current models are assessed using inconsistent and limited subsets of metrics, which prevents fair comparison and a clear understanding of the relationships between different evaluation criteria. To address this gap, we introduce PDFBench, the first comprehensive benchmark for function-guided de novo protein design. Our benchmark systematically evaluates eight state-of-the-art models on 16 metrics across two key settings: description-guided design, for which we repurpose the Mol-Instructions dataset, originally lacking quantitative benchmarking, and keyword-guided design, for which we introduce a new test set, SwissTest, created with a strict datetime cutoff to ensure data integrity. By benchmarking across a wide array of metrics and analyzing their correlations, PDFBench enables more reliable model comparisons and provides key insights to guide future research.
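The abstract mentions analyzing correlations between evaluation metrics but does not say how. As a rough illustration only, the sketch below computes pairwise Spearman rank correlations between metrics from per-model scores. The choice of Spearman correlation, the function names, the metric names (`metric_a`, `metric_b`, `metric_c`), and every number in `toy_scores` are assumptions made up for this example; none of them come from PDFBench.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return float("nan")
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def metric_correlations(scores):
    """Map each unordered pair of metric names to its Spearman correlation.

    `scores` maps a metric name to a list of scores, one per model,
    with the models listed in the same order for every metric.
    """
    return {
        (a, b): spearman(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }


# Made-up scores for four hypothetical models on three hypothetical metrics.
toy_scores = {
    "metric_a": [0.2, 0.5, 0.9, 0.4],
    "metric_b": [10, 30, 70, 20],
    "metric_c": [5, 3, 1, 4],
}

if __name__ == "__main__":
    for pair, rho in metric_correlations(toy_scores).items():
        print(pair, round(rho, 3))
```

Rank correlation is used here because metrics on different scales can still be compared by how they order the models; a benchmark could equally well use Pearson or Kendall correlation.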
Related papers
- IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation [85.56193980646981]
We propose IF-RewardBench, a comprehensive meta-evaluation benchmark for instruction-following. For each instruction, we construct a preference graph containing all pairwise preferences among multiple responses. Experiments on IF-RewardBench reveal significant deficiencies in current judge models.
arXiv Detail & Related papers (2026-03-05T02:21:17Z) - DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM [35.910677096654574]
Document parsing aims to transform unstructured PDF images into semi-structured data, facilitating the digitization and utilization of information in diverse domains. Common practice often selects the top-performing model on standard benchmarks. We introduce DOCR-Inspector, which formalizes document parsing assessment as fine-grained error detection and analysis.
arXiv Detail & Related papers (2025-12-11T13:16:33Z) - Protein-SE(3): Benchmarking SE(3)-based Generative Models for Protein Structure Design [35.87227562237066]
SE(3)-based generative models have shown great promise in protein geometry modeling and effective structure design. Protein-SE(3), a new benchmark based on a unified training framework, comprises protein scaffolding tasks, integrated generative models, high-level mathematical abstraction, and diverse evaluation metrics.
arXiv Detail & Related papers (2025-07-27T11:53:05Z) - PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs [80.08310253195144]
PRING is the first benchmark that evaluates protein-protein interaction prediction from a graph-level perspective. PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions.
arXiv Detail & Related papers (2025-07-07T15:21:05Z) - DISPROTBENCH: A Disorder-Aware, Task-Rich Benchmark for Evaluating Protein Structure Prediction in Realistic Biological Contexts [76.59606029593085]
DisProtBench is a benchmark for evaluating protein structure prediction models (PSPMs) under structural disorder and complex biological conditions. DisProtBench spans three key axes: data complexity, task diversity, and interpretability. Results reveal significant variability in model robustness under disorder, with low-confidence regions linked to functional prediction failures.
arXiv Detail & Related papers (2025-06-18T23:58:22Z) - PFMBench: Protein Foundation Model Benchmark [42.418536890859635]
PFMBench is a benchmark evaluating protein foundation models across 38 tasks spanning 8 key areas of protein science. It reveals the inherent correlations between tasks, identifies top-performing models, and provides a streamlined evaluation protocol.
arXiv Detail & Related papers (2025-06-01T07:40:07Z) - Rethinking Text-based Protein Understanding: Retrieval or LLM? [26.278517638774005]
Protein-text models have gained significant attention for their potential in protein generation and understanding. Current approaches focus on integrating protein-related knowledge into large language models through continued pretraining and multi-modal alignment. We propose a retrieval-enhanced method, which significantly outperforms fine-tuned LLMs for protein-to-text generation and shows accuracy and efficiency in training-free scenarios.
arXiv Detail & Related papers (2025-05-26T06:25:43Z) - Protein Structure Tokenization: Benchmarking and New Recipe [16.842453216446987]
We introduce StructTokenBench, a framework that comprehensively evaluates the quality and efficiency of structure tokenizers. We also develop AminoAseed, a strategy that enhances codebook updates and optimally balances codebook size and dimension for improved tokenizer utilization and quality.
arXiv Detail & Related papers (2025-02-28T15:14:33Z) - ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities [30.123976500620834]
Traditional fixed test sets fall short in evaluating open-ended capabilities of foundation models. We propose ONEBench, a new testing paradigm that consolidates individual evaluation datasets into a unified, ever-expanding sample pool. By aggregating samples across test sets, ONEBench enables the assessment of diverse capabilities beyond those covered by the original test sets.
arXiv Detail & Related papers (2024-12-09T18:37:14Z) - ProteinBench: A Holistic Evaluation of Protein Foundation Models [53.59325047872512]
We introduce ProteinBench, a holistic evaluation framework for protein foundation models.
Our approach consists of three key components: (i) A taxonomic classification of tasks that broadly encompass the main challenges in the protein domain, based on the relationships between different protein modalities; (ii) A multi-metric evaluation approach that assesses performance across four key dimensions: quality, novelty, diversity, and robustness; and (iii) In-depth analyses from various user objectives, providing a holistic view of model performance.
arXiv Detail & Related papers (2024-09-10T06:52:33Z) - NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present NovoBench, the first unified benchmark for de novo peptide sequencing.
It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics.
Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo, and $\pi$-HelixNovo, are integrated into our framework.
arXiv Detail & Related papers (2024-06-16T08:23:21Z) - ECBD: Evidence-Centered Benchmark Design for NLP [95.50252564938417]
We propose Evidence-Centered Benchmark Design (ECBD), a framework which formalizes the benchmark design process into five modules.
Each module requires benchmark designers to describe, justify, and support benchmark design choices.
Our analysis reveals common trends in benchmark design and documentation that could threaten the validity of benchmarks' measurements.
arXiv Detail & Related papers (2024-06-13T00:59:55Z) - GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z) - Implicitly Guided Design with PropEn: Match your Data to Follow the Gradient [52.2669490431145]
PropEn is inspired by 'matching', which enables implicit guidance without training a discriminator.
We show that training with a matched dataset approximates the gradient of the property of interest while remaining within the data distribution.
arXiv Detail & Related papers (2024-05-28T11:30:19Z) - Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence [51.54175067684008]
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks.
We first show that feature aggregation and cost aggregation exhibit distinct characteristics and reveal the potential for substantial benefits stemming from the judicious use of both aggregation processes.
Our framework is evaluated on standard benchmarks for semantic matching, and also applied to geometric matching, where we show that our approach achieves significant improvements compared to existing methods.
arXiv Detail & Related papers (2024-03-17T07:02:55Z) - Progressive Multi-Modality Learning for Inverse Protein Folding [47.095862120116976]
We propose a novel protein design paradigm called MMDesign, which leverages multi-modality transfer learning.
MMDesign is the first framework that combines a pretrained structural module with a pretrained contextual module, using an auto-encoder (AE) based language model to incorporate prior protein semantic knowledge.
Experimental results demonstrate that, even when trained only on a small dataset, MMDesign consistently outperforms baselines on various public benchmarks.
arXiv Detail & Related papers (2023-12-11T10:59:23Z) - PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design [19.324059406159325]
We introduce two novel metrics: a refoldability-based metric and a stability-based metric.
ByProt, ProteinMPNN, and ESM-IF perform exceptionally well on our benchmark, while ESM-Design and AF-Design fall short.
Our proposed benchmark paves the way for a fair and comprehensive evaluation of protein design methods.
arXiv Detail & Related papers (2023-11-30T02:37:55Z) - Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z) - Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.