Fugu-MT Paper Translation (Abstract): PepSpecBench: A Unified Evaluation Benchmark for Peptide Tandem Mass Spectrometry Prediction

Paper overview: PepSpecBench: A Unified Evaluation Benchmark for Peptide Tandem Mass Spectrometry Prediction

arxiv url: http://arxiv.org/abs/2605.01945v1
Date: Sun, 03 May 2026 16:11:27 GMT
Status: Translation complete
Last updated in system: 2026-05-05 20:33:50.008986
Title: PepSpecBench: A Unified Evaluation Benchmark for Peptide Tandem Mass Spectrometry Prediction
Authors: Zhiwen Yang, Pan Liu, Yifan Li, Yunhua Zhong, Jun Xia
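Abstract summary: PepSpecBench is a unified benchmark for peptide MS/MS spectrum prediction. It standardizes data preprocessing across complementary public datasets and enforces a strict backbone-disjoint splitting strategy to eliminate sequence leakage.
Reference score (independently computed attention measure): 17.33669468355787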
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tandem mass spectrometry provides a high-throughput framework for identifying and quantifying proteins in complex biological samples. In computational proteomics, predicting peptide MS/MS spectra is a critical task, enabling downstream applications such as large-scale peptide identification and quantification. While deep learning architectures have substantially improved prediction accuracy, three evaluation challenges obscure the true progress of the field. First, inconsistent data preprocessing and incompatible model output spaces hinder fair model comparison. Second, flawed data splitting strategies can permit hidden sequence leakage and inflate reported performance. Third, existing evaluations typically lack comprehensive cross-species benchmarking and systematic assessment of model robustness to influential experimental conditions. To address these challenges, we propose PepSpecBench, a unified benchmark for peptide MS/MS spectrum prediction. PepSpecBench standardizes data preprocessing across complementary public datasets, enforces a strict backbone-disjoint splitting strategy to eliminate sequence leakage, and evaluates diverse architectures within a shared fragment-ion representation space. It further introduces a comprehensive multi-species evaluation suite and physically grounded metadata perturbation probes to assess model robustness and instrument awareness. We uncover previously unrecognized performance discrepancies and robustness limitations across six representative models, providing actionable insights for future model design, evaluation and practical deployment.
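The backbone-disjoint splitting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a "backbone" is the peptide's amino-acid sequence with modification annotations removed, and that annotations appear in square brackets or parentheses. The helper names `backbone`, `assign_split`, and `split_dataset`, and the 10%/10% validation and test fractions, are hypothetical. Hashing the backbone assigns every modified variant of a sequence to the same partition, so no backbone appears in more than one split.

```python
import hashlib
import re


def backbone(peptide: str) -> str:
    """Return the bare residue sequence (assumed definition of 'backbone')."""
    # Drop bracketed or parenthesized modification tags, e.g. [Oxidation] or (ox).
    stripped = re.sub(r"\[[^\]]*\]|\([^)]*\)", "", peptide)
    # Keep only uppercase residue letters.
    return "".join(ch for ch in stripped if ch.isalpha() and ch.isupper())


def assign_split(peptide: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Deterministically map a peptide to a split based on its backbone only."""
    digest = hashlib.sha256(backbone(peptide).encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


def split_dataset(peptides):
    """Group peptides into train/val/test so that backbones never cross splits."""
    splits = {"train": [], "val": [], "test": []}
    for pep in peptides:
        splits[assign_split(pep)].append(pep)
    return splits
```

Because the assignment depends only on the backbone string, adding new spectra later does not move existing peptides between splits. A random per-spectrum split would not give this guarantee, which is the kind of leakage the abstract describes.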
