Fugu-MT Paper Translation (Abstract): ViroBench: Benchmarking Nucleotide Foundation Models on Viral Genomics Tasks

Paper abstract: ViroBench: Benchmarking Nucleotide Foundation Models on Viral Genomics Tasks

arxiv url: http://arxiv.org/abs/2605.25388v1
Date: Mon, 25 May 2026 03:31:46 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:19.27126
Title: ViroBench: Benchmarking Nucleotide Foundation Models on Viral Genomics Tasks
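Authors: Dongxin Ye, Fang Hu, Han Hu, Shu Hu, Yang Tan, Wanli Ouyang, Stan Z. Li, Jie Cui, Nanqing Dong
Abstract summary: We introduce ViroBench, the first comprehensive and large-scale benchmark specifically designed for nucleotide foundation models (NFMs) in viral settings. ViroBench evaluates models across two critical dimensions, biological understanding and latent biosecurity risk, covering 18 diverse scenarios within 4 task types. ViroBench provides interpretable, diagnostic evaluations and a reproducible measurement framework for research on viral nucleotide foundation models.
Reference score (independently computed attention metric): 86.89727311669937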
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Nucleotide sequences constitute the fundamental genetic basis of biological systems, rendering viral genomic analysis critical for biomedical advancement. Despite progress in biological foundation models, specifically nucleotide foundation models (NFMs), the field lacks a unified standard for viral genomics to facilitate community development and enforce biosecurity constraints. To address this, we introduce ViroBench, the first comprehensive and large-scale benchmark specifically designed for NFMs in viral settings. ViroBench evaluates models across two critical dimensions: biological understanding and latent biosecurity risk, covering 18 diverse scenarios within 4 task types. Extensive evaluation of 66 NFMs across diverse architectures yields three critical conclusions. Firstly, NFMs exhibit a performance degradation in biological understanding under phylogenetic and temporal shifts, indicating weak extrapolation capabilities. Secondly, generation tasks reveal a decoupling between statistical likelihood and biological functional validity, posing latent biosecurity risks. Thirdly, controlled ablation studies reveal that taxonomic diversity in pretraining data outweighs parameter scale. Specifically, a lightweight baseline trained on diverse data achieves a 67.5% performance gain over its original model. Overall, ViroBench provides interpretable, diagnostic evaluations and a reproducible measurement framework for future research on viral nucleotide foundation models. The datasets and code are publicly available at https://github.com/QIANJINYDX/ViroBench.