Fugu-MT paper translation (abstract): Supporting Workflow Reproducibility by Linking Bioinformatics Tools across Papers and Executable Code

Paper summary: Supporting Workflow Reproducibility by Linking Bioinformatics Tools across Papers and Executable Code

arxiv url: http://arxiv.org/abs/2603.08195v1
Date: Mon, 09 Mar 2026 10:24:25 GMT
Status: translation complete
Last updated in system: 2026-03-10 15:13:15.790251
Title: Supporting Workflow Reproducibility by Linking Bioinformatics Tools across Papers and Executable Code
Authors: Clémence Sebe, Olivier Ferret, Aurélie Névéol, Mahdi Esmailoghli, Ulf Leser, Sarah Cohen-Boulakia
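Abstract summary: We present CoPaLink, an automated approach that integrates three components: named entity recognition (NER) for identifying tool mentions in scientific text, NER for identifying tool mentions in workflow code, and entity linking grounded in bioinformatics knowledge bases. We propose approaches for all three steps, achieving high F1-measures (84 to 89) and a joint accuracy of 66 when evaluated on Nextflow workflows using the Bioconda and Bioweb knowledge bases.
Reference score (attention metric computed independently by this site): 5.57580328336509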
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Motivation: The rapid growth of biological data has intensified the need for transparent, reproducible, and well-documented computational workflows. The ability to clearly connect the steps of a workflow in the code with their description in a paper would improve workflow understanding, support reproducibility, and facilitate reuse. This task requires linking bioinformatics tools in workflow code with their mentions in a published workflow description. Results: We present CoPaLink, an automated approach that integrates three components: Named Entity Recognition (NER) for identifying tool mentions in scientific text, NER for tool mentions in workflow code, and entity linking grounded in bioinformatics knowledge bases. We propose approaches for all three steps, achieving high individual F1-measures (84 to 89) and a joint accuracy of 66 when evaluated on Nextflow workflows using the Bioconda and Bioweb knowledge bases. CoPaLink leverages corpora of scientific articles and workflow executable code with curated tool annotations to bridge the gap between narrative descriptions and workflow implementations. Availability: The code is available at https://gitlab.liris.cnrs.fr/sharefair/copalink-experiments and https://gitlab.liris.cnrs.fr/sharefair/copalink. The corpora are also available at https://doi.org/10.5281/zenodo.18526700, https://doi.org/10.5281/zenodo.18526760 and https://doi.org/10.5281/zenodo.18543814.
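The final linking step described in the abstract can be illustrated with a toy sketch. This is not CoPaLink's implementation: it assumes the two NER steps have already produced lists of tool mentions, and it replaces the Bioconda and Bioweb knowledge bases with a small hypothetical alias dictionary (`KNOWLEDGE_BASE`). All function names here are made up for illustration. The `f1` helper only shows the standard F1 formula; it says nothing about how the reported scores were computed.

```python
# Hypothetical toy knowledge base: canonical tool ID -> known surface forms.
# A real system would query Bioconda / Bioweb instead (assumption).
KNOWLEDGE_BASE = {
    "fastqc": {"fastqc"},
    "samtools": {"samtools"},
    "bwa": {"bwa", "bwa-mem", "burrows-wheeler aligner"},
}


def link_to_kb(mention):
    """Return the canonical tool ID for a mention, or None if unknown."""
    key = mention.strip().lower()
    for tool_id, aliases in KNOWLEDGE_BASE.items():
        if key in aliases:
            return tool_id
    return None


def link_paper_to_code(text_mentions, code_mentions):
    """Pair paper and code mentions that resolve to the same KB entry."""
    code_by_id = {}
    for mention in code_mentions:
        tool_id = link_to_kb(mention)
        if tool_id is not None:
            code_by_id.setdefault(tool_id, []).append(mention)

    links = []
    for mention in text_mentions:
        tool_id = link_to_kb(mention)
        # Unresolved mentions (tool_id is None) never match a key here.
        for code_mention in code_by_id.get(tool_id, []):
            links.append((mention, code_mention, tool_id))
    return links


def f1(predicted, gold):
    """Standard F1 over two sets: harmonic mean of precision and recall."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    paper = ["BWA-MEM", "FastQC", "MultiQC"]   # mentions found in the article
    code = ["bwa", "fastqc", "bcftools"]       # mentions found in the workflow
    for text_m, code_m, tool_id in link_paper_to_code(paper, code):
        print(f"{text_m!r} (paper) <-> {code_m!r} (code) via {tool_id}")
```

Resolving both sides to a shared identifier, rather than comparing strings directly, is what lets a paper mention such as "BWA-MEM" match a code mention such as "bwa" in this sketch.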

