Fugu-MT paper translation (abstract): From paper to benchmark: agentic, framework-based reproduction of under-specified methods in machine health intelligence

Paper overview: From paper to benchmark: agentic, framework-based reproduction of under-specified methods in machine health intelligence

arxiv url: http://arxiv.org/abs/2605.28371v1
Date: Wed, 27 May 2026 12:11:05 GMT
Status: translation complete
System update date: 2026-05-28 17:38:56.033057
Title: From paper to benchmark: agentic, framework-based reproduction of under-specified methods in machine health intelligence
Authors: Raffael Theiler, Ludovico Comito, David Leko, Leandro Von Krannichfeldt, Lev Telyatnikov, Olga Fink
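Abstract summary: Industrial Prognostics and Health Management (PHM) is a representative case study for translating published papers into executable, benchmark-ready implementations. In the proposed approach, an agent translates a paper into a shared PHM benchmark framework via a slot-binding interface. The approach is evaluated on 16 PHM papers, comparing framework-enhanced, skill-based and prompt-based agentic reproduction against a recent framework-free paper-reproduction agent.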
Reference score (independently computed attention score): 12.026104106923556
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Industrial Prognostics and Health Management (PHM) provides a representative case study for a broader challenge in applied machine learning: translating published papers into executable, benchmark-ready implementations. Reproducing under-specified methods in PHM is particularly difficult due to restricted access to industrial datasets, incomplete reporting of preprocessing and evaluation protocols, and implicit design choices (e.g., windowing, target construction, data splits) that critically affect performance. Existing paper-to-code systems generate implementations for individual papers, but these artifacts are often not directly comparable due to inconsistencies in assumptions and evaluation settings. We introduce agentic, framework-based PHM paper reproduction, where an agent translates a paper into a shared PHM benchmark framework via a slot-binding interface. This interface maps equations and protocol descriptions into structured components (task definitions, dataset adapters, windowing, targets, models, and evaluators), while explicitly recording unresolved assumptions. The resulting implementations are validated against standardized task contracts and evaluation hooks, enabling consistent and comparable benchmarking. We evaluate this approach on 16 PHM papers, comparing framework-enhanced, skill-based and prompt-based agentic reproduction against a recent framework-free paper-reproduction agent. We assess reproduction success, model-based code evaluation, framework binding of paper assumptions, and cross-paper benchmark comparability under standardized protocols. Our results show that coupling agentic generation with a shared framework transforms paper reproduction from isolated code synthesis into executable, assumption-aware, and systematically comparable benchmark implementations.
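Illustrative sketch (not from the paper): the abstract names six component slots and says unresolved assumptions are recorded explicitly, but it does not show the interface itself. The Python below is one possible shape for such a binding record, under stated assumptions. The class `SlotBinding`, its methods, the helper `sliding_windows`, and every slot value (task type, adapter name, window length, stride, metric) are hypothetical names chosen for illustration and are not the authors' framework API.

```python
from dataclasses import dataclass, field

# Slot names taken from the abstract's list of structured components.
SLOTS = ("task", "dataset_adapter", "windowing", "target", "model", "evaluator")


@dataclass
class SlotBinding:
    """Hypothetical record of how one paper maps onto the framework slots."""
    paper_id: str
    bound: dict = field(default_factory=dict)        # slot name -> component spec
    assumptions: list = field(default_factory=list)  # (slot, note) for unresolved choices

    def bind(self, slot, spec, assumption=None):
        """Attach a component spec to a slot, optionally logging an assumption."""
        if slot not in SLOTS:
            raise KeyError(f"unknown slot: {slot}")
        self.bound[slot] = spec
        if assumption is not None:
            self.assumptions.append((slot, assumption))

    def missing_slots(self):
        """Return the slots that have not been bound yet, in canonical order."""
        return [s for s in SLOTS if s not in self.bound]

    def validate(self):
        """Stand-in for a task-contract check: every slot must be bound."""
        missing = self.missing_slots()
        if missing:
            raise ValueError(f"unbound slots: {missing}")
        return True


def sliding_windows(series, length, stride):
    """Cut a sequence into fixed-length windows (one possible windowing component)."""
    return [series[i:i + length] for i in range(0, len(series) - length + 1, stride)]


# All values below are made up for the example.
binding = SlotBinding(paper_id="example-paper")
binding.bind("task", {"type": "regression"})
binding.bind("dataset_adapter", {"name": "toy_dataset"})
binding.bind("windowing", {"length": 4, "stride": 2},
             assumption="stride not stated in the paper; 2 chosen for illustration")
binding.bind("target", {"kind": "toy_target"})
binding.bind("model", {"name": "mean_baseline"})
binding.bind("evaluator", {"metric": "rmse"})

is_valid = binding.validate()
windows = sliding_windows(list(range(10)), length=4, stride=2)
```

Keeping the assumption next to the slot it affects is the point of the design: two reproductions that differ only in an unstated stride or split can then be told apart by their recorded assumptions rather than by reading the generated code.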


Related papers