Fugu-MT paper translation (abstract): PaperRepro: Automated Computational Reproducibility Assessment for Social Science Papers

Paper summary: PaperRepro: Automated Computational Reproducibility Assessment for Social Science Papers

arxiv url: http://arxiv.org/abs/2603.00058v1
Date: Tue, 10 Feb 2026 09:04:59 GMT
Status: Translation complete
Last updated in system: 2026-03-09 01:20:08.010887
Title: PaperRepro: Automated Computational Reproducibility Assessment for Social Science Papers
Authors: Linhao Zhang, Tong Xia, Jinghua Piao, Lizhen Cui, Yong Li
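Abstract summary: PaperRepro is a novel two-stage, multi-agent approach to automated reproducibility assessment. In the execution stage, agents execute the reproduction package and edit the code to capture reproduced results as explicit artifacts. In the evaluation stage, agents evaluate reproducibility using explicit evidence.
Reference score (independently computed attention score): 33.12402746591649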
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Computational reproducibility is essential for the credibility of scientific findings, particularly in the social sciences, where findings often inform real-world decisions. Manual reproducibility assessment is costly and time-consuming, as it is nontrivial to reproduce the reported findings using the authors' released code and data. Recent advances in large models (LMs) have inspired agent-based approaches for automated reproducibility assessment. However, existing approaches often struggle due to limited context capacity, inadequate task-specific tooling, and insufficient result capture. To address these, we propose PaperRepro, a novel two-stage, multi-agent approach that separates execution from evaluation. In the execution stage, agents execute the reproduction package and edit the code to capture reproduced results as explicit artifacts. In the evaluation stage, agents evaluate reproducibility using explicit evidence. PaperRepro assigns distinct responsibilities to agents and equips them with task-specific tools and expert prompts, mitigating context and tooling limitations. It further maximizes the LM's coding capability to enable more complete result capture for evaluation. On REPRO-Bench, a social science reproducibility assessment benchmark, PaperRepro achieves the best overall performance, with a 21.9% relative improvement in score-agreement accuracy over the strongest prior baseline. We further refine the benchmark and introduce REPRO-Bench-S, a benchmark stratified by execution difficulty for more diagnostic evaluation of automated reproducibility assessment systems. Our code and data are publicly available
