Fugu-MT Paper Translation (Abstract): ARIADNE: Agentic Reward-Informed Adaptive Decision Exploration via Blackboard-Driven MCTS for Competitive Program Generation

Paper overview: ARIADNE: Agentic Reward-Informed Adaptive Decision Exploration via Blackboard-Driven MCTS for Competitive Program Generation

arxiv url: http://arxiv.org/abs/2605.02431v1
Date: Mon, 04 May 2026 10:30:33 GMT
Status: Translation complete
System update date: 2026-05-05 20:33:50.236609
Title: ARIADNE: Agentic Reward-Informed Adaptive Decision Exploration via Blackboard-Driven MCTS for Competitive Program Generation
Authors: Minnan Wei, Xiang Chen, Xiaoshuai Niu, Siyu Chen,
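Abstract summary: The paper proposes ARIADNE, a blackboard-driven Monte Carlo Tree Search (MCTS) framework that models program generation as a sequential decision process. ARIADNE organizes the generation workflow into five coordinated stages (strategy selection, code generation, test generation, quality evaluation, and code repair) while maintaining a shared blackboard. Experiments on four benchmarks (APPS, CodeContests, CodeContests+, and LiveCodeBench) show that ARIADNE consistently achieves the best Pass@1 performance.
Reference score (attention score computed independently by the site): 10.232812063343511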
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Competitive program generation aims to automatically produce correct and efficient solutions for programming-contest problems under strict time and memory constraints. Existing LLM-based approaches often fail to perform explicit algorithmic planning and to handle edge cases robustly, leading to unreliable one-shot generation. Moreover, although execution feedback is essential for iterative debugging and refinement, incorporating such feedback effectively within limited computational budgets remains difficult. To overcome these limitations, we propose ARIADNE, a blackboard-driven Monte Carlo Tree Search (MCTS) framework that models program generation as a sequential decision process. ARIADNE organizes the generation workflow into five coordinated stages (i.e., strategy selection, code generation, test generation, quality evaluation, and code repair) while maintaining a shared blackboard that accumulates structured evidence to guide subsequent decisions. Experiments on four benchmarks (APPS, CodeContests, CodeContests+, and LiveCodeBench) show that ARIADNE consistently achieves the best Pass@1 performance across multiple LLM backends. With GPT-4o, ARIADNE attains Pass@1 scores of 41.30, 46.67, 27.27, and 20.91, surpassing the strongest baseline CodeSim by up to 26.06 points, while further improvements are observed with DeepSeek-V3.2. These results indicate that combining global search through MCTS with persistent evidence accumulation on a shared blackboard enables systematic exploration and effective feedback utilization, substantially enhancing the capability of LLMs in competitive program generation.
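The abstract names the ingredients (MCTS over a staged workflow, plus a shared blackboard that accumulates evidence) but not their implementation. The following is a minimal toy sketch of how such a loop could be wired together, not the authors' method. Only the five stage names come from the abstract. The two options per stage, the `HIDDEN_BEST` target, the `toy_reward` function, the blackboard layout, and the epsilon-greedy rollout policy are all hypothetical stand-ins for LLM calls and test execution.

```python
import math
import random
from collections import defaultdict
from dataclasses import dataclass, field

# Stage names are taken from the abstract; everything else is a hypothetical toy.
STAGES = ("strategy_selection", "code_generation", "test_generation",
          "quality_evaluation", "code_repair")
OPTIONS = (0, 1)                # assumption: two candidate actions per stage
HIDDEN_BEST = (1, 0, 1, 1, 0)   # assumption: stand-in for "passes all tests"


def toy_reward(plan):
    """Fraction of stages matching HIDDEN_BEST (stand-in for execution feedback)."""
    return sum(a == b for a, b in zip(plan, HIDDEN_BEST)) / len(STAGES)


@dataclass
class Node:
    plan: tuple = ()                       # options chosen for the stages so far
    parent: "Node | None" = None
    children: dict = field(default_factory=dict)
    visits: int = 0
    total: float = 0.0


def uct_child(node, c=1.4):
    """Pick the child maximizing mean reward plus the UCT exploration bonus."""
    def score(ch):
        return ch.total / ch.visits + c * math.sqrt(math.log(node.visits) / ch.visits)
    return max(node.children.values(), key=score)


def rollout(plan, blackboard, rng, eps=0.2):
    """Complete a partial plan, preferring options with the best evidence so far."""
    plan = list(plan)
    while len(plan) < len(STAGES):
        stage = STAGES[len(plan)]
        seen = [o for o in OPTIONS if blackboard[(stage, o)]]
        if seen and rng.random() > eps:
            def mean(o):
                return sum(blackboard[(stage, o)]) / len(blackboard[(stage, o)])
            plan.append(max(seen, key=mean))
        else:
            plan.append(rng.choice(OPTIONS))
    return tuple(plan)


def search(iterations=200, seed=0):
    rng = random.Random(seed)
    blackboard = defaultdict(list)   # (stage, option) -> rewards observed so far
    root = Node()
    best_plan, best_reward = None, -1.0
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded and not terminal.
        while len(node.plan) < len(STAGES) and len(node.children) == len(OPTIONS):
            node = uct_child(node)
        # Expansion: add one untried option for the next stage.
        if len(node.plan) < len(STAGES):
            option = rng.choice([o for o in OPTIONS if o not in node.children])
            child = Node(plan=node.plan + (option,), parent=node)
            node.children[option] = child
            node = child
        # Simulation: finish the plan using blackboard evidence, then score it.
        plan = rollout(node.plan, blackboard, rng)
        reward = toy_reward(plan)
        # Blackboard update: persist the evidence for later rollouts.
        for stage, option in zip(STAGES, plan):
            blackboard[(stage, option)].append(reward)
        if reward > best_reward:
            best_plan, best_reward = plan, reward
        # Backpropagation: update statistics from the leaf up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return best_plan, best_reward, root, blackboard
```

Calling `search(iterations=200, seed=0)` returns the best complete plan seen, its reward, the search tree root, and the blackboard. In this sketch the tree statistics drive selection while the blackboard only biases rollouts; a real system could use the accumulated evidence in other ways, and the abstract does not say which.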
