Fugu-MT Paper Translation (Abstract): Training Language Models to Use Prolog as a Tool

Paper abstract: Training Language Models to Use Prolog as a Tool

arxiv url: http://arxiv.org/abs/2512.07407v1
Date: Mon, 08 Dec 2025 10:39:38 GMT
Status: Translation complete
Last updated in system: 2025-12-09 22:03:54.844014
Title: Training Language Models to Use Prolog as a Tool
Title (reference translation): Language Model Training for Using Prolog as a Tool
Authors: Niklas Mellgren, Peter Schneider-Kamp, Lukas Galke Poech
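Abstract summary: The authors fine-tune language models to use Prolog as an external tool for verifiable computation. The results show that grounding model reasoning in formal verification systems substantially improves reliability and auditability for safety-critical applications.
Reference score (independently computed attention metric): 2.4305775926851334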
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Ensuring reliable tool use is critical for safe agentic AI systems. Language models frequently produce unreliable reasoning with plausible but incorrect solutions that are difficult to verify. To address this, we investigate fine-tuning models to use Prolog as an external tool for verifiable computation. Using Group Relative Policy Optimization (GRPO), we fine-tune Qwen2.5-3B-Instruct on a cleaned GSM8K-Prolog-Prover dataset while varying (i) prompt structure, (ii) reward composition (execution, syntax, semantics, structure), and (iii) inference protocol: single-shot, best-of-N, and two agentic modes where Prolog is invoked internally or independently. Our reinforcement learning approach outperforms supervised fine-tuning, with our 3B model achieving zero-shot MMLU performance comparable to 7B few-shot results. Our findings reveal that: 1) joint tuning of prompt, reward, and inference shapes program syntax and logic; 2) best-of-N with external Prolog verification maximizes accuracy on GSM8K; 3) agentic inference with internal repair yields superior zero-shot generalization on MMLU-Stem and MMLU-Pro. These results demonstrate that grounding model reasoning in formal verification systems substantially improves reliability and auditability for safety-critical applications. The source code for reproducing our experiments is available under https://github.com/niklasmellgren/grpo-prolog-inference
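Abstract (reference translation): Ensuring reliable tool use is important for safe agentic AI systems. Language models often produce unreliable reasoning, with plausible but incorrect solutions that are difficult to verify. This study therefore investigates fine-tuning models to use Prolog as an external tool for verifiable computation. Using Group Relative Policy Optimization (GRPO), Qwen2.5-3B-Instruct is fine-tuned on a cleaned GSM8K-Prolog-Prover dataset while varying (i) prompt structure, (ii) reward composition (execution, syntax, semantics, structure), and (iii) inference protocol: single-shot, best-of-N, and two agentic modes in which Prolog is invoked internally or independently. The reinforcement learning approach outperforms supervised fine-tuning, and the 3B model achieves zero-shot MMLU performance comparable to 7B few-shot results. The findings show that: 1) joint tuning of prompt, reward, and inference shapes program syntax and logic; 2) best-of-N with external Prolog verification maximizes accuracy on GSM8K; 3) agentic inference with internal repair yields superior zero-shot generalization on MMLU-Stem and MMLU-Pro. These results show that grounding model reasoning in formal verification systems substantially improves reliability and auditability for safety-critical applications. The source code for reproducing the experiments is available at https://github.com/niklasmellgren/grpo-prolog-inference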
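
The abstract names three mechanisms without spelling them out: a reward composed of execution, syntax, semantics, and structure terms; GRPO's group-relative scoring; and best-of-N selection with external Prolog verification. The Python sketch below illustrates one plausible reading of each and is not the authors' implementation. The reward weights, the majority-vote selection rule, and the `run_prolog` callable (assumed to return a program's answer, or None when execution fails) are all hypothetical; only the within-group standardization of rewards follows the usual GRPO formulation.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical weights: the abstract lists these four reward components
# but does not say how they are weighted or combined.
REWARD_WEIGHTS = {"execution": 1.0, "syntax": 0.5, "semantics": 1.0, "structure": 0.5}


def composite_reward(components):
    """Weighted sum of per-component scores; missing components count as 0."""
    return sum(w * components.get(name, 0.0) for name, w in REWARD_WEIGHTS.items())


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of completions for the same prompt.

    This is the usual GRPO advantage: (reward - group mean) / group std.
    eps avoids division by zero when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def best_of_n(candidates, run_prolog):
    """Pick an answer from N candidate programs using an external checker.

    Assumption: candidates that fail to execute (run_prolog returns None)
    are discarded and the most common remaining answer wins. The paper's
    actual selection rule may differ.
    """
    answers = [a for a in (run_prolog(p) for p in candidates) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Stand-in for a real Prolog interpreter: maps program text to an answer.
    fake_results = {"prog_a": "42", "prog_b": None, "prog_c": "42", "prog_d": "7"}
    print(best_of_n(list(fake_results), fake_results.get))
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

In a real setup, `run_prolog` would wrap a call to an actual Prolog interpreter with a timeout; the dictionary lookup here only stands in for that call so the sketch runs on its own.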

Related papers