Fugu-MT Paper Translation (Summary): Learning to Generate Formally Verifiable Step-by-Step Logic Reasoning via Structured Formal Intermediaries

Paper summary: Learning to Generate Formally Verifiable Step-by-Step Logic Reasoning via Structured Formal Intermediaries

arxiv url: http://arxiv.org/abs/2603.29500v1
Date: Tue, 31 Mar 2026 09:42:13 GMT
Status: Translation complete
Last updated in system: 2026-04-01 15:25:03.466422
Title: Learning to Generate Formally Verifiable Step-by-Step Logic Reasoning via Structured Formal Intermediaries
Title (reference translation): Learning to generate formally verifiable step-by-step logical reasoning via structured formal intermediaries
Authors: Luoxin Chen, Yichi Zhou, Huishuai Zhang
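Abstract summary: PRoSFI (Process Reward over Structured Formal Intermediates) is a novel reward method that enhances reasoning reliability without compromising accuracy. Only fully validated reasoning chains receive high rewards. The integration of formal verification guides the model towards generating step-by-step machine-checkable proofs.
Reference score (independently computed attention metric): 18.744562922743405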
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have recently demonstrated impressive performance on complex, multi-step reasoning tasks, especially when post-trained with outcome-rewarded reinforcement learning (Guo et al., 2025). However, outcome rewards often overlook flawed intermediate steps, leading to unreliable reasoning even when final answers are correct. To address this, we propose PRoSFI (Process Reward over Structured Formal Intermediates), a novel reward method that enhances reasoning reliability without compromising accuracy. Instead of generating formal proofs directly, which is rarely feasible for a modest-sized (7B) model, the model outputs structured intermediate steps aligned with its natural language reasoning. Each step is then verified by a formal prover, and only fully validated reasoning chains receive high rewards. The integration of formal verification guides the model towards generating step-by-step machine-checkable proofs, thereby yielding more credible final answers. PRoSFI offers a simple and effective approach to training trustworthy reasoning models.
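Abstract (reference translation): Large language models (LLMs) have recently demonstrated impressive performance on complex, multi-step reasoning tasks. However, it has been observed that outcome rewards often overlook flawed intermediate steps, leading to unreliable reasoning steps even when final answers are correct. To address this unreliable reasoning, we propose PRoSFI (Process Reward over Structured Formal Intermediates). Instead of generating formal proofs directly, which is rarely achievable for a modest-sized (7B) model, the model outputs structured intermediate steps aligned with its natural language reasoning. Each step is then verified by a formal prover. Only fully validated reasoning chains receive high rewards. The integration of formal verification guides the model towards generating step-by-step machine-checkable proofs, thereby yielding more credible final answers. PRoSFI offers a simple and effective approach to training trustworthy reasoning models.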
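The reward rule described in the abstract (verify each structured step, give a high reward only to fully validated chains) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Step` structure, the `toy_verify` stand-in for a formal prover, the function name `chain_reward`, and the reward values 1.0 / 0.1 / 0.0 are all assumptions introduced here for illustration; in particular, the abstract does not say how a correct answer with an unverified chain is scored.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Step:
    """Hypothetical structured intermediate step: premises and one conclusion."""
    premises: tuple[str, ...]
    conclusion: str


def toy_verify(step: Step) -> bool:
    """Toy stand-in for a formal prover (assumption, not a real prover).

    Accepts a step if the conclusion restates a premise, or if it follows
    by a single modus ponens from premises written as "A" and "A -> B".
    """
    if step.conclusion in step.premises:
        return True
    return any(f"{p} -> {step.conclusion}" in step.premises for p in step.premises)


def chain_reward(
    steps: Sequence[Step],
    answer_correct: bool,
    verify: Callable[[Step], bool] = toy_verify,
    high: float = 1.0,      # assumed value for a fully verified, correct chain
    partial: float = 0.1,   # assumed value for a correct answer, unverified chain
    low: float = 0.0,       # assumed value for an incorrect answer
) -> float:
    """Return the high reward only when every step verifies and the answer is correct."""
    if not answer_correct:
        return low
    # An empty chain is treated as unverified so it cannot earn the high reward.
    fully_verified = bool(steps) and all(verify(s) for s in steps)
    return high if fully_verified else partial


valid_chain = [
    Step(premises=("A", "A -> B"), conclusion="B"),
    Step(premises=("B", "B -> C"), conclusion="C"),
]
flawed_chain = [
    Step(premises=("A", "A -> B"), conclusion="B"),
    Step(premises=("B",), conclusion="C"),  # "B -> C" is missing, so this step fails
]

print(chain_reward(valid_chain, answer_correct=True))
print(chain_reward(flawed_chain, answer_correct=True))
print(chain_reward(valid_chain, answer_correct=False))
```

The `flawed_chain` case is the situation the abstract singles out: the final answer is correct, but one intermediate step does not check, so under these assumed values the chain does not receive the high reward.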

Related papers