Fugu-MT Paper Translation (Summary): A Single Revision Step Improves Token-Efficient LLM Reasoning

Paper summary: A Single Revision Step Improves Token-Efficient LLM Reasoning

arxiv url: http://arxiv.org/abs/2602.02828v1
Date: Mon, 02 Feb 2026 21:28:42 GMT
Status: Translation complete
Last updated (in system): 2026-02-04 18:37:15.096692
Title: A Single Revision Step Improves Token-Efficient LLM Reasoning
Authors: Yingchuan Zhang, Terry Ma, Wenxuan Zhong, Ping Ma
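Abstract summary: The paper introduces Packet-Conditioned Revision (PACER), a training-free, inference-only framework for large language models. PACER enables reasoning traces to revise their conclusions through a structured coordination step. On competitive math benchmarks, PACER matches or exceeds the accuracy of 256-sample majority voting.
Reference score (independently computed attention score): 3.344806691289323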
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) achieve higher accuracy on challenging reasoning tasks by scaling test-time compute through multiple trajectory sampling. However, standard aggregation methods like majority voting or individual confidence-based filtering face a fundamental "blind spot": they evaluate each trace in isolation. As problems scale in difficulty, models often generate hallucinated paths that exhibit misleadingly high confidence, causing the true solution to be suppressed by a narrow margin in traditional voting. We ask: can we enable traces to "peer-review" each other to resolve these near-miss errors? We introduce Packet-Conditioned Revision (PACER), a training-free, inference-only framework that enables reasoning traces to revise their conclusions through a structured coordination step. After a preliminary screening of generated traces, PACER constructs a compact consensus packet containing (i) unique candidate answers, (ii) their aggregated confidence scores, and (iii) representative reasoning summaries for each candidate answer. Individual traces then perform a targeted self-review conditioned on this packet, allowing them to identify specific logical junctions where they diverged from the broader consensus and pivot if their original reasoning is found to be flawed. Final predictions are obtained via confidence-weighted voting over these revised trajectories. On challenging competitive math benchmarks such as AIME and BRUMO, PACER matches or exceeds the accuracy of 256-sample majority voting, significantly outperforming raw ensemble baselines by transforming simple consensus into a collaborative logical refinement process.
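
The pipeline described in the abstract (screen traces, build a consensus packet, let each trace self-review against it, then take a confidence-weighted vote) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the screening rule (keep the most confident fraction), summing confidences for aggregation, and using the most confident trace's summary as the representative are all guesses. The names `Trace`, `PacketEntry`, `screen`, `build_packet`, `weighted_vote`, `pacer_sketch`, and `toy_revise` are hypothetical. In PACER the revision step is an LLM self-review; here `toy_revise` is a hard-coded stand-in that simulates a model discovering that answer "42" rests on a flawed step.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trace:
    answer: str        # final answer extracted from one sampled reasoning trace
    confidence: float  # assumed per-trace confidence score (higher is better)
    summary: str       # assumed short summary of the trace's reasoning


@dataclass
class PacketEntry:
    confidence: float  # aggregated (here: summed) confidence for this candidate answer
    summary: str       # representative reasoning summary for this answer


def screen(traces: list[Trace], keep_fraction: float = 0.5) -> list[Trace]:
    """Hypothetical preliminary screening: keep the most confident traces."""
    k = max(1, int(len(traces) * keep_fraction))
    return sorted(traces, key=lambda t: t.confidence, reverse=True)[:k]


def build_packet(traces: list[Trace]) -> dict[str, PacketEntry]:
    """(i) unique answers, (ii) aggregated confidence, (iii) a representative summary."""
    totals: dict[str, float] = defaultdict(float)
    best: dict[str, Trace] = {}
    for t in traces:
        totals[t.answer] += t.confidence
        # Assumption: the most confident trace supplies the representative summary.
        if t.answer not in best or t.confidence > best[t.answer].confidence:
            best[t.answer] = t
    return {a: PacketEntry(totals[a], best[a].summary) for a in totals}


def weighted_vote(traces: list[Trace]) -> str:
    """Confidence-weighted voting: the answer with the largest summed confidence wins."""
    scores: dict[str, float] = defaultdict(float)
    for t in traces:
        scores[t.answer] += t.confidence
    return max(scores, key=scores.get)


# A reviser maps (trace, packet) to a possibly revised trace; in PACER this is an LLM call.
Reviser = Callable[[Trace, dict[str, PacketEntry]], Trace]


def pacer_sketch(traces: list[Trace], revise: Reviser, keep_fraction: float = 0.5) -> str:
    kept = screen(traces, keep_fraction)
    packet = build_packet(kept)
    # Each screened trace reviews itself against the shared consensus packet.
    revised = [revise(t, packet) for t in kept]
    return weighted_vote(revised)


def toy_revise(trace: Trace, packet: dict[str, PacketEntry]) -> Trace:
    """Toy stand-in for self-review: traces answering "42" 'discover' a flawed step
    and pivot to the best-supported alternative answer in the packet."""
    if trace.answer != "42":
        return trace
    alternatives = {a: e for a, e in packet.items() if a != "42"}
    if not alternatives:
        return trace
    new_answer = max(alternatives, key=lambda a: alternatives[a].confidence)
    return Trace(new_answer, trace.confidence, alternatives[new_answer].summary)


traces = [
    Trace("42", 0.95, "shortcut via symmetry"),
    Trace("42", 0.90, "shortcut via symmetry"),
    Trace("17", 0.88, "casework on residues"),
    Trace("17", 0.80, "casework on residues"),
    Trace("17", 0.40, "partial casework"),
    Trace("99", 0.10, "guess"),
]

print(weighted_vote(screen(traces)))     # prints 42 (vote without revision)
print(pacer_sketch(traces, toy_revise))  # prints 17 (vote after packet-conditioned revision)
```

In this toy run, the two highly confident "42" traces outweigh the "17" trace in a plain confidence-weighted vote; once they review against the packet and pivot, the revised vote selects "17". This mirrors the near-miss scenario the abstract describes, but the outcome here is fixed by the hard-coded reviser rather than by any model.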

Related papers