Fugu-MT Paper Translation (Abstract): Can Transformers Learn to Verify During Backtracking Search?

Paper overview: Can Transformers Learn to Verify During Backtracking Search?

arxiv url: http://arxiv.org/abs/2605.22221v1
Date: Thu, 21 May 2026 09:26:01 GMT
Status: Translation complete
Last updated in system: 2026-05-22 16:35:42.190115
Title: Can Transformers Learn to Verify During Backtracking Search?
Authors: Yin Jun Phua, Tony Ribeiro, Tuan Nguyen, Katsumi Inoue
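Abstract summary: Backtracking search underlies classical constraint solvers, planners, and theorem provers. Recent transformer-based reasoning systems explore search trees over their own intermediate steps. The paper shows that decoder-only transformers trained on cumulative traces fail, in two ways, the requirement that the continue-or-backtrack decision depend only on the current search state.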
Reference score (independently computed attention score): 5.709908922073304
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Backtracking search underlies classical constraint solvers, planners, and theorem provers. Recent transformer-based reasoning systems explore search trees over their own intermediate steps. A common training recipe fits an autoregressive next-token loss on offline solver traces. The model's input at each step is a cumulative trace of all prior decisions. The optimal continue-or-backtrack predictor depends only on the current search state, since two trajectories reaching the same state admit the same viable continuations. We show that decoder-only transformers trained on cumulative traces fail this requirement in two ways: the trace can scatter state features across many positions (scattered retrieval), and the predictor can condition on the trajectory rather than the state (history entanglement). We address scattered retrieval with localization, a trace-level fix that rewrites each decision block to expose state features locally. We address history entanglement with Selective State Attention (SSA), a fixed attention mask that enforces state-based decisions structurally without modifying training data, objective, or parameters. We focus on reactive verification, after propagation has exposed a contradiction. We test SSA on 3-SAT, graph coloring, Blocks World, and backtracking parsing. On same-state pairs that differ only in prior history, SSA emits identical decisions while a cumulative-trained causal baseline does not. Our contribution is a diagnostic of transformer behavior on serialized trajectory data, paired with a structural fix. Pretrained language models that search over their own reasoning steps may face the same failure. Our analysis opens up inference-time context clearing as a candidate way to apply the same isolation without retraining.
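The same-state comparison described in the abstract can be illustrated with a toy sketch. This is not the paper's SSA mask, whose exact definition is not given here. It assumes a hypothetical layout in which each token carries a block id, the final block restates the current search state (the effect the abstract attributes to localization), and the decision token may attend only within its own block. All token names, block ids, and function names below are invented for illustration.

```python
from typing import List


def causal_mask(n: int) -> List[List[bool]]:
    """Standard causal mask: position i may attend to every position j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def state_only_mask(block_ids: List[int]) -> List[List[bool]]:
    """Hypothetical state-restricted mask (an assumption, not the paper's SSA):
    position i may attend to j only if j <= i and j is in the same block as i."""
    n = len(block_ids)
    return [
        [j <= i and block_ids[j] == block_ids[i] for j in range(n)]
        for i in range(n)
    ]


def decision_view(tokens: List[str], mask: List[List[bool]]) -> List[str]:
    """Tokens visible from the final (decision) position under the given mask."""
    query = len(tokens) - 1
    return [tok for tok, allowed in zip(tokens, mask[query]) if allowed]


# Two made-up traces that reach the same search state through different histories.
trace_a = ["x1=T", "x2=T", "conflict", "undo:x2", "state:x1=T", "state:x2=F", "DECIDE"]
blocks_a = [0, 1, 1, 1, 2, 2, 2]
trace_b = ["x2=F", "state:x1=T", "state:x2=F", "DECIDE"]
blocks_b = [0, 1, 1, 1]

# Under the causal mask the decision position sees the whole trajectory.
causal_view_a = decision_view(trace_a, causal_mask(len(trace_a)))
causal_view_b = decision_view(trace_b, causal_mask(len(trace_b)))

# Under the state-restricted mask it sees only the current state block.
state_view_a = decision_view(trace_a, state_only_mask(blocks_a))
state_view_b = decision_view(trace_b, state_only_mask(blocks_b))

print(causal_view_a == causal_view_b)
print(state_view_a == state_view_b)
```

In this sketch the decision position receives identical visible tokens for both traces under the restricted mask, so any deterministic function of those tokens must return the same decision. Under the causal mask the visible tokens differ, which leaves room for the history dependence the abstract calls history entanglement.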

Related papers