Fugu-MT Paper Translation (Abstract): Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning

Paper overview: Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning

arxiv url: http://arxiv.org/abs/2602.06600v1
Date: Fri, 06 Feb 2026 10:53:26 GMT
Status: Translation complete
Last updated in system: 2026-02-09 22:18:26.364421
Title: Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
Authors: Zhuoyuan Hao, Zhuo Li, Wu Li, Fangming Liu, Min Zhang, Jing Li
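Abstract summary: Test-time compute allocation in large reasoning models (LRMs) is widely used and has applications in mathematical problem solving, code synthesis, and planning. This paper analyzes and harnesses the model's tendency to restate the question, termed the Echo of Prompt (EOP), as a front-loaded, compute-shaping mechanism.
Reference score (independently computed attention metric): 25.852162778115808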
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Test-time compute allocation in large reasoning models (LRMs) is widely used and has applications in mathematical problem solving, code synthesis, and planning. Recent work has addressed this problem by scaling self-consistency and parallel thinking, adding generic "thinking tokens", and prompting models to re-read the question before answering. Unfortunately, these approaches either inject task-agnostic tokens or mandate heuristics that do not explain (and often ignore) the spontaneous repetition that many LRMs exhibit at the head of their internal chains. In contrast, we analyze and harness the model's tendency to restate the question, which we term the Echo of Prompt (EOP), as a front-loaded, compute-shaping mechanism. We formalize its probabilistic cost by casting echo removal as rejection-based conditioning and defining the Echo Likelihood Gap $\Delta\mathcal{L}$ as a computable proxy. This provides the missing theoretical link between early repetition and both likelihood gains and downstream accuracy. However, it does not by itself specify how to exploit EOP. Consequently, we develop Echo-Distilled SFT (ED-SFT) to instill an "echo-then-reason" pattern through supervised finetuning, and Echoic Prompting (EP) to re-ground the model mid-trace without training. While promising, quantifying benefits beyond verbosity is non-trivial. Therefore, we conduct length- and suffix-controlled likelihood analyses together with layer-wise attention studies, showing that EOP increases answer-to-answer-prefix attention in middle layers, consistent with an attention refocusing mechanism. We evaluate on GSM8K, MathQA, Hendrycks-MATH, AIME24, and MATH-500 under identical decoding settings and budgets, and find consistent gains over baselines. Code is available at https://github.com/hhh2210/echoes-as-anchors.
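
Illustrative sketch (not from the paper): the abstract names an Echo Likelihood Gap but does not give its formula here. The Python below shows one plausible reading, assumed purely for illustration: the difference between the log-likelihood of the same answer tokens when the trace contains the echo and when it does not. The function name `echo_likelihood_gap`, its parameters, and the toy log-probabilities are all hypothetical; in practice the per-token log-probabilities would come from scoring the answer with a language model.

```python
from typing import Sequence


def echo_likelihood_gap(
    answer_logprobs_with_echo: Sequence[float],
    answer_logprobs_without_echo: Sequence[float],
    length_normalize: bool = False,
) -> float:
    """Hypothetical gap: log p(answer | trace with echo) - log p(answer | trace without echo).

    Both inputs are per-token log-probabilities of the SAME answer tokens,
    scored under two different contexts. This is an assumed definition,
    not the one given in the paper.
    """
    if len(answer_logprobs_with_echo) != len(answer_logprobs_without_echo):
        raise ValueError("both inputs must score the same answer tokens")
    if not answer_logprobs_with_echo:
        raise ValueError("answer must contain at least one token")
    # Sequence log-likelihood is the sum of per-token log-probabilities.
    gap = sum(answer_logprobs_with_echo) - sum(answer_logprobs_without_echo)
    if length_normalize:
        # Optional per-token average, so answers of different length are comparable.
        gap /= len(answer_logprobs_with_echo)
    return float(gap)


# Made-up per-token log-probabilities for a two-token answer.
with_echo = [-0.1, -0.2]
without_echo = [-0.5, -0.6]
print(f"gap = {echo_likelihood_gap(with_echo, without_echo):.3f}")
```

Under this assumed definition, a positive value means the answer is more likely when the echo is present. The abstract's length- and suffix-controlled analyses suggest the authors control for confounds that this toy version ignores.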

Related papers

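$\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space [71.23672814629448]
$\nabla$-Reasoner is an iterative generation framework that integrates differentiable optimization over token logits into the decoding loop. $\nabla$-Reasoner achieves accuracy improvements of over 20% on challenging mathematical reasoning benchmarks.
Paper reference translation (metadata) (2026-03-05T08:42:54Z)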
Learning to Forget Attention: Memory Consolidation for Adaptive Compute Reduction [6.908972852063454]
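Hybrid architectures that combine state space models with attention achieve strong efficiency-quality trade-offs. 88% of attention operations retrieve information that is already predictable from the model's hidden state. The proposed method (Consolidation-based Routing for Adaptive Memory) is a biologically inspired memory consolidation mechanism that gradually distills episodic retrieval into parametric semantic memory.
Paper reference translation (metadata) (2026-02-12T17:40:15Z)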
Thinking Traps in Long Chain-of-Thought: A Measurable Study and Trap-Aware Adaptive Restart [27.904791075662896]
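TAAR (Trap-Aware Adaptive Restart) is a test-time control framework that trains a diagnostic policy to predict two signals from partial trajectories. At inference time, TAAR truncates the trajectory before the predicted trap segment and adaptively restarts decoding. Experiments show that TAAR improves reasoning performance without fine-tuning the model parameters.
Paper reference translation (metadata) (2026-01-17T07:26:02Z)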
One Token Embedding Is Enough to Deadlock Your Large Reasoning Model [91.48868589442837]
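We propose the Deadlock Attack, a resource-exhaustion method that hijacks the generation control flow of LRMs. The proposed method achieves a 100% attack success rate on four advanced LRMs.
Paper reference translation (metadata) (2025-10-12T07:42:57Z)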
Thinking Before You Speak: A Proactive Test-time Scaling Approach [54.8205006555199]
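We implement our idea as a reasoning framework named Thinking Before You Speak (TBYS). We design a pipeline that automatically collects and filters in-context examples for insight generation. Experiments on challenging mathematical datasets verify the effectiveness of TBYS.
Paper reference translation (metadata) (2025-08-26T03:43:32Z)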
Reliability, Embeddedness, and Agency: A Utility-Driven Mathematical Framework for Agent-Centric AI Adoption [0.0]
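We formalize three axioms for the sustained adoption of agent-centric AI systems that execute multi-step tasks. We model adoption as the sum of a decaying novelty term and a growing utility term.
Paper reference translation (metadata) (2025-08-18T12:53:38Z)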
What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding [84.42056293290015]
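We analyze token-level misalignment between reasoning and non-reasoning models. We propose FoReaL-Decoding. On four popular math reasoning benchmarks, FoReaL-Decoding reduces theoretical FLOPs by 30 to 50% and shortens CoT length by up to 40%.
Paper reference translation (metadata) (2025-06-08T05:08:32Z)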
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models [86.88657425848547]
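Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning. We explicitly align models with three meta-abilities (deduction, induction, and abduction) using automatically generated, self-verifiable tasks. Our three-stage pipeline of individual alignment, parameter-space merging, and domain-specific reinforcement learning improves performance by over 10% relative to instruction-tuned baselines.
Paper reference translation (metadata) (2025-05-15T17:58:33Z)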
Language Model Uncertainty Quantification with Attention Chain [9.093726246465117]
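Quantifying the predictive uncertainty of large language models (LLMs) is important for judging the reliability of their answers. UQAC is an efficient method that narrows the reasoning space to a tractable size so that marginalization becomes feasible. We validate the effectiveness of UQAC on multiple reasoning benchmarks with advanced open-source LLMs.
Paper reference translation (metadata) (2025-03-24T21:43:47Z)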
Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification [50.717692060500696]
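Next-token prediction with the log loss is a cornerstone of autoregressive sequence modeling. Next-token prediction can be made robust so as to achieve $C=\tilde{O}(H)$, representing moderate error amplification. $C=e^{(\log H)^{1-\Omega(1)}}$.
Paper reference translation (metadata) (2025-02-18T02:52:00Z)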
FLARE: Faithful Logic-Aided Reasoning and Exploration [47.46564769245296]
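We propose a new approach that traverses the problem space using task decomposition. We use Large Language Models to plan a solution and to soft-formalize the query into facts and predicates using logic programming code. The proposed method computes the faithfulness of the reasoning process with respect to the generated code and analyzes the steps of the multi-hop search without relying on external solvers.
Paper reference translation (metadata) (2024-10-14T19:39:11Z)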
Training Chain-of-Thought via Latent-Variable Inference [30.21067593018967]
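Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using chain-of-thought (CoT) prompting. Combining CoT with supervised tuning requires supervision not only of the correct answers but also of the detailed rationales that lead to those answers. We therefore propose a fine-tuning strategy that aims to maximize the marginal log-likelihood of generating a correct answer using CoT prompting.
Paper reference translation (metadata) (2023-11-28T17:47:32Z)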

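The related papers list is generated automatically from the titles and abstracts of papers on this site.

This page provides information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from its use.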