Fugu-MT Paper Translation (Summary): Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Paper summary: Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

arxiv url: http://arxiv.org/abs/2511.08577v1
Date: Wed, 12 Nov 2025 02:05:42 GMT
Status: Translation complete
Last updated (system): 2025-11-12 20:17:03.871048
Title: Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
Title (reference translation): Think-at-Hard: Selective Latent Iterations for Improving Reasoning Language Models
Authors: Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, Yu Wang
Abstract summary: Think-at-Hard (TaH) is a dynamic latent thinking method that iterates deeper only at hard tokens. TaH improves reasoning performance on five challenging benchmarks.
Reference score (site-computed attention score): 22.525318796588568
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Improving reasoning capabilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. Prior work proposes recurrent transformers, which allocate a fixed number of extra iterations per token to improve generation quality. After the first, standard forward pass, instead of verbalization, last-layer hidden states are fed back as inputs for additional iterations to refine token predictions. Yet we identify a latent overthinking phenomenon: easy token predictions that are already correct after the first pass are sometimes revised into errors in additional iterations. To address this, we propose Think-at-Hard (TaH), a dynamic latent thinking method that iterates deeper only at hard tokens. It employs a lightweight neural decider to trigger latent iterations only at tokens that are likely incorrect after the standard forward pass. During latent iterations, Low-Rank Adaptation (LoRA) modules shift the LLM objective from general next-token prediction to focused hard-token refinement. We further introduce a duo-causal attention mechanism that extends attention from the token sequence dimension to an additional iteration depth dimension. This enables cross-iteration information flow while maintaining full sequential parallelism. Experiments show that TaH boosts LLM reasoning performance across five challenging benchmarks while maintaining the same parameter count. Compared with baselines that iterate twice for all output tokens, TaH delivers 8.1-11.3% accuracy gains while exempting 94% of tokens from the second iteration. Against strong single-iteration Qwen3 models finetuned with the same data, it also delivers 4.0-5.0% accuracy gains. When allowing less than 3% additional parameters from LoRA and the iteration decider, the gains increase to 8.5-12.6% and 5.3-5.4%, respectively. Our code is available at https://github.com/thu-nics/TaH.
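The abstract's core mechanism, a decider that sends only likely-wrong tokens through an extra latent iteration, can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the TaH implementation: the decider is a hypothetical logistic scorer over a hidden state, the extra iteration is a single hypothetical LoRA-adapted linear map (frozen W plus a scaled low-rank update B·A) standing in for the full LoRA-adapted second pass through the LLM, tokens are treated independently, and at most one extra iteration is allowed. The helper names (selective_iteration, lora_linear, logistic_decider, passes_per_token) are made up for this sketch.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Plain matrix-vector product: each row of m dotted with v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_linear(w: Matrix, a: Matrix, b: Matrix, scale: float) -> Callable[[Vector], Vector]:
    """Hypothetical refine step: frozen W plus LoRA update, i.e. W h + scale * B (A h)."""
    def apply(h: Vector) -> Vector:
        base = matvec(w, h)
        delta = matvec(b, matvec(a, h))  # A is r x d, B is d x r, with small rank r
        return [x + scale * y for x, y in zip(base, delta)]
    return apply


def logistic_decider(weights: Sequence[float], bias: float) -> Callable[[Vector], float]:
    """Hypothetical lightweight decider: sigmoid(w . h + b) as P(token is hard)."""
    def score(h: Vector) -> float:
        z = sum(wi * hi for wi, hi in zip(weights, h)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return score


def selective_iteration(
    first_pass_hidden: Sequence[Vector],
    decider: Callable[[Vector], float],
    refine: Callable[[Vector], Vector],
    threshold: float = 0.5,
) -> Tuple[List[Vector], List[bool]]:
    """Feed a last-layer state back for one extra iteration only if it is flagged as hard."""
    outputs: List[Vector] = []
    iterated: List[bool] = []
    for h in first_pass_hidden:
        hard = decider(h) > threshold
        # Easy tokens keep their first-pass result instead of risking "latent overthinking".
        outputs.append(refine(h) if hard else list(h))
        iterated.append(hard)
    return outputs, iterated


def passes_per_token(iterated: Sequence[bool]) -> float:
    """Average forward passes: 1 for every token, plus 1 for each token iterated again."""
    return 1.0 + sum(iterated) / len(iterated)


if __name__ == "__main__":
    hidden = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.1]]
    decider = logistic_decider(weights=[0.0, 4.0], bias=-2.0)
    refine = lora_linear(w=[[1.0, 0.0], [0.0, 1.0]], a=[[1.0, 1.0]], b=[[0.5], [0.5]], scale=1.0)
    outputs, flags = selective_iteration(hidden, decider, refine)
    print(flags)  # [False, True, False]
    print(outputs)  # [[1.0, 0.0], [0.5, 1.5], [0.2, 0.1]]
    # 6 of 100 tokens iterate again, mirroring the abstract's 94% exemption figure.
    print(passes_per_token([True] * 6 + [False] * 94))
```

The passes_per_token helper makes the abstract's cost figure concrete: if 94% of tokens skip the second iteration, the average is about 1.06 forward passes per token, versus 2.0 for a baseline that iterates twice on every token (ignoring the small cost of the decider itself).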
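Abstract (reference translation): Improving the reasoning capabilities of large language models (LLMs), especially under parameter constraints, is essential for real-world applications. Prior work proposed recurrent transformers, which allocate a fixed number of extra iterations per token to improve generation quality. After the first standard forward pass, instead of being verbalized, the last-layer hidden states are fed back as inputs to additional iterations that refine the token predictions. However, we identify a latent overthinking phenomenon: easy token predictions that are already correct after the first pass are sometimes revised into errors during the additional iterations. We therefore propose Think-at-Hard (TaH), a dynamic latent thinking method that iterates deeper only at hard tokens. A lightweight neural decider triggers latent iterations only at tokens that are likely to be incorrect after the standard forward pass. During latent iterations, Low-Rank Adaptation (LoRA) modules shift the LLM's objective from general next-token prediction to focused hard-token refinement. We further introduce a duo-causal attention mechanism that extends attention from the token sequence dimension to an additional iteration depth dimension. This enables cross-iteration information flow while maintaining full sequential parallelism. Experiments show that TaH improves LLM reasoning performance on five challenging benchmarks while keeping the same parameter count. Compared with baselines that iterate twice for all output tokens, TaH achieves 8.1-11.3% accuracy gains while exempting 94% of tokens from the second iteration. Against strong single-iteration Qwen3 models fine-tuned on the same data, it improves accuracy by 4.0-5.0%. When less than 3% additional parameters are allowed for LoRA and the iteration decider, the gains rise to 8.5-12.6% and 5.3-5.4%, respectively. Our code is available at https://github.com/thu-nics/TaH.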
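The duo-causal attention mechanism described in the abstract can be read as a causal mask over two axes at once. The sketch below assumes that reading: a query at (depth d, token i) may attend to a key at (depth e, token j) only if j <= i and e <= d. This rule is an interpretation of the abstract, not code from the TaH repository; for simplicity every token gets every depth, whereas in TaH only hard tokens would have deeper entries. The function names are hypothetical, and values are scalars to keep the example small.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Position = Tuple[int, int]  # (iteration depth, token index); hypothetical layout


def duo_causal_allowed(query: Position, key: Position) -> bool:
    """Assumed rule: causal along BOTH axes (earlier-or-same token, shallower-or-same depth)."""
    (q_depth, q_tok), (k_depth, k_tok) = query, key
    return k_tok <= q_tok and k_depth <= q_depth


def duo_causal_mask(num_tokens: int, max_depth: int) -> Tuple[List[Position], List[List[bool]]]:
    """Flatten (depth, token) positions and build a boolean query-by-key mask."""
    positions = list(product(range(max_depth), range(num_tokens)))
    mask = [[duo_causal_allowed(q, k) for k in positions] for q in positions]
    return positions, mask


def masked_attention(
    scores: Sequence[Sequence[float]],
    values: Sequence[float],
    mask: Sequence[Sequence[bool]],
) -> List[float]:
    """Softmax over allowed keys only, then a weighted sum of scalar values."""
    out: List[float] = []
    for row_scores, row_mask in zip(scores, mask):
        # With a duo-causal mask every query sees itself, so at least one key is allowed.
        peak = max(s for s, ok in zip(row_scores, row_mask) if ok)
        weights = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row_scores, row_mask)]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, values)) / total)
    return out


if __name__ == "__main__":
    positions, mask = duo_causal_mask(num_tokens=3, max_depth=2)
    for pos, row in zip(positions, mask):
        visible = [k for k, ok in zip(positions, row) if ok]
        print(pos, "->", visible)  # e.g. (1, 1) -> [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Under this rule, depth-0 queries never see deeper keys, so the first iteration behaves like an ordinary causal pass, and every query at depth d depends only on depths up to d. That is one way such a mask could allow cross-iteration information flow while still letting all token positions at a given depth be processed in parallel.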


Related papers