Fugu-MT paper translation (abstract): SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models

Paper abstract: SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models

arxiv url: http://arxiv.org/abs/2603.06222v1
Date: Fri, 06 Mar 2026 12:34:27 GMT
Status: translation complete
Last updated in system: 2026-03-09 13:17:45.698898
Title: SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models
Title (reference translation): SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models
Authors: Yunlong Chu, Minglai Shao, Yuhang Liu, Bing Hao, Yumeng Lin, Jialu Wang, Ruijie Wang
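Abstract summary: Explicit Chain-of-Thought (CoT) incurs high inference cost because of its verbose token-level traces. The authors propose SPOT, a flexible framework that compresses explicit CoT into compact latent pause tokens without enforcing a fixed response template. In experiments on reasoning benchmarks, SPOT improves accuracy by 2.3 points on average while reducing generated tokens by 37.5%.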
Reference score (independently computed attention score): 15.95627037350657
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Explicit Chain-of-Thought improves the reasoning performance of large language models but often incurs high inference cost due to verbose token-level traces. While recent approaches reduce this overhead via concise prompting or step pruning, they largely truncate what the model says rather than internalize what the model thinks. Latent reasoning offers a promising alternative by performing computation in the hidden space, yet prior methods face two critical challenges. Many existing approaches rely on rigid point-to-point alignment, forcing a latent token to approximate the final representation of a reasoning step, which can be insufficient to capture the dense, variable-length semantics of an entire reasoning segment. Furthermore, these methods often suffer from a lack of interpretability: latent states are commonly produced by unconstrained optimization or embedding mixing, yielding vectors that are difficult to decode or audit under the pretrained language head. We propose SPOT, a flexible framework that compresses explicit CoT into compact latent pause tokens without enforcing a fixed response template. At the core of SPOT is Span-level Semantic Alignment, a Sinkhorn optimal-transport objective that softly matches each pause token to the semantics of an entire reasoning segment, overcoming the rigidity of step-end alignment. To further improve interpretability, SPOT introduces a Frozen-Head Decoding Constraint that keeps latent states directly decodable as token distributions under the frozen pretrained LM head, enabling readable keyword interpretations of latent thoughts. Experiments on reasoning benchmarks demonstrate that SPOT improves accuracy by 2.3 points on average while reducing generated tokens by 37.5% and provides faithful semantic interpretations of the latent reasoning process.
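Abstract (reference translation): Explicit Chain-of-Thought improves the reasoning performance of large language models, but often incurs high inference cost because of verbose token-level traces. Recent approaches reduce this overhead through concise prompting or step pruning, but they largely cut down what the model says rather than internalizing what the model thinks. Latent reasoning offers a promising alternative by performing computation in the hidden space, yet prior methods face two key challenges. Many existing approaches rely on rigid point-to-point alignment, which forces a latent token to approximate the final representation of a reasoning step and can be insufficient to capture the dense, variable-length semantics of an entire reasoning segment. In addition, these methods often lack interpretability: latent states are typically produced by unconstrained optimization or embedding mixing, yielding vectors that are hard to decode or audit under the pretrained language head. The authors propose SPOT, a flexible framework that compresses explicit CoT into compact latent pause tokens without enforcing a fixed response template. At the core of SPOT is Span-level Semantic Alignment, a Sinkhorn optimal-transport objective that softly matches each pause token to the semantics of an entire reasoning segment, overcoming the rigidity of step-end alignment. To further improve interpretability, SPOT introduces a Frozen-Head Decoding Constraint that keeps latent states directly decodable as token distributions under the frozen pretrained LM head, enabling readable keyword interpretations of latent thoughts. Experiments on reasoning benchmarks show that SPOT improves accuracy by 2.3 points on average while reducing generated tokens by 37.5%, and that it provides faithful semantic interpretations of the latent reasoning process.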
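
The abstract describes Span-level Semantic Alignment only as a Sinkhorn optimal-transport objective that softly matches pause tokens to a whole reasoning segment; it gives no formula. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a cosine-distance cost, uniform marginals on both sides, and a fixed regularization strength `epsilon`. The function names `sinkhorn_plan`, `cosine_cost`, and `span_alignment_loss` are hypothetical and chosen only for illustration.

```python
import math


def cosine_cost(x, y):
    """Cosine distance between two vectors (assumed cost; not from the paper)."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return 1.0 - dot / max(norm, 1e-12)  # guard against zero-norm inputs


def sinkhorn_plan(cost, epsilon=0.1, n_iters=200):
    """Entropy-regularized transport plan via Sinkhorn iterations.

    Rows index pause tokens, columns index segment tokens.
    Both marginals are uniform, which is an assumption of this sketch.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass assigned to each pause token
    b = [1.0 / m] * m  # mass assigned to each segment token
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def span_alignment_loss(pause_states, segment_states, epsilon=0.1):
    """Transport cost between pause-token states and segment-token states."""
    cost = [[cosine_cost(p, s) for s in segment_states] for p in pause_states]
    plan = sinkhorn_plan(cost, epsilon)
    loss = sum(
        plan[i][j] * cost[i][j]
        for i in range(len(cost))
        for j in range(len(cost[0]))
    )
    return loss, plan
```

Calling `span_alignment_loss(pause_states, segment_states)` with two lists of equal-dimension vectors returns the scalar alignment cost and the soft matching plan. Because each pause token spreads its mass over several segment tokens, the plan is a span-level match, in contrast to the point-to-point alignment the abstract criticizes.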

Related papers