Fugu-MT Paper Translation (Abstract): Latent Reasoning in LLMs as a Vocabulary-Space Superposition

Paper overview: Latent Reasoning in LLMs as a Vocabulary-Space Superposition

arxiv url: http://arxiv.org/abs/2510.15522v1
Date: Fri, 17 Oct 2025 10:51:20 GMT
Status: Translation complete
System update date: 2025-10-20 20:17:34.585395
Title: Latent Reasoning in LLMs as a Vocabulary-Space Superposition
Authors: Jingcheng Deng, Liang Pang, Zihao Wei, Shichen Xu, Zenghao Duan, Kun Xu, Yang Song, Huawei Shen, Xueqi Cheng
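Abstract summary: Large language models (LLMs) demonstrate strong reasoning abilities with chain-of-thought prompting, but explicit reasoning introduces substantial computational overhead. Recent work on latent reasoning reduces this cost by reasoning in latent space without explicit supervision, but performance drops significantly. To address this, we restrict the latent space to the column space of the LLM vocabulary, treating latent reasoning as a superposition over vocabulary probabilities. Once latent reasoning concludes, it collapses into an eigenstate of explicit reasoning to yield the final answer. Latent-SFT sets a new state of the art on GSM8k, matching explicit...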
Reference score (independently computed attention metric): 80.01651003144282
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) demonstrate strong reasoning abilities with chain-of-thought prompting, but explicit reasoning introduces substantial computational overhead. Recent work on latent reasoning reduces this cost by reasoning in latent space without explicit supervision, but performance drops significantly. Our preliminary experiments suggest that this degradation stems from the unstructured latent space, which makes fitting latent tokens difficult. To address this, we restrict the latent space to the column space of the LLM vocabulary, treating latent reasoning as a superposition over vocabulary probabilities. Once latent reasoning concludes, it collapses into an eigenstate of explicit reasoning to yield the final answer. Based on this idea, we propose Latent-SFT, a two-stage learning framework. In the first stage, we design two specialized attention masks to guide the Latent Token Encoder in generating latent tokens, allowing the LLM to produce the correct answer conditioned on them. In the second stage, the Latent Token Encoder is discarded, and the LLM is directly trained to generate these latent tokens autonomously for latent reasoning, optimized with KL and CE losses. Latent-SFT sets a new state of the art on GSM8k, matching explicit SFT performance while cutting reasoning chains by up to 4 times and outperforming prior latent methods. On Math500 and AIME24, lexical probability-based latent reasoning also clearly surpasses hidden-state-based approaches. Our metrics of effective compression rate and effective global parallelism further show that latent reasoning is both the compression of a single path and the superposition of multiple paths.
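The central idea in the abstract, a latent token formed as a superposition over vocabulary probabilities, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the toy sizes, the random embedding matrix `E`, the helper names, and the unweighted sum of the KL and CE terms are all assumptions made for illustration, since the abstract does not say how the two losses are applied or weighted.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def latent_token(probs, embedding):
    """Probability-weighted mix of token embeddings: (V,) @ (V, d) -> (d,)."""
    return probs @ embedding

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two distributions over the vocabulary."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def cross_entropy(q, target_id, eps=1e-12):
    """Negative log-likelihood of one target token under q."""
    return float(-np.log(q[target_id] + eps))

rng = np.random.default_rng(0)
V, d = 6, 4                      # toy vocabulary size and hidden size (hypothetical)
E = rng.normal(size=(V, d))      # stand-in for the LLM token embedding matrix

# Hypothetical teacher (encoder-produced) and student (LLM-produced) distributions.
teacher = softmax(rng.normal(size=V))
student = softmax(rng.normal(size=V))

# A latent token is a linear combination of vocabulary embeddings,
# so it stays inside the space spanned by those embeddings.
z = latent_token(student, E)

# A one-hot distribution "collapses" the superposition to one explicit token.
one_hot = np.eye(V)[2]
collapsed = latent_token(one_hot, E)

# Assumed training signal: KL toward the teacher plus CE on an explicit token.
target_id = int(np.argmax(teacher))
loss = kl_divergence(teacher, student) + cross_entropy(student, target_id)
```

Because the weights are probabilities, `z` is a convex combination of the rows of `E`, which is one way to read the structured latent space the abstract contrasts with unconstrained hidden states.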

Related papers