Fugu-MT Paper Translation (Abstract): Pretraining LLM with Latent Thoughts in Continuous Space

Paper overview: Pretraining LLM with Latent Thoughts in Continuous Space

arxiv url: http://arxiv.org/abs/2509.23184v1
Date: Sat, 27 Sep 2025 08:38:08 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.090901
Title: Pretraining LLM with Latent Thoughts in Continuous Space
Title (reference translation): Pretraining LLMs with Latent Thoughts in Continuous Space
Authors: Boyi Zeng, He Li, Shixiang Song, Yixuan Wang, Ziwei He, Xinbing Wang, Zhouhan Lin
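Abstract summary: This paper proposes a method for pretraining language models with latent thoughts. The approach pretrains a language model (LM) to first generate an intermediate latent thought, defined as the last hidden state at the current position. The authors show that, at identical inference cost, an LM that generates one additional latent thought per token outperforms a standard model with twice as many parameters.
Reference score (attention metric computed by this site): 44.24277388571869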
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The remarkable success of Chain-of-Thought (CoT), which enhances performance by scaling generation steps at test time, inspires us to ask: can we leverage a similar scaling of computational steps during pretraining to improve the generation of each individual token? To address this, we propose a novel pretraining methodology: Pretraining Language Models with Latent Thoughts. Our approach pretrains a language model (LM) to first generate an intermediate latent thought (the last hidden state of the current position), which is then used as input to predict the actual subsequent token. This additional computational step enables the LM to refine its prediction within unconstrained continuous space. Our experiments demonstrate that, at an identical inference cost, an LM that generates one additional latent thought per token outperforms a standard model with double the parameters. For instance, ours-1.4B (Pythia Arch), pretrained on 300B tokens from the Pile, significantly surpasses the vanilla Pythia-2.8B trained on the same data on both language modeling and a range of general downstream tasks. Furthermore, increasing the number of latent thoughts generated before each actual token, forming a chain analogous to CoT, consistently improves the model's performance.
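Abstract (reference translation): Inspired by the remarkable success of Chain-of-Thought (CoT), which improves performance by scaling generation steps at test time, we ask whether a similar scaling of computational steps during pretraining can improve the generation of each individual token. To this end, we propose a method for pretraining language models with latent thoughts. Our approach pretrains a language model (LM) to first generate an intermediate latent thought (the last hidden state at the current position), which is then used as input to predict the actual next token. This additional computational step lets the LM refine its prediction within unconstrained continuous space. Our experiments show that, at identical inference cost, an LM that generates one additional latent thought per token outperforms a standard model with twice as many parameters. For example, ours-1.4B (Pythia Arch), pretrained on 300B tokens from the Pile, substantially outperforms the vanilla Pythia-2.8B trained on the same data, both on language modeling and on a range of general downstream tasks. Furthermore, increasing the number of latent thoughts generated before each actual token, which forms a chain analogous to CoT, consistently improves model performance.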
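
The decoding pattern described in the abstract can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: `embed`, `forward`, `logits`, `generate`, and all sizes and weights are hypothetical stand-ins for a real transformer, and keeping each latent thought in the context after it is used is an assumption the abstract does not confirm. The only behavior taken from the abstract is that the last hidden state at the current position is fed back as an input vector before the actual next token is predicted.

```python
import math

# Hypothetical toy sizes; nothing here comes from the paper.
VOCAB = 5
DIM = 4


def embed(token_id):
    """Toy deterministic token embedding (stand-in for a learned table)."""
    return [math.sin((token_id + 1) * (j + 1)) for j in range(DIM)]


def forward(inputs):
    """Stand-in for one causal forward pass.

    Takes the list of input vectors seen so far and returns the hidden
    state at the last position.
    """
    n = len(inputs)
    mean = [sum(v[j] for v in inputs) / n for j in range(DIM)]
    return [math.tanh(mean[j] + 0.5 * mean[(j + 1) % DIM]) for j in range(DIM)]


def logits(hidden):
    """Toy output head: dot product of the hidden state with each embedding."""
    return [sum(h * e for h, e in zip(hidden, embed(t))) for t in range(VOCAB)]


def generate(prompt, num_tokens, num_thoughts=1):
    """Greedy decoding with `num_thoughts` latent thoughts per actual token."""
    inputs = [embed(t) for t in prompt]
    tokens = list(prompt)
    passes = 0
    for _ in range(num_tokens):
        for _ in range(num_thoughts):
            # Latent thought: the last hidden state at the current position.
            thought = forward(inputs)
            passes += 1
            # Fed back as a continuous input vector; no token is sampled.
            # Assumption: the thought stays in the context afterwards.
            inputs.append(thought)
        # The pass after the thought(s) predicts the actual next token.
        hidden = forward(inputs)
        passes += 1
        scores = logits(hidden)
        next_token = max(range(VOCAB), key=scores.__getitem__)
        tokens.append(next_token)
        inputs.append(embed(next_token))
    return tokens, passes


if __name__ == "__main__":
    tokens, passes = generate([0, 1], num_tokens=3, num_thoughts=1)
    print("tokens:", tokens, "forward passes:", passes)
```

In this sketch, one latent thought per token means two forward passes per generated token. If per-pass cost is assumed to scale roughly with parameter count, that is in line with the abstract's comparison of a 1.4B model with one latent thought against a standard 2.8B model at identical inference cost. Setting `num_thoughts` above 1 gives the longer chain of latent thoughts mentioned in the abstract's last sentence.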

Related papers