Fugu-MT Paper Translation (Abstract): Thinking into the Future: Latent Lookahead Training for Transformers

Paper overview: Thinking into the Future: Latent Lookahead Training for Transformers

arxiv url: http://arxiv.org/abs/2603.20219v1
Date: Tue, 03 Mar 2026 17:15:42 GMT
Status: Translation complete
Last updated in system: 2026-04-06 02:36:12.902499
Title: Thinking into the Future: Latent Lookahead Training for Transformers
Authors: Lorenzo Noci, Gregor Bachmann, Seyed-Mohsen Moosavi-Dezfooli, Moin Nabi,
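Abstract summary: Autoregressive language models trained with next-token prediction generate text by sampling one discrete token at a time. We introduce latent lookahead, a training strategy that enables models to "think" before generating. We show that latent lookahead substantially outperforms both autoregressive and non-autoregressive baselines on planning tasks such as maze solving, Sudoku, and ProsQA.
Reference score (independently computed attention score): 34.73973224120233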
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autoregressive language models trained with next-token prediction generate text by sampling one discrete token at a time. Although very scalable, this objective forces the model to commit at every step, preventing it from exploring or reflecting upon multiple plausible continuations. Furthermore, the compute allocation across tokens is uniform; every token is formed based on a single forward-pass, potentially limiting the model's expressiveness in cases where difficult tokens require inherently more compute. Towards addressing these limitations, we introduce latent lookahead, a training strategy that enables models to "think" before generating: at selected positions in the sequence, before committing to the next token, the model performs a multi-step lookahead in latent space. More precisely, instead of sampling future tokens, we leverage the network's latent space by recursively feeding its hidden states back into the context for $τ$ steps, investing more compute on predicting that token. This produces $τ$ latent predictions that are supervised against the next $τ$ ground-truth tokens, encouraging the model to "lookahead" and refine its prediction. We show that latent lookahead substantially outperforms both autoregressive and non-autoregressive baselines on planning tasks such as maze solving, Sudoku, and ProsQA, where foresight is essential.
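The mechanism described in the abstract (feeding hidden states back into the context for $τ$ steps and supervising the resulting $τ$ latent predictions against the next $τ$ ground-truth tokens) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: `toy_backbone` is a hypothetical stand-in for a causal transformer, and the names `latent_lookahead_loss`, `embed`, `W`, and `unembed`, as well as the plain averaging of the $τ$ cross-entropy terms, are assumptions made only for illustration. The abstract does not specify how positions are selected, how the losses are weighted, or how the final token is decoded.

```python
import numpy as np

def toy_backbone(context, W):
    """Hypothetical stand-in for a causal transformer.

    Mean-pools the context rows and applies one tanh layer, returning a
    single hidden state for the last position.
    """
    return np.tanh(context.mean(axis=0) @ W)

def latent_lookahead_loss(token_ids, pos, tau, embed, W, unembed):
    """Average cross-entropy of tau latent predictions made after position `pos`.

    Assumption: each latent step's hidden state is appended to the context
    in place of a sampled token embedding, and step k is supervised against
    the ground-truth token at position pos + 1 + k.
    """
    context = embed[token_ids[: pos + 1]]        # (pos + 1, d) prefix embeddings
    losses = []
    for k in range(tau):
        h = toy_backbone(context, W)             # hidden state, shape (d,)
        logits = h @ unembed                     # shape (vocab,)
        log_probs = logits - np.logaddexp.reduce(logits)  # log-softmax
        losses.append(-log_probs[token_ids[pos + 1 + k]])
        context = np.vstack([context, h])        # feed the hidden state back
    return float(np.mean(losses))

# Usage with random toy parameters (all sizes are arbitrary).
rng = np.random.default_rng(0)
vocab, d = 10, 8
embed = rng.normal(size=(vocab, d))
W = rng.normal(size=(d, d))
unembed = rng.normal(size=(d, vocab))
tokens = rng.integers(0, vocab, size=12)
loss = latent_lookahead_loss(tokens, pos=3, tau=4, embed=embed, W=W, unembed=unembed)
```

The point of the sketch is the loop: no token is sampled between latent steps, so the extra compute is spent in hidden-state space, while the supervision still comes from ordinary ground-truth tokens.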
