Fugu-MT Paper Translation (Abstract): Language Model Planning from an Information Theoretic Perspective

Paper summary: Language Model Planning from an Information Theoretic Perspective

arxiv url: http://arxiv.org/abs/2509.25260v1
Date: Sun, 28 Sep 2025 01:58:15 GMT
Status: Translation complete
Last updated in system: 2025-10-01 17:09:04.212142
Title: Language Model Planning from an Information Theoretic Perspective
Title (reference translation): Language Model Planning from an Information-Theoretic Perspective
Authors: Muhammed Ustaomeroglu, Baris Askin, Gauri Joshi, Carlee Joe-Wong, Guannan Qu
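Abstract summary: The extent to which decoder-only language models (LMs) organize intermediate computations to support coherent long-range generation remains an open question. Planning involves structuring computations over long horizons, considering multiple possible continuations, and selectively reusing past information. The authors study planning in LMs across synthetic grammar, path-finding tasks, and natural language datasets.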
Reference score (independently computed attention metric): 31.323156960716826
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The extent to which decoder-only language models (LMs) engage in planning, that is, organizing intermediate computations to support coherent long-range generation, remains an open and important question, with implications for interpretability, reliability, and principled model design. Planning involves structuring computations over long horizons, considering multiple possible continuations, and selectively reusing past information, but how effectively transformer-based LMs realize these capabilities is still unclear. We address these questions by analyzing the hidden states at the core of transformer computations, which capture intermediate results and act as carriers of information. Since these hidden representations are often redundant and encumbered with fine-grained details, we develop a pipeline based on vector-quantized variational autoencoders that compresses them into compact summary codes. These codes enable measuring mutual information, allowing systematic analysis of the computational structure underlying model behavior. Using this framework, we study planning in LMs across synthetic grammar, path-finding tasks, and natural language datasets, focusing on three key aspects: (i) the planning horizon of pre-output computations, (ii) the extent to which the model considers alternative valid continuations, and (iii) the reliance of new predictions on earlier computations. By answering these questions, we advance the understanding of how planning is realized in LMs and contribute a general-purpose pipeline for probing the internal dynamics of LMs and deep learning systems. Our results reveal that the effective planning horizon is task-dependent, that models implicitly preserve information about unused correct continuations, and that predictions draw most on recent computations, though earlier blocks remain informative.
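Abstract (reference translation): The extent to which decoder-only language models (LMs) engage in planning, that is, organizing intermediate computations to support coherent long-range generation, is an open and important question with implications for interpretability, reliability, and principled model design. Planning involves structuring computations over long horizons, considering multiple possible continuations, and selectively reusing past information, but how effectively transformer-based LMs realize these capabilities is still unclear. To address these questions, we analyze the hidden states at the core of transformer computations, which capture intermediate results and act as carriers of information. Because these hidden representations are often redundant and laden with fine-grained details, we develop a pipeline based on vector-quantized variational autoencoders that compresses them into compact summary codes. These codes make it possible to measure mutual information, enabling systematic analysis of the computational structure underlying model behavior. Using this framework, we study planning in LMs across synthetic grammar, path-finding tasks, and natural language datasets, focusing on three key aspects: (i) the planning horizon of pre-output computations, (ii) the extent to which the model considers alternative valid continuations, and (iii) the reliance of new predictions on earlier computations. By answering these questions, we deepen the understanding of how planning is realized in LMs and contribute a general-purpose pipeline for probing the internal dynamics of LMs and deep learning systems. Our results show that the effective planning horizon is task-dependent, that models implicitly preserve information about unused correct continuations, and that predictions draw heavily on recent computations.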
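
The abstract says that compressing hidden states into discrete summary codes makes mutual information measurable. As a rough illustration of why discrete codes help, the sketch below computes a simple plug-in (empirical count-based) estimate of mutual information between two aligned code sequences. The function name `mutual_information` and the example code sequences are hypothetical, and the plug-in estimator is an assumption chosen for simplicity; the abstract does not state which estimator the authors use.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples.

    xs and ys are equal-length sequences of hashable symbols, e.g. codebook
    indices assigned to hidden states at two different positions or layers.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one paired sample")

    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical code sequences (illustrative values only, not data from the paper).
codes_early = [0, 0, 1, 1, 2, 2, 3, 3]
codes_late_same = list(codes_early)           # fully determined by codes_early
codes_late_indep = [0, 1, 0, 1, 0, 1, 0, 1]   # empirically independent of codes_early

mi_same = mutual_information(codes_early, codes_late_same)
mi_indep = mutual_information(codes_early, codes_late_indep)
```

With these toy inputs, `mi_same` equals the entropy of a uniform four-symbol code (2 bits) and `mi_indep` is 0. Plug-in estimates are biased upward when the number of samples is small relative to the codebook size, so a real analysis would likely need many samples or a bias correction.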
