Fugu-MT paper translation (abstract): How Do LLMs Use Their Depth?

Paper summary: How Do LLMs Use Their Depth?

arxiv url: http://arxiv.org/abs/2510.18871v1
Date: Tue, 21 Oct 2025 17:59:05 GMT
Status: translation complete
Last updated in system: 2025-10-25 03:08:14.12511
Title: How Do LLMs Use Their Depth?
Authors: Akshat Gupta, Jay Yeung, Gopala Anumanchipalli, Anna Ivanova
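Abstract summary: Large language models do not use their depth uniformly, yet a fine-grained understanding of their layer-wise prediction dynamics is still lacking. This paper explains how LLMs internally structure their computations to make predictions.
Reference score (independently computed attention metric): 17.148445769990907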
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Growing evidence suggests that large language models do not use their depth uniformly, yet we still lack a fine-grained understanding of their layer-wise prediction dynamics. In this paper, we trace the intermediate representations of several open-weight models during inference and reveal a structured and nuanced use of depth. Specifically, we propose a "Guess-then-Refine" framework that explains how LLMs internally structure their computations to make predictions. We first show that the top-ranked predictions in early LLM layers are composed primarily of high-frequency tokens, which act as statistical guesses proposed by the model early on due to the lack of appropriate contextual information. As contextual information develops deeper into the model, these initial guesses get refined into contextually appropriate tokens. Even high-frequency token predictions from early layers get refined >70% of the time, indicating that correct token prediction is not "one-and-done". We then go beyond frequency-based prediction to examine the dynamic usage of layer depth across three case studies. (i) Part-of-speech analysis shows that function words are, on average, the earliest to be predicted correctly. (ii) Fact recall task analysis shows that, in a multi-token answer, the first token requires more computational depth than the rest. (iii) Multiple-choice task analysis shows that the model identifies the format of the response within the first half of the layers, but finalizes its response only toward the end. Together, our results provide a detailed view of depth usage in LLMs, shedding light on the layer-by-layer computations that underlie successful predictions and providing insights for future works to improve computational efficiency in transformer-based models.
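The "Guess-then-Refine" reading above can be made concrete with a small sketch. The abstract does not state how per-layer predictions are decoded, so this example assumes that a vector of logits over the vocabulary is already available at every layer (for instance, from projecting intermediate representations through an unembedding matrix). All function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def top_token_per_layer(layer_logits: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the top-ranked vocabulary entry at each layer."""
    return [max(range(len(row)), key=row.__getitem__) for row in layer_logits]


def settle_layer(top_tokens: Sequence[int]) -> int:
    """Earliest layer from which the top token equals the final one and stays fixed."""
    final = top_tokens[-1]
    layer = len(top_tokens) - 1
    # Walk backwards while the preceding layer already agrees with the final token.
    while layer > 0 and top_tokens[layer - 1] == final:
        layer -= 1
    return layer


def was_refined(top_tokens: Sequence[int]) -> bool:
    """True if the first layer's guess differs from the final prediction."""
    return top_tokens[0] != top_tokens[-1]


# Toy example: 4 layers, vocabulary of 3 tokens (made-up logits).
toy_logits = [
    [2.0, 0.5, 0.1],  # layer 0 guesses token 0 (standing in for a frequent token)
    [1.5, 1.0, 0.2],  # layer 1 still prefers token 0
    [0.3, 2.2, 0.4],  # layer 2 switches to token 1
    [0.1, 3.0, 0.2],  # final layer keeps token 1
]
tokens = top_token_per_layer(toy_logits)
print(tokens, settle_layer(tokens), was_refined(tokens))
```

Averaging `settle_layer` over many positions, grouped by token type, is one plausible way to compare how much depth different kinds of tokens need; whether the paper measures depth this way is not stated in the abstract.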
