Fugu-MT paper translation (abstract): When can isotropy help adapt LLMs' next word prediction to numerical domains?

Paper abstract: When can isotropy help adapt LLMs' next word prediction to numerical domains?

arxiv url: http://arxiv.org/abs/2505.17135v2
Date: Mon, 26 May 2025 03:55:16 GMT
Status: Translation complete
Last updated in system: 2025-05-27 19:27:26.833908
Title: When can isotropy help adapt LLMs' next word prediction to numerical domains?
Authors: Rashed Shelim, Shengzhe Xu, Walid Saad, Naren Ramakrishnan
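Abstract summary: This work shows how the isotropy of LLM embeddings in the contextual embedding space preserves the underlying structure of representations. Experiments show that different characteristics of numeric data and model architecture could have different impacts on isotropy.
Reference score (independently computed attention score): 53.98633183204453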
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recent studies have shown that vector representations of contextual embeddings learned by pre-trained large language models (LLMs) are effective in various downstream tasks in numerical domains. Despite their significant benefits, the tendency of LLMs to hallucinate in such domains can have severe consequences in applications such as energy, nature, finance, healthcare, retail and transportation, among others. To guarantee prediction reliability and accuracy in numerical domains, it is necessary to open the black-box and provide performance guarantees through explanation. However, there is little theoretical understanding of when pre-trained language models help solve numeric downstream tasks. This paper seeks to bridge this gap by understanding when the next-word prediction capability of LLMs can be adapted to numerical domains through a novel analysis based on the concept of isotropy in the contextual embedding space. Specifically, we consider a log-linear model for LLMs in which numeric data can be predicted from its context through a network with softmax in the output layer of LLMs (i.e., language model head in self-attention). We demonstrate that, in order to achieve state-of-the-art performance in numerical domains, the hidden representations of the LLM embeddings must possess a structure that accounts for the shift-invariance of the softmax function. By formulating a gradient structure of self-attention in pre-trained models, we show how the isotropic property of LLM embeddings in contextual embedding space preserves the underlying structure of representations, thereby resolving the shift-invariance problem and providing a performance guarantee. Experiments show that different characteristics of numeric data and model architecture could have different impacts on isotropy.
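The shift-invariance of softmax mentioned in the abstract can be illustrated with a minimal sketch. It assumes a toy log-linear model in which the probability of a token is proportional to exp of the inner product between a context vector and that token's embedding. The dimensions, random vectors, and function names below are hypothetical and are not taken from the paper. Adding the same vector b to every token embedding changes every logit by the same constant, so the predicted distribution does not change, even though the embedding geometry does.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def next_token_probs(context_vec, token_embeddings):
    """Toy log-linear model: p(w | c) proportional to exp(<c, e_w>)."""
    return softmax([dot(context_vec, e) for e in token_embeddings])


random.seed(0)
dim, vocab = 4, 6  # toy sizes, chosen only for illustration

context = [random.gauss(0, 1) for _ in range(dim)]
embeddings = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(vocab)]

# Shift every token embedding by the same vector b.
# Each logit changes by <context, b>, a constant across tokens.
b = [random.gauss(0, 1) for _ in range(dim)]
shifted = [[x + y for x, y in zip(e, b)] for e in embeddings]

p_original = next_token_probs(context, embeddings)
p_shifted = next_token_probs(context, shifted)

# Largest per-token difference between the two distributions;
# it should be zero up to floating-point rounding.
max_diff = max(abs(p - q) for p, q in zip(p_original, p_shifted))
print(max_diff)
```

Because the output distribution cannot distinguish the original embeddings from the shifted ones, the training objective alone does not pin down the embedding geometry. This is the ambiguity that, according to the abstract, the isotropy analysis is meant to resolve.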


Related papers