Fugu-MT Paper Translation (Abstract): Hierarchical Token Prepending: Enhancing Information Flow in Decoder-based LLM Embeddings

Paper overview: Hierarchical Token Prepending: Enhancing Information Flow in Decoder-based LLM Embeddings

arxiv url: http://arxiv.org/abs/2511.14868v1
Date: Tue, 18 Nov 2025 19:37:40 GMT
Status: Translation complete
System update date: 2025-11-20 15:51:28.505698
Title: Hierarchical Token Prepending: Enhancing Information Flow in Decoder-based LLM Embeddings
Authors: Xueying Ding, Xingyue Huang, Mingxuan Ju, Liam Collins, Yozen Liu, Leman Akoglu, Neil Shah, Tong Zhao
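Abstract summary: This paper proposes Hierarchical Token Prepending (HTP), which mitigates attention-level compression and readout-level over-squashing. HTP partitions the input into blocks and prepends block-level summary tokens to subsequent blocks, creating pathways for backward information flow. As a simple, architecture-agnostic method, HTP enhances both zero-shot and finetuned models, offering a scalable route to superior long-document embeddings.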
Reference score (independently computed notability): 52.49524240846879
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models produce powerful text embeddings, but their causal attention mechanism restricts the flow of information from later to earlier tokens, degrading representation quality. While recent methods attempt to solve this by prepending a single summary token, they over-compress information, hence harming performance on long documents. We propose Hierarchical Token Prepending (HTP), a method that resolves two critical bottlenecks. To mitigate attention-level compression, HTP partitions the input into blocks and prepends block-level summary tokens to subsequent blocks, creating multiple pathways for backward information flow. To address readout-level over-squashing, we replace last-token pooling with mean-pooling, a choice supported by theoretical analysis. HTP achieves consistent performance gains across 11 retrieval datasets and 30 general embedding benchmarks, especially in long-context settings. As a simple, architecture-agnostic method, HTP enhances both zero-shot and finetuned models, offering a scalable route to superior long-document embeddings.
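The two ideas named in the abstract, block-wise summary prepending and mean-pooling in place of last-token pooling, can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not say how summary tokens are computed or which earlier blocks each block receives, so the sketch assumes that every block is preceded by placeholder summary tokens for all earlier blocks. The function names and the `<SUM_i>` placeholder format are hypothetical, and hidden states are plain lists of floats rather than model outputs.

```python
# Illustrative sketch only; names and the <SUM_i> placeholder are assumptions.

def partition_into_blocks(tokens, block_size):
    """Split a token sequence into consecutive blocks of at most block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


def prepend_block_summaries(blocks, summary_fmt="<SUM_{}>"):
    """Prepend placeholder summary tokens of earlier blocks to each block.

    Assumption: block i receives one placeholder per earlier block (0..i-1).
    In a real model the placeholders would carry learned or computed
    summary representations; here they only mark positions.
    """
    layout = []
    for idx, block in enumerate(blocks):
        summaries = [summary_fmt.format(j) for j in range(idx)]
        layout.append(summaries + list(block))
    return layout


def mean_pool(hidden_states, mask):
    """Average the hidden states of all unmasked positions."""
    kept = [h for h, m in zip(hidden_states, mask) if m]
    if not kept:
        raise ValueError("mask selects no positions")
    dim = len(kept[0])
    return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]


def last_token_pool(hidden_states, mask):
    """Return the hidden state of the last unmasked position."""
    for h, m in zip(reversed(hidden_states), reversed(mask)):
        if m:
            return list(h)
    raise ValueError("mask selects no positions")


if __name__ == "__main__":
    blocks = partition_into_blocks(["a", "b", "c", "d", "e"], block_size=2)
    print(prepend_block_summaries(blocks))
    states = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
    mask = [1, 1, 0]  # the final position is padding
    print(mean_pool(states, mask), last_token_pool(states, mask))
```

Mean-pooling draws on every unmasked position, whereas last-token pooling depends on a single position. That is the readout-level contrast the abstract describes, shown here on toy vectors only.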


Related papers