Fugu-MT Paper Translation (Abstract): Why Mean Pooling Works: Quantifying Second-Order Collapse in Text Embeddings

Paper summary: Why Mean Pooling Works: Quantifying Second-Order Collapse in Text Embeddings

arxiv url: http://arxiv.org/abs/2604.27398v1
Date: Thu, 30 Apr 2026 04:09:26 GMT
Status: Translation complete
System update date: 2026-05-01 16:31:53.919954
Title: Why Mean Pooling Works: Quantifying Second-Order Collapse in Text Embeddings
Title (reference translation): Why Mean Pooling Works: Quantifying Second-Order Collapse in Text Embeddings
Authors: Tomomasa Hara, Hiroto Kurita, Masaaki Imaizumi, Kentaro Inui, Sho Yokoi
Abstract summary: Mean pooling, which averages token embeddings, is the standard approach for constructing text embeddings. This paper examines whether mean pooling actually works well in real models.
Reference score (independently computed attention score): 30.943998879066857
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: For constructing text embeddings, mean pooling, which averages token embeddings, is the standard approach. This paper examines whether mean pooling actually works well in real models. First, we note that mean pooling can collapse information beyond the first-order statistics of the token embeddings, such as second-order statistics that capture their spatial structure, potentially mapping distinct token embedding distributions to similar text embeddings. Motivated by this concern, we propose a simple metric to quantify such a collapse induced by mean pooling. Then, using this metric, we empirically measure how often this collapse occurs in actual models and texts, and find that modern text encoders are robust to this collapse. In particular, contrastive fine-tuned text encoders tend to be less prone to the collapse than their pretrained backbone models. We also find that the robustness of these text encoders lies in the concentration of token embeddings within each text. In addition, we find that robustness to the collapse, as quantified by our proposed metric, correlates with downstream task performance. Overall, our findings offer a new perspective on why modern text encoders remain effective despite relying on seemingly coarse mean pooling.
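Abstract (reference translation): When constructing text embeddings, mean pooling, which averages token embeddings, is the standard approach. This paper examines whether mean pooling actually works well in real models. First, we note that mean pooling can collapse information beyond the first-order statistics of the token embeddings, such as second-order statistics that capture their spatial structure, and may therefore map distinct token embedding distributions to similar text embeddings. Motivated by this concern, we propose a simple metric to quantify the collapse induced by mean pooling. Using this metric, we empirically measure how often the collapse occurs in actual models and texts, and find that modern text encoders are robust to it. In particular, contrastively fine-tuned text encoders tend to be less prone to the collapse than their pretrained backbone models. We also find that the robustness of these text encoders stems from the concentration of token embeddings within each text. Furthermore, robustness to the collapse, as quantified by the proposed metric, correlates with downstream task performance. Overall, these findings offer a new perspective on why modern text encoders remain effective despite relying on seemingly coarse mean pooling.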
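
The concern raised in the abstract, that mean pooling keeps only first-order statistics, can be illustrated with a toy example. The sketch below is not the metric proposed in the paper (the abstract does not define it); it only shows, with made-up 2-D token embeddings and hypothetical helper names (`mean_pool`, `token_covariance`), how two token sets with different second-order structure can be pooled to the same text embedding.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Average token embeddings (shape: tokens x dim) into one text embedding."""
    return token_embeddings.mean(axis=0)


def token_covariance(token_embeddings: np.ndarray) -> np.ndarray:
    """Second-order statistic: covariance of the tokens around their mean."""
    centered = token_embeddings - token_embeddings.mean(axis=0)
    return centered.T @ centered / len(token_embeddings)


# Two toy "texts" with made-up 2-D token embeddings (illustrative values only).
text_a = np.array([[1.0, 0.0], [-1.0, 0.0]])  # tokens spread along the first axis
text_b = np.array([[0.0, 1.0], [0.0, -1.0]])  # tokens spread along the second axis

pooled_a = mean_pool(text_a)
pooled_b = mean_pool(text_b)
cov_a = token_covariance(text_a)
cov_b = token_covariance(text_b)

# The pooled embeddings coincide even though the covariances differ.
print("same pooled embedding:", np.allclose(pooled_a, pooled_b))
print("same covariance:      ", np.allclose(cov_a, cov_b))
```

Here both texts pool to the zero vector while their covariance matrices differ, which is the kind of second-order collapse the abstract describes. According to the abstract, real encoders are largely robust to this because token embeddings within each text are concentrated, a situation this toy example deliberately avoids.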
