Fugu-MT Paper Translation (Abstract): Layer-wise Token Compression for Efficient Document Reranking

Paper overview: Layer-wise Token Compression for Efficient Document Reranking

arxiv url: http://arxiv.org/abs/2605.20683v1
Date: Wed, 20 May 2026 03:52:31 GMT
Status: Translation complete
System update date: 2026-05-21 19:19:56.473083
Title: Layer-wise Token Compression for Efficient Document Reranking
Title (reference translation): Layer-wise Token Compression for Efficient Document Reranking
Authors: Shengyao Zhuang, Zhichao Xu, Ivano Lauriola,
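Abstract summary: Cross-encoder rerankers incur high computational costs because they process long query-document sequences at inference time. The paper proposes Layer-wise Token Compression, which applies adaptive token pooling at intermediate transformer layers. It shows that compression at middle layers preserves ranking quality while increasing inference QPS by up to 25% for passage ranking and up to 116% for document ranking.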
Reference score (independently calculated attention score): 18.48737466474846
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformer-based document cross-encoder rerankers are a central component of modern information retrieval systems. Despite their success, these models suffer from high computational costs due to processing long query-document sequences at inference time. A known approach to improve efficiency is token compression, which consists of aggregating groups of tokens together in the initial embedding layer, reducing the effective number of tokens, and making the computation faster. While token compression has proven to be successful for bi-encoder retrievers, we empirically observed that this approach may be ineffective for cross-encoder rerankers. In this paper, we propose Layer-wise Token Compression (LTC), which applies adaptive token pooling at intermediate transformer layers. Through extensive ablation studies on MS MARCO passage and document ranking tasks, we demonstrate that compression at middle layers preserves ranking quality while increasing inference QPS by up to 25% for passage ranking and up to 116% for document ranking. We also extend LTC to listwise LLM rerankers and show that the same approach can be easily applied to long-context listwise reranking, where the QPS improvements are even greater. More surprisingly, when applying rerankers trained on short passages to long-document ranking tasks, models trained with compression outperform their uncompressed counterparts, suggesting that compression may act as a beneficial regularizer that encourages length-invariant representations.
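Abstract (reference translation): Transformer-based document cross-encoder rerankers are a central component of modern information retrieval systems. Despite their success, these models suffer from high computational costs because they process long query-document sequences at inference time. Token compression, a known way to improve efficiency, aggregates groups of tokens in the initial embedding layer, which reduces the effective number of tokens and speeds up computation. Although token compression has proven effective for bi-encoder retrievers, we empirically observed that it may be ineffective for cross-encoder rerankers. In this paper, we propose Layer-wise Token Compression (LTC), which applies adaptive token pooling at intermediate transformer layers. Through extensive ablation studies on the MS MARCO passage and document ranking tasks, we show that compression at middle layers preserves ranking quality while increasing inference QPS by up to 25% for passage ranking and up to 116% for document ranking. We also extend LTC to listwise LLM rerankers and show that the same approach applies readily to long-context listwise reranking, where the QPS improvements are even greater. More surprisingly, when rerankers trained on short passages are applied to long-document ranking tasks, models trained with compression outperform their uncompressed counterparts, suggesting that compression may act as a beneficial regularizer that encourages length-invariant representations.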
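The abstract describes pooling tokens at an intermediate layer rather than at the embedding layer. The following is a minimal sketch of that idea in plain Python, not the authors' implementation: the abstract does not specify the pooling operator, the layer index, or the compression ratio, so `adaptive_mean_pool`, `encode_with_compression`, `compress_at`, and `ratio` are hypothetical names, mean pooling is an assumed stand-in for "adaptive token pooling", and each layer is modeled as an arbitrary function on a list of token vectors.

```python
def adaptive_mean_pool(tokens, target_len):
    """Mean-pool a list of token vectors down to at most target_len vectors.

    Assumption: bin boundaries follow the usual adaptive-average-pooling
    rule (floor for the start, ceil for the end), so neighbouring bins may
    share a token when len(tokens) is not a multiple of target_len.
    """
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    n = len(tokens)
    if n <= target_len:
        return [list(t) for t in tokens]  # already short enough
    pooled = []
    for i in range(target_len):
        start = (i * n) // target_len
        end = ((i + 1) * n + target_len - 1) // target_len  # integer ceil
        group = tokens[start:end]
        dim = len(group[0])
        pooled.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return pooled


def encode_with_compression(tokens, layers, compress_at, ratio):
    """Apply layers in order, pooling the hidden states once, just before
    the layer at index compress_at (hypothetical parameter names)."""
    hidden = tokens
    for idx, layer in enumerate(layers):
        if idx == compress_at:
            target = max(1, len(hidden) // ratio)
            hidden = adaptive_mean_pool(hidden, target)
        hidden = layer(hidden)  # later layers see the shorter sequence
    return hidden
```

In this sketch the layers before `compress_at` still process the full query-document sequence, while every later layer operates on roughly `1 / ratio` of the tokens, which is where any throughput gain would come from. Setting `compress_at=0` corresponds to compressing at the embedding layer, the setting the abstract reports as potentially ineffective for cross-encoders.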

Related papers