Fugu-MT Paper Translation (Abstract): Scalable In-context Ranking with Generative Models

Paper abstract: Scalable In-context Ranking with Generative Models

arxiv url: http://arxiv.org/abs/2510.05396v1
Date: Mon, 06 Oct 2025 21:41:58 GMT
Status: translation complete
Last updated in system: 2025-10-08 17:57:08.000008
Title: Scalable In-context Ranking with Generative Models
Title (reference translation): Scalable In-context Ranking with Generative Models
Authors: Nilesh Gupta, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Inderjit Dhillon, Felix Yu
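Abstract summary: In-context Ranking (ICR) is an emerging paradigm for information retrieval (IR). The paper introduces BlockRank, a novel method that adapts the attention operation in an LLM by architecturally enforcing the observed inter-document block sparsity. Experiments on BEIR, MSMarco, and NQ with Mistral-7B show that FLARE Mistral matches or outperforms existing SOTA listwise rankers.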
Reference score (independently computed attention metric): 38.41016998260796
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In-context Ranking (ICR) is an emerging paradigm for Information Retrieval (IR), which leverages contextual understanding of LLMs by directly incorporating the task description, candidate documents, and the query into the model's input prompt and tasking the LLM to identify relevant document(s). While it is effective, efficiency is a significant challenge in this paradigm, especially as the candidate list grows due to quadratic/super-linear scaling of attention operation with context length. To this end, this paper first identifies inherent and exploitable structures in the attention of LLMs finetuned for ICR: (1) inter-document block sparsity: attention is dense within each document block but sparse across different documents in the context; and (2) query-document block relevance: the attention scores from certain query tokens to a document block in middle layers strongly correlate with that document's actual relevance. Motivated by these observations, we introduce BlockRank (Blockwise In-context Ranking), a novel method that adapts the attention operation in an LLM by (a) architecturally enforcing the observed inter-document block sparsity, reducing attention complexity from quadratic to linear without loss in performance, and (b) optimizing query-document block relevance for true relevant documents during fine-tuning using an auxiliary contrastive training objective, improving retrieval in attention. Experiments on BEIR, MSMarco and NQ with Mistral-7B demonstrate that FLARE Mistral matches or outperforms existing SOTA listwise rankers and controlled fine-tuned baseline while being significantly more efficient at inference (4.7x for 100 MSMarco documents in context) and scaling gracefully to long-context shortlists, around 500 documents in-context (approximately 100K context length) within a second, presenting a scalable and effective solution for ICR.
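Abstract (reference translation): In-context Ranking (ICR) is an emerging paradigm for information retrieval (IR) in which the task description, candidate documents, and query are placed directly in the model's input prompt and the model is asked to identify the relevant documents. Although effective, this paradigm faces a significant efficiency challenge as the candidate list grows, because the attention operation scales quadratically or super-linearly with context length. This work identifies two exploitable structures in the attention of LLMs fine-tuned for ICR: (1) inter-document block sparsity: attention is dense within each document block but sparse across different documents in the context; and (2) query-document block relevance: attention scores from certain query tokens to a document block in the middle layers correlate strongly with that document's actual relevance. Building on these observations, BlockRank (Blockwise In-context Ranking) is a novel method that adapts the attention operation in an LLM by (a) architecturally enforcing the observed inter-document block sparsity, which reduces attention complexity from quadratic to linear without loss in performance, and (b) optimizing query-document block relevance for truly relevant documents during fine-tuning with an auxiliary contrastive training objective, which improves retrieval in attention. In experiments on BEIR, MSMarco, and NQ with Mistral-7B, FLARE Mistral matches or outperforms existing SOTA listwise rankers and a controlled fine-tuned baseline while being significantly more efficient at inference (4.7x for 100 MSMarco documents in context), and it scales gracefully to long-context shortlists of around 500 documents in context (approximately 100K context length) within a second, presenting a scalable and effective solution for ICR.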
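
The two attention structures described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation. The segment labelling (0 for the task instruction, positive integers for documents, -1 for the query), the choice to let document tokens also see the instruction, and the function names `block_sparse_mask`, `allowed_pairs`, and `doc_scores` are all assumptions made for illustration; the abstract states only that attention is dense within a document block, sparse across documents, and that query-to-document attention correlates with relevance.

```python
def block_sparse_mask(segment_ids):
    """Build a boolean mask where mask[i][j] is True if token i may attend to token j.

    Assumed (hypothetical) labelling: 0 = task instruction,
    1..N = document index, -1 = query.
    """
    n = len(segment_ids)
    mask = [[False] * n for _ in range(n)]
    for i, si in enumerate(segment_ids):
        for j in range(i + 1):  # causal: current and earlier positions only
            sj = segment_ids[j]
            if si == -1:
                # Query tokens see the whole preceding context.
                mask[i][j] = True
            elif sj == si or sj == 0:
                # Other tokens see their own block and the shared instruction.
                mask[i][j] = True
    return mask


def allowed_pairs(mask):
    """Count attended (i, j) pairs, a rough proxy for attention cost."""
    return sum(sum(row) for row in mask)


def doc_scores(query_attention, segment_ids):
    """Sum one query token's attention weights over each document block."""
    scores = {}
    for weight, seg in zip(query_attention, segment_ids):
        if seg > 0:  # skip instruction (0) and query (-1) positions
            scores[seg] = scores.get(seg, 0.0) + weight
    return scores


# Toy context: 2 instruction tokens, two 2-token documents, 1 query token.
segments = [0, 0, 1, 1, 2, 2, -1]
mask = block_sparse_mask(segments)
print(allowed_pairs(mask))

# Made-up attention weights for the query token, one per context position.
attention = [0.1, 0.1, 0.1, 0.1, 0.3, 0.2, 0.1]
scores = doc_scores(attention, segments)
print(max(scores, key=scores.get))
```

With fixed-length documents, each document token attends to a bounded number of positions under this mask, so the number of attended pairs grows linearly with the number of documents rather than quadratically, which is the scaling behaviour the abstract describes.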


Related papers