Fugu-MT Paper Translation (Abstract): LLM Optimization Unlocks Real-Time Pairwise Reranking

Paper overview: LLM Optimization Unlocks Real-Time Pairwise Reranking

arxiv url: http://arxiv.org/abs/2511.07555v1
Date: Wed, 12 Nov 2025 01:03:10 GMT
Status: Translation complete
Last updated in system: 2025-11-12 20:17:03.381235
Title: LLM Optimization Unlocks Real-Time Pairwise Reranking
Title (reference translation): LLM Optimization Unlocks Real-Time Pairwise Reranking
Authors: Jingyu Wu, Aditya Shrivastava, Jing Zhu, Alfy Samuel, Anoop Kumar, Daben Liu
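Abstract summary: Pairwise Reranking Prompting (PRP) has emerged as a promising plug-and-play approach due to its usability and effectiveness. This paper presents a focused study on pairwise reranking and demonstrates that carefully applied optimization methods can significantly mitigate its computational cost and latency. The authors achieve a latency reduction of up to 166 times, from 61.36 seconds to 0.37 seconds per query, with an insignificant drop in performance measured by Recall@k.
Reference score (independently computed attention score): 6.0141312590967635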
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Efficiently reranking documents retrieved from information retrieval (IR) pipelines to enhance the overall quality of Retrieval-Augmented Generation (RAG) systems remains an important yet challenging problem. Recent studies have highlighted the importance of Large Language Models (LLMs) in reranking tasks. In particular, Pairwise Reranking Prompting (PRP) has emerged as a promising plug-and-play approach due to its usability and effectiveness. However, the inherent complexity of the algorithm, coupled with the high computational demands and latency incurred due to LLMs, raises concerns about its feasibility in real-time applications. To address these challenges, this paper presents a focused study on pairwise reranking, demonstrating that carefully applied optimization methods can significantly mitigate these issues. By implementing these methods, we achieve a remarkable latency reduction of up to 166 times, from 61.36 seconds to 0.37 seconds per query, with an insignificant drop in performance measured by Recall@k. Our study highlights the importance of design choices that were previously overlooked, such as using smaller models, limiting the reranked set, using lower precision, reducing positional bias with one-directional order inference, and restricting output tokens. These optimizations make LLM-based reranking substantially more efficient and feasible for latency-sensitive, real-world deployments.
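Abstract (reference translation): Efficiently reranking the documents retrieved by information retrieval (IR) pipelines, so as to improve the overall quality of Retrieval-Augmented Generation (RAG) systems, remains an important and difficult problem. Recent work has emphasized the role of Large Language Models (LLMs) in reranking tasks. Pairwise Reranking Prompting (PRP) in particular has emerged as a promising plug-and-play approach because of its usability and effectiveness. However, the inherent complexity of the algorithm, combined with the high computational demands and latency of LLMs, raises concerns about its feasibility in real-time applications. To address these challenges, the paper focuses on pairwise reranking and demonstrates that carefully applied optimization methods can significantly mitigate these issues. Implementing these methods yields a latency reduction of up to 166 times, from 61.36 seconds to 0.37 seconds per query, with an insignificant drop in performance measured by Recall@k. The study stresses design choices that were previously overlooked: using smaller models, limiting the reranked set, using lower precision, reducing positional bias with one-directional order inference, and restricting output tokens. With these optimizations, LLM-based reranking becomes substantially more efficient and feasible for latency-sensitive, real-world deployments.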
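To make two of the design choices named in the abstract concrete (limiting the reranked set and one-directional order inference), here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how pairwise preferences are aggregated, so all-pairs win counting is an assumption, and `pairwise_rerank`, `count_comparisons`, and `toy_prefer_first` are hypothetical names. The comparator is a keyword-overlap stand-in for an LLM call; in a real system it would prompt a model with both passages and cap generation at a very small number of output tokens.

```python
from typing import Callable, List

# Hypothetical comparator type: returns True if doc_a should rank above doc_b.
Comparator = Callable[[str, str, str], bool]


def count_comparisons(n: int, bidirectional: bool) -> int:
    """Number of comparator calls needed for all-pairs comparison of n documents."""
    pairs = n * (n - 1) // 2
    # Bidirectional inference asks about (a, b) and (b, a); one-directional asks once.
    return pairs * (2 if bidirectional else 1)


def pairwise_rerank(query: str, docs: List[str],
                    prefer_first: Comparator, top_n: int) -> List[str]:
    """Rerank only the first top_n docs; the rest keep their retrieval order."""
    head, tail = docs[:top_n], docs[top_n:]
    wins = [0] * len(head)
    for i in range(len(head)):
        for j in range(i + 1, len(head)):
            # One-directional: each pair is sent to the comparator exactly once.
            if prefer_first(query, head[i], head[j]):
                wins[i] += 1
            else:
                wins[j] += 1
    # sorted() is stable, so ties fall back to the original retrieval order.
    order = sorted(range(len(head)), key=lambda i: -wins[i])
    return [head[i] for i in order] + tail


def toy_prefer_first(query: str, doc_a: str, doc_b: str) -> bool:
    """Stand-in for an LLM judgment (assumption): compare query-term overlap."""
    terms = set(query.lower().split())

    def overlap(doc: str) -> int:
        return len(terms & set(doc.lower().split()))

    return overlap(doc_a) >= overlap(doc_b)
```

Under these assumptions, reranking 100 candidates with bidirectional all-pairs comparison would take `count_comparisons(100, True)` = 9900 comparator calls, while restricting to the top 10 with one-directional inference takes `count_comparisons(10, False)` = 45. As a check on the reported figure, 61.36 / 0.37 ≈ 165.8, consistent with "up to 166 times".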

List of related papers