Fugu-MT Paper Translation (Abstract): IRanker: Towards Ranking Foundation Model

Paper overview: IRanker: Towards Ranking Foundation Model

arxiv url: http://arxiv.org/abs/2506.21638v1
Date: Wed, 25 Jun 2025 17:56:06 GMT
Status: Translation complete
System update date: 2025-06-30 21:12:22.948412
Title: IRanker: Towards Ranking Foundation Model
Authors: Tao Feng, Zhigang Hua, Zijie Lei, Yan Xie, Shuang Yang, Bo Long, Jiaxuan You
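Abstract summary: We propose unifying ranking tasks under a single ranking foundation model (FM). IRanker is a ranking framework that combines reinforcement learning (RL) with iterative decoding. We show that a single IRanker-3B achieves state-of-the-art results on multiple datasets.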
Reference score (independently computed attention score): 26.71771958251611
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Ranking tasks are ubiquitous, encompassing applications such as recommendation systems, LLM routing, and item re-ranking. We propose to unify these tasks using a single ranking foundation model (FM), as it eliminates the need for designing different models for each specific ranking task. However, unlike general supervision tasks in LLMs, ranking tasks do not have clear labels for supervision, posing great challenges to developing a ranking FM. To overcome these challenges, we propose IRanker, a ranking FM framework with reinforcement learning (RL) and iterative decoding. Our insight is to decompose the complex ranking task into an iterative decoding process that eliminates the worst candidate from the candidate pool step by step, which significantly reduces the output combinatorial space and better utilizes the limited context length during RL training. We meticulously train and comprehensively evaluate an IRanker-3B model on nine datasets across three scenarios: recommendation, routing, and passage ranking. The results show that a single IRanker-3B achieves state-of-the-art results on several datasets compared to models of similar size, and even surpasses the performance of larger models on certain datasets. We further demonstrate the effectiveness of our RL design and the robustness of the iterative mechanism across different LLM sizes. Moreover, we conducted both in-domain and out-of-domain zero-shot generalization experiments, which showed that IRanker-3B achieved good generalization on in-domain ranking tasks compared to the base LLM by at least 5% improvement. Surprisingly, on out-of-domain generic LLM tasks, IRanker-3B outperformed the base model by at least 9% on GSM8K, IFEval, and MathQA. In addition, the thoughts generated by IRanker-3B during training could further enhance zero-shot LLM performance.
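The iterative decoding idea described in the abstract (drop the worst candidate from the pool, one step at a time) can be illustrated with a minimal sketch. This is not the authors' implementation: `iterative_rank`, `pick_worst`, `output_space_sizes`, and the toy scores are hypothetical names and data introduced here for illustration, and a simple score lookup stands in for the model call. The second helper only illustrates why the output space shrinks: ranking n candidates in one shot means choosing among n! orderings, whereas each elimination step chooses among at most n candidates.

```python
import math
from typing import Callable, List, Sequence, Tuple


def iterative_rank(
    candidates: Sequence[str],
    pick_worst: Callable[[List[str]], str],
) -> List[str]:
    """Rank candidates best-first by repeatedly eliminating the worst one.

    `pick_worst` is a hypothetical stand-in for a model call that, given the
    current pool, returns the single candidate it considers worst.
    """
    pool = list(candidates)
    eliminated: List[str] = []  # filled worst-first
    while len(pool) > 1:
        worst = pick_worst(pool)
        if worst not in pool:
            raise ValueError(f"pick_worst returned unknown candidate: {worst!r}")
        pool.remove(worst)
        eliminated.append(worst)
    eliminated.extend(pool)  # the last survivor (if any) is the best
    return eliminated[::-1]  # reverse so the best candidate comes first


def output_space_sizes(n: int) -> Tuple[int, int]:
    """Return (orderings for one-shot ranking, max choices per elimination step)."""
    return math.factorial(n), n


# Toy relevance scores (made up for this example); lower means worse.
toy_scores = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.1}


def toy_pick_worst(pool: List[str]) -> str:
    # Assumption: a score lookup replaces the model's judgement.
    return min(pool, key=toy_scores.__getitem__)


ranking = iterative_rank(list(toy_scores), toy_pick_worst)
print(ranking)                 # ['b', 'c', 'a', 'd']
print(output_space_sizes(4))   # (24, 4)
```

Each step only has to name one item from a shrinking pool, so the prompt for a step needs to carry only the remaining candidates, which is one plausible reading of how the approach makes better use of limited context length.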

Related papers