Fugu-MT Paper Translation (Overview): ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability

Paper overview: ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability

arxiv url: http://arxiv.org/abs/2508.07050v1
Date: Sat, 09 Aug 2025 17:26:18 GMT
Status: Translation complete
System update date: 2025-08-12 21:23:28.675387
Title: ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability
Authors: Wenhan Liu, Xinyu Ma, Weiwei Sun, Yutao Zhu, Yuchen Li, Dawei Yin, Zhicheng Dou
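Abstract summary: This paper proposes an automated reasoning-intensive training data synthesis framework, with a self-consistency data filtering mechanism designed to ensure data quality. The trained reasoning-intensive reranker ReasonRank achieves state-of-the-art (SOTA) performance of 40.6 on the BRIGHT leaderboard.
Reference score (attention score, calculated independently): 41.99845885135309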
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Model (LLM) based listwise ranking has shown superior performance in many passage ranking tasks. With the development of Large Reasoning Models, many studies have demonstrated that step-by-step reasoning during test-time helps improve listwise ranking performance. However, due to the scarcity of reasoning-intensive training data, existing rerankers perform poorly in many complex ranking scenarios and the ranking ability of reasoning-intensive rerankers remains largely underdeveloped. In this paper, we first propose an automated reasoning-intensive training data synthesis framework, which sources training queries and passages from diverse domains and applies DeepSeek-R1 to generate high-quality training labels. A self-consistency data filtering mechanism is designed to ensure the data quality. To empower the listwise reranker with strong reasoning ability, we further propose a two-stage post-training approach, which includes a cold-start supervised fine-tuning (SFT) stage for reasoning pattern learning and a reinforcement learning (RL) stage for further ranking ability enhancement. During the RL stage, based on the nature of listwise ranking, we design a multi-view ranking reward, which is more effective than a ranking metric-based reward. Extensive experiments demonstrate that our trained reasoning-intensive reranker ReasonRank outperforms existing baselines significantly and also achieves much lower latency than the pointwise reranker Rank1. Through further experiments, our ReasonRank has achieved state-of-the-art (SOTA) performance of 40.6 on the BRIGHT leaderboard (https://brightbenchmark.github.io/). Our code is available at https://github.com/8421BCD/ReasonRank.
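The abstract mentions a self-consistency filter over DeepSeek-R1 labels and contrasts the multi-view reward with a "ranking metric-based reward", but it gives no formulas. As a point of reference, here is a minimal Python sketch of NDCG@10, a standard ranking metric of the kind such a reward or filter could be built on. It assumes each sample holds a teacher-produced listwise ranking of passage IDs plus graded relevance labels, and uses the exponential-gain variant of DCG. The `filter_samples` helper and its `threshold` are hypothetical illustrations, not ReasonRank's actual filtering rule or reward; see the linked repository for the authors' implementation.

```python
import math


def dcg_at_k(ranking, relevance, k=10):
    """Discounted cumulative gain of the top-k passage IDs in `ranking`.

    `relevance` maps passage ID -> graded relevance label; IDs missing
    from it are treated as irrelevant (label 0).
    """
    return sum(
        (2 ** relevance.get(pid, 0) - 1) / math.log2(rank + 2)
        for rank, pid in enumerate(ranking[:k])
    )


def ndcg_at_k(ranking, relevance, k=10):
    """NDCG@k: DCG of `ranking` normalized by the DCG of the ideal ordering."""
    ideal = sorted(relevance, key=relevance.get, reverse=True)
    ideal_dcg = dcg_at_k(ideal, relevance, k)
    return dcg_at_k(ranking, relevance, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def filter_samples(samples, threshold, k=10):
    """Hypothetical filter: keep samples whose ranking scores NDCG@k >= threshold
    against their own relevance labels (an illustrative consistency check)."""
    return [s for s in samples if ndcg_at_k(s["ranking"], s["relevance"], k) >= threshold]


labels = {"p1": 2, "p2": 1, "p3": 0}
samples = [
    {"ranking": ["p1", "p2", "p3"], "relevance": labels},  # ideal order, NDCG = 1.0
    {"ranking": ["p3", "p2", "p1"], "relevance": labels},  # reversed order, NDCG ~ 0.587
]
kept = filter_samples(samples, threshold=0.7)
```

With `threshold=0.7`, only the first sample survives. A single scalar like this rewards only the final order; the paper's claim is that its multi-view reward, whose exact definition is not given in this abstract, works better than such a metric-only reward.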
