Fugu-MT Paper Translation (Abstract): Prism-Reranker: Beyond Relevance Scoring -- Jointly Producing Contributions and Evidence for Agentic Retrieval

Paper overview: Prism-Reranker: Beyond Relevance Scoring -- Jointly Producing Contributions and Evidence for Agentic Retrieval

arxiv url: http://arxiv.org/abs/2604.23734v1
Date: Sun, 26 Apr 2026 14:28:48 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.528726
Title: Prism-Reranker: Beyond Relevance Scoring -- Jointly Producing Contributions and Evidence for Agentic Retrieval
Title (reference translation): Prism-Reranker: Beyond Relevance Scoring -- Joint Generation of Contributions and Evidence in Agentic Retrieval
Authors: Dun Zhang
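Abstract summary: We introduce Prism-Reranker, a family of reranker models built on Qwen3.5 at four sizes (0.8B, 2B, 4B, 9B). In addition to the standard yes/no relevance judgement, whenever the verdict is yes the model emits a contribution statement summarizing how the document helps the query. The same recipe extends existing LLM-based rerankers, augmenting Qwen3-Reranker-4B with contribution and evidence capabilities.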
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Modern retrieval pipelines increasingly serve downstream consumers like retrieval-augmented generation (RAG) and autonomous agents that need more than a scalar relevance score. A reranker that only tells the caller "how relevant" forces the agent to dump entire documents into the language-model context, wasting tokens on tangential passages and boilerplate. We introduce Prism-Reranker, a family of reranker models built on Qwen3.5 at four sizes (0.8B, 2B, 4B, 9B) that goes beyond scalar scoring. In addition to the standard yes/no relevance judgement, whenever the verdict is yes the model emits (i) a contribution statement summarizing how the document helps the query, and (ii) an evidence passage: a self-contained rewrite that preserves every query-relevant signal while discarding noise. Prism-Reranker is trained with a hybrid objective combining point-wise distillation from a strong commercial reranker API with supervised fine-tuning on contribution and evidence targets. We curate training data from KaLM-Embedding's open-source aggregation, augmented with real web documents retrieved via commercial search APIs for open-domain queries and LLM-synthesized variants, and rewrite a portion of queries into keyword-style reformulations to adapt the model to agent-issued traffic. To reconcile inconsistent labels across open corpora and obtain crisp binary supervision, we relabel data with an LLM-as-Judge ensemble aggregating votes from five frontier LLMs. On a QA subset of BEIR and on an LLM-judged evaluation of contribution and evidence quality, Prism-Reranker attains solid results across all four sizes. We further show that the same recipe extends existing LLM-based rerankers, augmenting Qwen3-Reranker-4B with contribution and evidence capabilities while improving its average BEIR-QA NDCG@10 by +1.54 over the base model. Model weights, training recipe, and evaluation suite are released.
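The relabeling step described in the abstract aggregates votes from five frontier LLMs into a single binary label. The abstract does not state the aggregation rule, so the following is only a minimal sketch that assumes simple majority voting over yes/no votes; the function name `aggregate_votes` is hypothetical and not taken from the paper.

```python
from collections import Counter


def aggregate_votes(votes):
    """Return 'yes' or 'no' by simple majority (assumed rule, not from the paper)."""
    if len(votes) % 2 == 0:
        # An odd number of judges (e.g. five) guarantees there is no tie.
        raise ValueError("expected an odd number of votes to avoid ties")
    normalized = [v.strip().lower() for v in votes]
    if any(v not in ("yes", "no") for v in normalized):
        raise ValueError("votes must be 'yes' or 'no'")
    counts = Counter(normalized)
    return "yes" if counts["yes"] > counts["no"] else "no"


# Five hypothetical judge outputs for one (query, document) pair.
label = aggregate_votes(["yes", "no", "yes", "yes", "no"])
```

With five judges, a label of yes requires at least three yes votes. A stricter threshold (for example, four of five) would trade label coverage for precision, but whether the authors use one is not stated in the abstract.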
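Abstract (reference translation): Modern retrieval pipelines increasingly serve downstream consumers such as retrieval-augmented generation (RAG) and autonomous agents, which need more than a scalar relevance score. A reranker that tells the caller only "how relevant" a document is forces the agent to dump entire documents into the language-model context, wasting tokens on tangential passages and boilerplate. Prism-Reranker is a family of reranker models built on Qwen3.5 at four sizes (0.8B, 2B, 4B, 9B) that goes beyond scalar scoring. In addition to the standard yes/no relevance judgement, whenever the verdict is yes the model outputs (i) a contribution statement summarizing how the document helps the query, and (ii) an evidence passage: a self-contained rewrite that preserves every query-relevant signal while removing noise. Prism-Reranker is trained with a hybrid objective that combines point-wise distillation from a strong commercial reranker API with supervised fine-tuning on contribution and evidence targets. The training data is curated from KaLM-Embedding's open-source aggregation, augmented with real web documents retrieved via commercial search APIs for open-domain queries and with LLM-synthesized variants; a portion of the queries is rewritten into keyword-style reformulations to adapt the model to agent-issued traffic. To reconcile inconsistent labels across open corpora and obtain crisp binary supervision, the data is relabeled with an LLM-as-Judge ensemble that aggregates votes from five frontier LLMs. On a QA subset of BEIR and on an LLM-judged evaluation of contribution and evidence quality, Prism-Reranker attains solid results across all four sizes. The same recipe also extends existing LLM-based rerankers: it augments Qwen3-Reranker-4B with contribution and evidence capabilities while improving its average BEIR-QA NDCG@10 by +1.54 over the base model. Model weights, the training recipe, and the evaluation suite are released.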
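The ranking metric reported in the abstract is NDCG@10. As a reminder of what that number measures, here is a minimal sketch of the common linear-gain definition, DCG@k = sum over ranks i = 1..k of rel_i / log2(i + 1), normalized by the DCG of the ideal ordering. It assumes that `relevances` holds the labels of all judged candidates for one query, in the order the reranker ranked them. The helper names are illustrative, and the paper's evaluation suite may use a different gain function or tooling.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k items (linear gain)."""
    # enumerate() is 0-based, so rank + 2 gives log2(i + 1) for 1-based rank i.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical binary labels for one query, in reranked order.
score = ndcg_at_k([0, 1])
```

In this example the single relevant document is ranked second instead of first, so the score is 1 / log2(3), roughly 0.63. A benchmark-level figure such as the one in the abstract is an average of per-query scores like this over each dataset's queries.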


Related papers