Fugu-MT Paper Translation (Abstract): ARHN: Answer-Centric Relabeling of Hard Negatives with Open-Source LLMs for Dense Retrieval

Paper overview: ARHN: Answer-Centric Relabeling of Hard Negatives with Open-Source LLMs for Dense Retrieval

arxiv url: http://arxiv.org/abs/2604.11092v1
Date: Mon, 13 Apr 2026 07:11:30 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:16.385897
Title: ARHN: Answer-Centric Relabeling of Hard Negatives with Open-Source LLMs for Dense Retrieval
Title (reference translation): ARHN: Answer-Centric Relabeling of Hard Negatives Using Open-Source LLMs for Dense Retrieval
Authors: Hyewon Choi, Jooyoung Choi, Hansol Jang, Hyun Kim, Chulmin Yun, ChangWook Jun, Stanley Jungkyu Choi
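Abstract summary: We propose ARHN (Answer-centric Relabeling of Hard Negatives), which refines hard negative samples using answer-centric relevance signals. We evaluated ARHN on the BEIR benchmark under three configurations.
Reference score (independently computed attention score): 12.84859278829763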
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neural retrievers are often trained on large-scale triplet data comprising a query, a positive passage, and a set of hard negatives. In practice, hard-negative mining can introduce false negatives and other ambiguous negatives, including passages that are relevant or contain partial answers to the query. Such label noise yields inconsistent supervision and can degrade retrieval effectiveness. We propose ARHN (Answer-centric Relabeling of Hard Negatives), a two-stage framework that leverages open-source LLMs to refine hard negative samples using answer-centric relevance signals. In the first stage, for each query-passage pair, ARHN prompts the LLM to generate a passage-grounded answer snippet or to indicate that the passage does not support an answer. In the second stage, ARHN applies an LLM-based listwise ranking over the candidate set to order passages by direct answerability to the query. Passages ranked above the original positive are relabeled as additional positives. Among passages ranked below the positive, ARHN excludes any that contain an answer snippet from the negative set to avoid ambiguous supervision. We evaluated ARHN on the BEIR benchmark under three configurations: relabeling only, filtering only, and their combination. Across datasets, the combined strategy consistently improves over either step in isolation, indicating that jointly relabeling false negatives and filtering ambiguous negatives yields cleaner supervision for training neural retrieval models. By relying strictly on open-source models, ARHN establishes a cost-effective and scalable refinement pipeline suitable for large-scale training.
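Abstract (reference translation): Neural retrievers are often trained on large-scale triplet data consisting of a query, a positive passage, and a set of hard negatives. In practice, hard-negative mining can introduce false negatives and other ambiguous negatives. Such label noise leads to inconsistent supervision and can reduce retrieval effectiveness. This paper proposes ARHN (Answer-centric Relabeling of Hard Negatives), a two-stage framework that uses open-source LLMs. In the first stage, for each query-passage pair, ARHN prompts the LLM either to generate a passage-grounded answer snippet or to indicate that the passage does not support an answer. In the second stage, ARHN applies LLM-based listwise ranking to order passages by how directly they answer the query. Passages ranked above the original positive are relabeled as additional positives. Among passages ranked below the positive, ARHN excludes every passage that contains an answer snippet from the negative set, to avoid ambiguous supervision. ARHN was evaluated on the BEIR benchmark under three configurations. Across datasets, the combined strategy consistently improves over either step alone, indicating that jointly relabeling false negatives and filtering ambiguous negatives gives cleaner supervision for training neural retrieval models. By relying strictly on open-source models, ARHN establishes a cost-effective and scalable refinement pipeline suited to large-scale training.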
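
The relabeling and filtering rules in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `refine_candidates`, its parameters, and the passage IDs are hypothetical, and the two LLM stages are represented by their outputs (a dict of answer snippets and a ranked list) rather than by real model calls. The abstract does not say what happens to passages ranked above the positive when relabeling is disabled, so keeping them as negatives here is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def refine_candidates(
    ranking: List[str],
    positive_id: str,
    snippets: Dict[str, Optional[str]],
    relabel: bool = True,
    filter_ambiguous: bool = True,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: apply the relabel/filter rules to one query.

    ranking     -- passage IDs ordered best-first (stage 2 output, assumed given)
    positive_id -- ID of the original positive passage (must appear in ranking)
    snippets    -- stage 1 output: answer snippet per passage, or None when
                   the passage does not support an answer
    """
    pos_rank = ranking.index(positive_id)
    positives = [positive_id]
    negatives = []
    for rank, pid in enumerate(ranking):
        if pid == positive_id:
            continue
        if rank < pos_rank:
            # Ranked above the original positive: relabel as a positive.
            # Assumption: with relabeling off, the passage stays a negative.
            if relabel:
                positives.append(pid)
            else:
                negatives.append(pid)
        elif filter_ambiguous and snippets.get(pid):
            # Ranked below the positive but contains an answer snippet:
            # drop it from the negative set instead of training on it.
            continue
        else:
            negatives.append(pid)
    return positives, negatives


# Toy example with made-up passage IDs and snippets.
ranking = ["p3", "p1", "p2", "p4"]
snippets = {"p3": "in 1969", "p1": "1969", "p2": "the late 1960s", "p4": None}
positives, negatives = refine_candidates(ranking, "p1", snippets)
print(positives, negatives)
```

In this toy input, `p3` outranks the original positive `p1` and is relabeled, `p2` is dropped because it carries an answer snippet, and only `p4` remains as a negative. Setting `relabel=False` or `filter_ambiguous=False` mimics the filtering-only and relabeling-only configurations under the assumption stated above.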


Related papers