Fugu-MT Paper Translation (Abstract): Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com

Paper summary: Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com

arxiv url: http://arxiv.org/abs/2605.00353v1
Date: Fri, 01 May 2026 02:32:02 GMT
Status: Translation complete
Last updated in system: 2026-05-04 17:43:28.824036
Title: Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com
Title (reference translation): Negative Data Mining for Contrastive Learning in Dense Retrieval at IKEA.com
Authors: Eva Agapaki, Amritpal Singh Gill
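Abstract summary: This paper proposes a systematic approach to improving dense retrieval for IKEA product search using structured negative sampling strategies. The method achieves +2.6% average category accuracy in offline experiments on real user queries in the Canada market. An A/B test on long-tail queries showed no statistically significant difference in user engagement metrics between the improved and baseline models.
Reference score (independently computed attention score): 0.5371337604556311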
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Contrastive learning is a core component of modern retrieval systems, but its effectiveness heavily relies on the quality of negative examples used during training. In this work, we present a systematic approach to improving dense retrieval for IKEA product search through structured negative sampling strategies and scalable LLM-as-a-judge relevance evaluation. Building on IKEA Search Engine's late-interaction retrieval architectures, we introduce two key contributions: (1) structured negative sampling strategies that leverage product hierarchical taxonomy and product attributes to generate semantically challenging negatives, and (2) a comprehensive LLM-based evaluation methodology for generating training data. Rather than relying on sparse human annotations or random sampling, our LLM-based evaluation system allocates a score for all candidate products against each query. Our methodology achieves +2.6% average category accuracy on offline real user query experiments on the Canada market. However, our A/B test on long-tail queries showed no statistically significant differences in user engagement metrics between the improved and baseline models (p > 0.05). We trace this gap to user search behavior: 67% of popular searches exhibit zero-click rates above 50%, indicating that a substantial proportion of search sessions result in no product engagement regardless of result ranking. These findings underscore the importance of hard negative mining but also the need for grounding training data and offline evals in real user search behavior -- including query intent distribution and zero-click patterns -- to bridge the gap between offline retrieval quality and online user engagement.
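Abstract (reference translation): Contrastive learning is a core component of modern retrieval systems, but its effectiveness depends heavily on the quality of the negative examples used during training. This work proposes a systematic approach to improving dense retrieval for IKEA product search through structured negative sampling strategies and scalable LLM-as-a-judge relevance evaluation. Building on the IKEA search engine's late-interaction retrieval architecture, it introduces (1) structured negative sampling strategies that use the hierarchical product taxonomy and product attributes to generate semantically challenging negatives, and (2) a comprehensive LLM-based evaluation methodology for generating training data. Rather than relying on human annotations or random sampling, the LLM-based evaluation system assigns a score to the candidate products for each query. The method improves average category accuracy by +2.6% in offline experiments on real user queries in the Canada market. However, an A/B test on long-tail queries showed no statistically significant difference in user engagement metrics between the improved and baseline models (p > 0.05). 67% of popular searches show zero-click rates above 50%, indicating that a substantial proportion of search sessions involve no product engagement regardless of result ranking. These findings highlight the importance of hard negative mining, and also the need to ground training data and offline evaluations in real user search behavior (such as query intent distribution and zero-click patterns) in order to bridge the gap between offline retrieval quality and online user engagement.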
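
Illustrative sketch (not the authors' implementation): the abstract describes negatives built from the product taxonomy and product attributes, but gives no algorithm. One plausible reading is to rank non-relevant catalog items by how closely they resemble the positive product, preferring items that share a deeper category path and more attributes. Everything below is an assumption made for illustration: the `Product` fields, the `hardness` ordering, the `mine_hard_negatives` helper, the `exclude_ids` parameter, and the toy catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    # Hypothetical schema; the paper's actual product representation is not given.
    product_id: str
    category_path: tuple   # root-to-leaf, e.g. ("Furniture", "Chairs", "Office chairs")
    attributes: frozenset  # e.g. frozenset({"color:black"})


def shared_prefix_len(a, b):
    """Number of leading taxonomy levels two category paths have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def hardness(positive, candidate):
    """Assumed ordering: deeper shared category first, then more shared attributes."""
    depth = shared_prefix_len(positive.category_path, candidate.category_path)
    overlap = len(positive.attributes & candidate.attributes)
    return (depth, overlap)


def mine_hard_negatives(positive, catalog, k, exclude_ids=frozenset()):
    """Return the k catalog items most similar to `positive`, skipping the
    positive itself and any ids known to be relevant to the query."""
    candidates = [
        p for p in catalog
        if p.product_id != positive.product_id and p.product_id not in exclude_ids
    ]
    # sorted() is stable, so ties keep their catalog order.
    ranked = sorted(candidates, key=lambda p: hardness(positive, p), reverse=True)
    return ranked[:k]


# Toy catalog, invented for this example.
catalog = [
    Product("chair-1", ("Furniture", "Chairs", "Office chairs"), frozenset({"color:black"})),
    Product("chair-2", ("Furniture", "Chairs", "Office chairs"),
            frozenset({"color:black", "material:mesh"})),
    Product("chair-3", ("Furniture", "Chairs", "Dining chairs"), frozenset({"color:black"})),
    Product("lamp-1", ("Lighting", "Lamps", "Desk lamps"), frozenset({"color:black"})),
]

negatives = mine_hard_negatives(catalog[0], catalog, k=2)
print([p.product_id for p in negatives])
```

Items that sit very close to the positive may in fact be relevant to the query, so a real pipeline would need a relevance filter before using them as negatives; `exclude_ids` stands in for that step here, for example fed by the LLM-based relevance scores the abstract mentions.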
