Fugu-MT Paper Translation (Abstract): Aligning Large Language Models with Searcher Preferences

Paper overview: Aligning Large Language Models with Searcher Preferences

arxiv url: http://arxiv.org/abs/2603.10473v1
Date: Wed, 11 Mar 2026 06:44:30 GMT
Status: translation complete
System update date: 2026-03-12 16:22:32.815097
Title: Aligning Large Language Models with Searcher Preferences
Authors: Wei Wu, Peilun Zhou, Liyi Chen, Qimeng Wang, Chengqiang Lu, Yan Gao, Yi Wu, Yao Hu, Hui Xiong
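Abstract summary: The paper introduces SearchLLM, the first large language model (LLM) for open-ended generative search. It designs a hierarchical, multi-dimensional reward system that separates bottom-line constraints from behavior optimization objectives. Offline evaluations and online A/B tests show improved generation quality and user engagement.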
Reference score (independently computed attention score): 26.974618053554394
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The paradigm shift from item-centric ranking to answer-centric synthesis is redefining the role of search engines. While recent industrial progress has applied generative techniques to closed-set item ranking in e-commerce, research and deployment of open-ended generative search on large content platforms remain limited. This setting introduces challenges, including robustness to noisy retrieval, non-negotiable safety guarantees, and alignment with diverse user needs. In this work, we introduce SearchLLM, the first large language model (LLM) for open-ended generative search. We design a hierarchical, multi-dimensional reward system that separates bottom-line constraints, including factual grounding, basic answer quality and format compliance, from behavior optimization objectives that promote robustness to noisy retrieval and alignment with user needs. Concretely, our reward model evaluates responses conditioned on the user query, session history, and retrieved evidence set, combining rule-based checks with human-calibrated LLM judges to produce an interpretable score vector over these dimensions. We introduce a Gated Aggregation Strategy to derive the training reward for optimizing SearchLLM with Group Relative Policy Optimization (GRPO). We deploy SearchLLM in the AI search entry of RedNote. Offline evaluations and online A/B tests show improved generation quality and user engagement, increasing Valid Consumption Rate by 1.03% and reducing Re-search Rate by 2.81%, while upholding strict safety and reliability standards.
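Illustrative sketch: the abstract names a Gated Aggregation Strategy and GRPO but does not give formulas, so the following is only a rough illustration under stated assumptions, not the authors' method. The dimension names, the threshold, the penalty value, and the rule "any failed bottom-line check overrides the behavior scores" are all hypothetical. The group-relative step follows the commonly described GRPO normalization, in which each reward is standardized against the other responses sampled for the same query.

```python
from statistics import mean, pstdev

# Hypothetical dimension names, loosely following the abstract's wording.
BOTTOM_LINE = ("factual_grounding", "answer_quality", "format_compliance")
BEHAVIOR = ("noise_robustness", "user_need_alignment")


def gated_reward(scores, gate_threshold=0.5, penalty=-1.0):
    """Collapse a score vector into one scalar reward.

    Assumed gating rule (not taken from the paper): if any bottom-line
    dimension falls below the threshold, return a fixed penalty and ignore
    the behavior scores; otherwise return the mean of the behavior scores.
    """
    if any(scores[d] < gate_threshold for d in BOTTOM_LINE):
        return penalty
    return mean(scores[d] for d in BEHAVIOR)


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of responses to the same query.

    Population standard deviation is used here; implementations differ on
    this detail. eps avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two sampled responses for one query: the first passes every bottom-line
# check, the second fails factual grounding and is gated to the penalty.
group = [
    {"factual_grounding": 1.0, "answer_quality": 1.0, "format_compliance": 1.0,
     "noise_robustness": 0.8, "user_need_alignment": 0.6},
    {"factual_grounding": 0.0, "answer_quality": 1.0, "format_compliance": 1.0,
     "noise_robustness": 0.9, "user_need_alignment": 0.9},
]
rewards = [gated_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
```

In this sketch the gate makes the bottom-line constraints non-negotiable: the second response has higher behavior scores but still receives the lower reward, and therefore the negative advantage.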

