Fugu-MT Paper Translation (Summary): Learning from Natural Language Feedback for Personalized Question Answering

Paper overview: Learning from Natural Language Feedback for Personalized Question Answering

arxiv url: http://arxiv.org/abs/2508.10695v1
Date: Thu, 14 Aug 2025 14:36:53 GMT
Status: Translation complete
System update date: 2025-08-15 22:24:48.360481
Title: Learning from Natural Language Feedback for Personalized Question Answering
Authors: Alireza Salemi, Hamed Zamani
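Abstract summary: Personalization is essential for improving the effectiveness and user satisfaction of language technologies. Current approaches for personalizing large language models (LLMs) often rely on retrieval-augmented generation (RAG). We introduce VAC, a novel framework for personalized response generation that replaces scalar rewards with natural language feedback (NLF).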
Reference score (independently computed attention score): 21.115495457454365
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Personalization is crucial for enhancing both the effectiveness and user satisfaction of language technologies, particularly in information-seeking tasks like question answering. Current approaches for personalizing large language models (LLMs) often rely on retrieval-augmented generation (RAG), followed by reinforcement learning with scalar reward signals to teach models how to use retrieved personal context. We believe that these scalar rewards sometimes provide weak, non-instructive feedback, limiting learning efficiency and personalization quality. We introduce VAC, a novel framework for personalized response generation that replaces scalar rewards with natural language feedback (NLF) that is generated conditioned on the user profiles and the question narratives. NLF serves as a rich and actionable supervision signal, allowing the policy model to iteratively refine its outputs and internalize effective personalization strategies. Training alternates between optimizing the feedback model and fine-tuning the policy model on the improved responses, resulting in a policy model that no longer requires feedback at inference. Evaluation on the LaMP-QA benchmark, which consists of three diverse domains, demonstrates consistent and significant improvements over the state-of-the-art results. Human evaluations further confirm the superior quality of the generated responses. These results demonstrate that NLF provides more effective signals for optimizing personalized question answering.
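The alternating training procedure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `Example`, `ToyPolicy`, `ToyFeedbackModel`, and `train` are hypothetical names, the models are string-manipulating stand-ins rather than LLMs, and the abstract does not say how the feedback model is optimized, so that step is only a counter here. The sketch shows only the control flow: draft, receive NLF conditioned on the profile and question, refine, then update both models.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Example:
    question: str
    profile: List[str]  # assumed: retrieved personal context as text snippets


@dataclass
class ToyPolicy:
    """Hypothetical stand-in for the policy LLM; it memorizes its targets."""
    memory: dict = field(default_factory=dict)

    def generate(self, ex: Example) -> str:
        # Inference path: no feedback model is consulted here.
        return self.memory.get(ex.question, f"Generic answer to: {ex.question}")

    def refine(self, ex: Example, draft: str, feedback: str) -> str:
        # A real policy would rewrite the draft by following the feedback.
        return f"{draft} [revised per feedback: {feedback}]"

    def fine_tune(self, pairs) -> None:
        # Stand-in for supervised fine-tuning on the improved responses.
        for ex, target in pairs:
            self.memory[ex.question] = target


@dataclass
class ToyFeedbackModel:
    """Hypothetical stand-in for the NLF generator."""
    updates: int = 0

    def feedback(self, ex: Example, draft: str) -> str:
        # Feedback is conditioned on the user profile and the question.
        return f"mention {ex.profile[0]} when answering '{ex.question}'"

    def optimize(self, records) -> None:
        # Placeholder: the abstract does not specify this objective.
        self.updates += 1


def train(policy, feedback_model, data, rounds=2):
    """Alternate between updating the feedback model and the policy."""
    for _ in range(rounds):
        records = []
        for ex in data:
            draft = policy.generate(ex)
            nlf = feedback_model.feedback(ex, draft)
            improved = policy.refine(ex, draft, nlf)
            records.append((ex, draft, nlf, improved))
        feedback_model.optimize(records)
        policy.fine_tune([(ex, improved) for ex, _, _, improved in records])
    return policy
```

After training, `policy.generate(ex)` is called on its own, which mirrors the abstract's claim that the policy no longer requires feedback at inference.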


Related papers