Fugu-MT Paper Translation (Summary): HARPO: Hierarchical Agentic Reasoning for User-Aligned Conversational Recommendation

Paper summary: HARPO: Hierarchical Agentic Reasoning for User-Aligned Conversational Recommendation

arxiv url: http://arxiv.org/abs/2604.10048v1
Date: Sat, 11 Apr 2026 06:07:15 GMT
Status: Translation complete
System update date: 2026-04-14 20:13:15.812007
Title: HARPO: Hierarchical Agentic Reasoning for User-Aligned Conversational Recommendation
Title (reference translation): HARPO: Hierarchical Agentic Reasoning for User-Oriented Conversational Recommendation
Authors: Subham Raj, Aman Vaibhav Jha, Mayank Anand, Sriparna Saha
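Abstract summary: This paper proposes HARPO, an agentic framework that reframes conversational recommendation as a structured decision-making process. HARPO integrates hierarchical preference learning that decomposes recommendation quality into interpretable dimensions. It shows consistent improvements over strong baselines on recommendation-centric metrics.
Reference score (independently calculated attention measure): 10.766058469348382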
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Conversational recommender systems (CRSs) operate under incremental preference revelation, requiring systems to make recommendation decisions under uncertainty. While recent approaches, particularly those built on large language models, achieve strong performance on standard proxy metrics such as Recall@K and BLEU, they often fail to deliver high-quality, user-aligned recommendations in practice. This gap arises because existing methods primarily optimize for intermediate objectives like retrieval accuracy, fluent generation, or tool invocation, rather than recommendation quality itself. We propose HARPO (Hierarchical Agentic Reasoning with Preference Optimization), an agentic framework that reframes conversational recommendation as a structured decision-making process explicitly optimized for multi-dimensional recommendation quality. HARPO integrates (i) hierarchical preference learning that decomposes recommendation quality into interpretable dimensions (relevance, diversity, predicted user satisfaction, and engagement) and learns context-dependent weights over these dimensions; (ii) deliberative tree-search reasoning guided by a learned value network that evaluates candidate reasoning paths based on predicted recommendation quality rather than task completion; and (iii) domain-agnostic reasoning abstractions through Virtual Tool Operations and multi-agent refinement, enabling transferable recommendation reasoning across domains. We evaluate HARPO on ReDial, INSPIRED, and MUSE, demonstrating consistent improvements over strong baselines on recommendation-centric metrics while maintaining competitive response quality. These results highlight the importance of explicit, user-aligned quality optimization for conversational recommendation.
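Abstract (reference translation): Conversational recommender systems (CRSs) must make recommendation decisions under uncertainty. Recent approaches, particularly those built on large language models, achieve high performance on common proxy metrics such as Recall@K and BLEU, but in practice they often fail to deliver high-quality, user-aligned recommendations. This gap arises because existing methods are optimized for intermediate objectives such as retrieval accuracy, fluent generation, and tool invocation rather than for recommendation quality itself. HARPO (Hierarchical Agentic Reasoning with Preference Optimization) is an agentic framework that reframes conversational recommendation as a structured decision process optimized for multi-dimensional recommendation quality. HARPO integrates (i) hierarchical preference learning that decomposes recommendation quality into interpretable dimensions (relevance, diversity, user satisfaction, and engagement) and learns context-dependent weights over these dimensions; (ii) deliberative tree-search reasoning guided by a learned value network that evaluates candidate reasoning paths based on predicted recommendation quality rather than task completion; and (iii) domain-agnostic reasoning abstractions through Virtual Tool Operations and multi-agent refinement, enabling transferable recommendation reasoning across domains. We evaluate HARPO on ReDial, INSPIRED, and MUSE and show consistent improvements over strong baselines on recommendation-centric metrics while maintaining competitive response quality. These results highlight the importance of user-aligned quality optimization for conversational recommendation.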
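
The abstract describes decomposing recommendation quality into four dimensions and learning context-dependent weights over them, but it does not give the formula. The sketch below is one plausible reading, not the authors' implementation: it assumes the weights come from a softmax over per-dimension logits and that the overall quality is a weighted sum. The names `quality_score`, `rank_candidates`, and `context_logits` are hypothetical, and the logits are hard-coded where a learned model would normally produce them from the dialogue context.

```python
import math

# Quality dimensions named in the abstract.
DIMENSIONS = ("relevance", "diversity", "satisfaction", "engagement")


def softmax(logits):
    """Turn arbitrary real-valued logits into weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def quality_score(dimension_scores, context_logits):
    """Combine per-dimension scores using context-dependent weights.

    dimension_scores: dict mapping each dimension to a score in [0, 1].
    context_logits: hypothetical per-dimension logits; in a real system a
        learned model would compute these from the conversation so far.
    """
    weights = softmax([context_logits[d] for d in DIMENSIONS])
    return sum(w * dimension_scores[d] for w, d in zip(weights, DIMENSIONS))


def rank_candidates(candidates, context_logits):
    """Sort candidate items by aggregated quality, best first."""
    return sorted(
        candidates,
        key=lambda c: quality_score(c["scores"], context_logits),
        reverse=True,
    )
```

Under these assumptions the same candidate set can be ranked differently depending on context: logits that favor relevance put the most relevant item first, while logits that favor diversity promote an item that differs from earlier recommendations.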

Related papers