Fugu-MT paper translation (abstract): Rank, Don't Generate: Statement-level Ranking for Explainable Recommendation

Paper abstract: Rank, Don't Generate: Statement-level Ranking for Explainable Recommendation

arxiv url: http://arxiv.org/abs/2604.03724v1
Date: Sat, 04 Apr 2026 13:01:45 GMT
Status: Translation complete
Last updated in system: 2026-04-07 15:49:18.739134
Title: Rank, Don't Generate: Statement-level Ranking for Explainable Recommendation
Title (reference translation): Rank, Do Not Generate: Statement-Level Ranking for Explainable Recommendation
Authors: Ben Kabongo, Arthur Satouf, Vincent Guigue
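Abstract summary: We formalize explainable recommendation as a statement-level ranking problem. This formulation mitigates hallucination by construction and enables fine-grained factual analysis. We introduce StaR, a benchmark for statement ranking in explainable recommendation, constructed from four Amazon Reviews 2014 product categories.
Reference score (independently computed attention score): 2.3534886273639457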
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Textual explanations, generated with large language models (LLMs), are increasingly used to justify recommendations. Yet, evaluating these explanations remains a critical challenge. We advocate a shift in objective: rank, don't generate. We formalize explainable recommendation as a statement-level ranking problem, where systems rank candidate explanatory statements derived from reviews and return the top-k as explanation. This formulation mitigates hallucination by construction and enables fine-grained factual analysis. It also models factor importance through relevance scores and supports standardized, reproducible evaluation with established ranking metrics. Meaningful assessment, however, requires each statement to be explanatory (item facts affecting user experience), atomic (one opinion about one aspect), and unique (paraphrases consolidated), which is challenging to obtain from noisy reviews. We address this with (i) an LLM-based extraction pipeline producing explanatory and atomic statements, and (ii) a scalable, semantic clustering method consolidating paraphrases to enforce uniqueness. Building on this pipeline, we introduce StaR, a benchmark for statement ranking in explainable recommendation, constructed from four Amazon Reviews 2014 product categories. We evaluate popularity-based baselines and state-of-the-art models under global-level (all statements) and item-level (target item statements) ranking. Popularity baselines are competitive in global-level ranking but outperform state-of-the-art models on average in item-level ranking, exposing critical limitations in personalized explanation ranking.
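Abstract (reference translation): Textual explanations generated with large language models (LLMs) are increasingly used to justify recommendations. However, evaluating these explanations remains a critical challenge. We advocate a change of objective: rank rather than generate. We formalize explainable recommendation as a statement-level ranking problem in which systems rank candidate explanatory statements derived from reviews and return the top-k as the explanation. This formulation mitigates hallucination by construction and enables fine-grained factual analysis. It also models the importance of factors through relevance scores and supports standardized, reproducible evaluation with established ranking metrics. Meaningful assessment, however, requires each statement to be explanatory (item facts that affect user experience), atomic (one opinion about one aspect), and unique (paraphrases consolidated), which is difficult to obtain from noisy reviews. We address this with (i) an LLM-based extraction pipeline that produces explanatory and atomic statements, and (ii) a scalable semantic clustering method that consolidates paraphrases to enforce uniqueness. Building on this pipeline, we introduce StaR, a benchmark for statement ranking in explainable recommendation, constructed from four Amazon Reviews 2014 product categories. We evaluate popularity-based baselines and state-of-the-art models under global-level (all statements) and item-level (target item statements) ranking. Popularity baselines are competitive in global-level ranking but outperform state-of-the-art models on average in item-level ranking, exposing critical limitations in personalized explanation ranking.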
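The ranking formulation in the abstract can be illustrated with a minimal sketch of a popularity baseline scored with a ranking metric. Everything here is an assumption made for illustration and is not taken from the paper or the StaR benchmark: the toy statements, the function names (`popularity_scores`, `rank_top_k`, `ndcg_at_k`), the definition of popularity as statement frequency in training interactions, and the choice of binary-relevance NDCG@k as the metric.

```python
import math
from collections import Counter


def popularity_scores(interactions):
    """Count in how many training interactions each statement appears."""
    counts = Counter()
    for statements in interactions:
        counts.update(set(statements))  # count each statement once per interaction
    return counts


def rank_top_k(candidates, scores, k):
    """Sort candidates by descending score (ties broken alphabetically), keep the top k."""
    ordered = sorted(candidates, key=lambda s: (-scores.get(s, 0), s))
    return ordered[:k]


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2) for i, s in enumerate(ranked[:k]) if s in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical training interactions: each is the set of statements of one review.
train = [
    ["battery lasts long", "screen is bright"],
    ["battery lasts long", "case feels cheap"],
    ["battery lasts long", "screen is bright", "speakers are quiet"],
]
scores = popularity_scores(train)

# Item-level ranking: candidates are only the statements attached to the target item.
item_candidates = ["case feels cheap", "screen is bright", "battery lasts long"]
explanation = rank_top_k(item_candidates, scores, k=2)

# Global-level ranking: candidates are all known statements.
global_explanation = rank_top_k(list(scores), scores, k=2)

# Hypothetical ground truth for one held-out user-item pair.
relevant = {"screen is bright", "case feels cheap"}
score = ndcg_at_k(explanation, relevant, k=2)
print(explanation, round(score, 4))
```

Because the candidates are existing statements, the returned explanation can only contain text that was already extracted from reviews, which is the sense in which ranking avoids hallucination by construction. A personalized model would replace `scores` with user-specific relevance scores while leaving the ranking and evaluation steps unchanged.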
