Fugu-MT Paper Translation (Overview): From Top-1 to Top-K: A Reproducibility Study and Benchmarking of Counterfactual Explanations for Recommender Systems

Paper overview: From Top-1 to Top-K: A Reproducibility Study and Benchmarking of Counterfactual Explanations for Recommender Systems

arxiv url: http://arxiv.org/abs/2604.19663v1
Date: Tue, 21 Apr 2026 16:48:13 GMT
Status: Translation complete
Last updated in system: 2026-04-22 22:41:49.879938
Title: From Top-1 to Top-K: A Reproducibility Study and Benchmarking of Counterfactual Explanations for Recommender Systems
Authors: Quang-Huy Nguyen, Thanh-Hai Nguyen, Khac-Manh Thai, Duc-Hoang Pham, Huy-Son Nguyen, Cam-Van Thi Nguyen, Masoud Mansoury, Duc-Trong Le, Hoang-Quynh Le,
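Abstract summary: Counterfactual explanations (CEs) provide an intuitive way to understand recommender systems. Existing CE methods for recommender systems have been evaluated using different datasets, recommenders, metrics, and explanation formats. This paper systematically reproduces, re-evaluates, and re-implements eleven state-of-the-art CE methods for recommender systems.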
Reference score (in-house attention score): 3.5498952876443917
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Counterfactual explanations (CEs) provide an intuitive way to understand recommender systems by identifying minimal modifications to user-item interactions that alter recommendation outcomes. Existing CE methods for recommender systems, however, have been evaluated under heterogeneous protocols, using different datasets, recommenders, metrics, and even explanation formats, which hampers reproducibility and fair comparison. Our paper systematically reproduces, re-implements, and re-evaluates eleven state-of-the-art CE methods for recommender systems, covering both native explainers (e.g., LIME-RS, SHAP, PRINCE, ACCENT, LXR, GREASE) and specific graph-based explainers originally proposed for GNNs. We propose a unified benchmarking framework that assesses explainers along three dimensions: explanation format (implicit vs. explicit), evaluation level (item-level vs. list-level), and perturbation scope (user interaction vectors vs. user-item interaction graphs). Our evaluation protocol includes effectiveness, sparsity, and computational complexity metrics, and extends existing item-level assessments to top-K list-level explanations. Through extensive experiments on three real-world datasets and six representative recommender models, we analyze how well previously reported strengths of CE methods generalize across diverse setups. We observe that the trade-off between effectiveness and sparsity depends strongly on the specific method and evaluation setting, particularly under the explicit format; in addition, explainer performance remains largely consistent across item-level and list-level evaluations, and several graph-based explainers exhibit notable scalability limitations on large recommender graphs. Our results refine and challenge earlier conclusions about the robustness and practicality of CE generation methods in recommender systems: https://github.com/L2R-UET/CFExpRec.
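To make the item-level vs. list-level distinction concrete, here is a minimal sketch of counterfactual search over a single user interaction vector (the "user interaction vectors" perturbation scope). Everything in it is an illustrative assumption, not the paper's protocol: the toy linear item-item recommender and its similarity numbers, the greedy removal heuristic (a generic baseline, not LIME-RS, SHAP, PRINCE, ACCENT, LXR, GREASE, or any other benchmarked method), the list-level criterion (at least `min_changed` of the original top-K items must leave the list), and sparsity measured as the fraction of the user's interactions removed. Effectiveness here reduces to whether a flip is found at all. For the authors' actual definitions, see the repository linked in the abstract.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy recommender: item-item similarities (illustrative numbers only).
Sim = Dict[str, Dict[str, float]]


def score(history: Set[str], item: str, sim: Sim) -> float:
    """Linear score: sum of similarities between `item` and each item in `history`."""
    return sum(sim.get(h, {}).get(item, 0.0) for h in history)


def top_k(history: Set[str], catalog: List[str], sim: Sim, k: int,
          exclude: Set[str]) -> List[str]:
    """Rank candidate items by score (ties broken by item id) and keep the first k."""
    candidates = [i for i in catalog if i not in exclude]
    candidates.sort(key=lambda i: (-score(history, i, sim), i))
    return candidates[:k]


def greedy_counterfactual(history: Set[str],
                          flipped: Callable[[Set[str]], bool],
                          contribution: Callable[[str], float]) -> Optional[List[str]]:
    """Greedily remove the interaction with the largest contribution until `flipped`
    holds; return the removed interactions, or None if no flip is found."""
    current = set(history)
    removed: List[str] = []
    while not flipped(current):
        if not current:
            return None  # ineffective: even removing everything did not flip the outcome
        best = max(sorted(current), key=contribution)  # ties -> smallest item id
        current.remove(best)
        removed.append(best)
    return removed


def sparsity(removed: List[str], history: Set[str]) -> float:
    """Fraction of the user's interactions the explanation removes (lower = sparser)."""
    return len(removed) / len(history)


catalog = ["a", "b", "c", "d", "e", "f", "g"]
sim: Sim = {
    "a": {"d": 0.9, "e": 0.1, "f": 0.1, "g": 0.1},
    "b": {"d": 0.2, "e": 0.5, "f": 0.1, "g": 0.4},
    "c": {"d": 0.2, "e": 0.1, "f": 0.9, "g": 0.1},
}
history = {"a", "b", "c"}
k = 2
# Already-interacted items are never recommended, even after being removed.
original_list = top_k(history, catalog, sim, k, exclude=history)

# Item-level: the target item must leave the top-K list.
target = original_list[0]
item_cf = greedy_counterfactual(
    history,
    flipped=lambda cur: target not in top_k(cur, catalog, sim, k, exclude=history),
    contribution=lambda h: sim[h].get(target, 0.0),
)

# List-level (assumed definition): at least `min_changed` original items must leave.
min_changed = 2
list_cf = greedy_counterfactual(
    history,
    flipped=lambda cur: len(set(original_list)
                            - set(top_k(cur, catalog, sim, k, exclude=history))) >= min_changed,
    contribution=lambda h: sum(sim[h].get(i, 0.0) for i in original_list),
)

print(original_list, item_cf, list_cf)  # ['d', 'f'] ['a'] ['c', 'a']
print(sparsity(item_cf, history), sparsity(list_cf, history))
```

In this toy case, removing the single interaction `a` pushes `d` out of the top-2 (item level), whereas displacing both original list items takes two removals (`c`, then `a`), so the list-level explanation is less sparse. After removing only `c`, `d` still ranks first, which is why the list-level search must re-rank after every step rather than stop at the first item that drops out.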


Related papers