Can Offline Metrics Measure Explanation Goals? A Comparative Survey Analysis of Offline Explanation Metrics in Recommender Systems
- URL: http://arxiv.org/abs/2310.14379v3
- Date: Mon, 14 Apr 2025 14:33:47 GMT
- Title: Can Offline Metrics Measure Explanation Goals? A Comparative Survey Analysis of Offline Explanation Metrics in Recommender Systems
- Authors: André Levi Zanon, Marcelo Garcia Manzato, Leonardo Rocha
- Abstract summary: Explanations in a Recommender System (RS) provide reasons for recommendations to users and can enhance transparency, persuasiveness, engagement, and trust, known as explanation goals. We investigated whether, in explanations connecting interacted and recommended items based on shared content, the selection of item attributes and interacted items affects explanation goals. Metrics measuring the diversity and popularity of attributes and the recency of item interactions were used to evaluate explanations from three state-of-the-art algorithms across six recommendation systems.
- Score: 5.634769877793363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explanations in a Recommender System (RS) provide reasons for recommendations to users and can enhance transparency, persuasiveness, engagement, and trust, known as explanation goals. Evaluating the effectiveness of explanation algorithms offline remains challenging due to subjectivity. Initially, we conducted a literature review on current offline metrics, revealing that algorithms are often assessed with anecdotal evidence, offering convincing examples, or with metrics that do not align with human perception. We investigated whether, in explanations connecting interacted and recommended items based on shared content, the selection of item attributes and interacted items affects explanation goals. Metrics measuring the diversity and popularity of attributes and the recency of item interactions were used to evaluate explanations from three state-of-the-art agnostic algorithms across six recommendation systems. These offline metrics were compared with results from an online user study. Our findings reveal a trade-off: transparency and trust relate to popular properties, while engagement and persuasiveness are linked to diversified properties. This study contributes to the development of more robust evaluation methods for explanation algorithms in recommender systems.
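The abstract names three families of offline metrics (attribute diversity, attribute popularity, and interaction recency) but does not define them here. The following is a minimal sketch under common operationalizations — Shannon entropy for diversity, mean catalog frequency for popularity, and mean interaction age for recency; the function names and signatures are hypothetical, not the paper's own implementation.

```python
from collections import Counter
import math

def attribute_diversity(attributes):
    """Shannon entropy (bits) of the attributes used across explanations.
    Higher entropy indicates a more diversified attribute set."""
    counts = Counter(attributes)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def attribute_popularity(attributes, catalog_freq):
    """Mean catalog frequency of the attributes shown in explanations.
    catalog_freq maps an attribute to the fraction of catalog items
    carrying it, so values near 1 indicate popular attributes."""
    return sum(catalog_freq[a] for a in attributes) / len(attributes)

def interaction_recency(timestamps, now):
    """Mean age, in days, of the interacted items used in explanations
    (timestamps and now are Unix seconds); smaller values mean the
    explanations lean on more recent interactions."""
    return sum((now - t) / 86400 for t in timestamps) / len(timestamps)
```

Under this reading, the paper's reported trade-off would correspond to explanations with high `attribute_popularity` supporting transparency and trust, and those with high `attribute_diversity` supporting engagement and persuasiveness.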
Related papers
- Whom do Explanations Serve? A Systematic Literature Survey of User Characteristics in Explainable Recommender Systems Evaluation [7.021274080378664]
We surveyed 124 papers in which recommender systems explanations were evaluated in user studies.
Our findings suggest that the results from the surveyed studies predominantly cover specific users.
We recommend actions to move toward a more inclusive and reproducible evaluation.
arXiv Detail & Related papers (2024-12-12T13:01:30Z)
- Quantifying User Coherence: A Unified Framework for Cross-Domain Recommendation Analysis [69.37718774071793]
This paper introduces novel information-theoretic measures for understanding recommender systems.
We evaluate 7 recommendation algorithms across 9 datasets, revealing the relationships between our measures and standard performance metrics.
arXiv Detail & Related papers (2024-10-03T13:02:07Z)
- How to Surprisingly Consider Recommendations? A Knowledge-Graph-based Approach Relying on Complex Network Metrics [0.2537383030441368]
We propose a Knowledge-Graph-based recommender system by encoding user interactions on item catalogs.
Our study explores whether network-level metrics on KGs can influence the degree of surprise in recommendations.
We experimentally evaluate our approach on two datasets of LastFM listening histories and synthetic Netflix viewing profiles.
arXiv Detail & Related papers (2024-05-14T09:38:44Z)
- Review-based Recommender Systems: A Survey of Approaches, Challenges and Future Perspectives [11.835903510784735]
Review-based recommender systems have emerged as a significant sub-field in this domain.
We present a categorization of these systems and summarize the state-of-the-art methods, analyzing their unique features, effectiveness, and limitations.
We propose potential directions for future research, including the integration of multimodal data, multi-criteria rating information, and ethical considerations.
arXiv Detail & Related papers (2024-05-09T05:45:18Z)
- Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessments of turns in a conversation has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of the turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z)
- Introducing User Feedback-based Counterfactual Explanations (UFCE) [49.1574468325115]
Counterfactual explanations (CEs) have emerged as a viable solution for generating comprehensible explanations in XAI.
UFCE allows for the inclusion of user constraints to determine the smallest modifications in the subset of actionable features.
UFCE outperforms two well-known CE methods in terms of proximity, sparsity, and feasibility.
arXiv Detail & Related papers (2024-02-26T20:09:44Z)
- Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language Models [53.337728969143086]
Recommendation systems harness user-item interactions like clicks and reviews to learn their representations.
Previous studies improve recommendation accuracy and interpretability by modeling user preferences across various aspects and intents.
We introduce a chain-based prompting approach to uncover semantic aspect-aware interactions.
arXiv Detail & Related papers (2023-12-26T15:44:09Z)
- Interactive Explanation with Varying Level of Details in an Explainable Scientific Literature Recommender System [0.5937476291232802]
We aim in this paper to adopt a user-centered, interactive explanation model that provides explanations with different levels of detail and empowers users to interact with, control, and personalize the explanations based on their needs and preferences.
We conducted a qualitative user study to investigate the impact of providing interactive explanations with varying levels of detail on users' perception of the explainable RS.
arXiv Detail & Related papers (2023-06-09T10:48:04Z)
- Explainable Recommender with Geometric Information Bottleneck [25.703872435370585]
We propose to incorporate a geometric prior learnt from user-item interactions into a variational network.
Latent factors from an individual user-item pair can be used for both recommendation and explanation generation.
Experimental results on three e-commerce datasets show that our model significantly improves the interpretability of a variational recommender.
arXiv Detail & Related papers (2023-05-09T10:38:36Z)
- Improving Recommendation Relevance by simulating User Interest [77.34726150561087]
We observe that recommendation "recency" can be straightforwardly and transparently maintained by iterative reduction of ranks of inactive items.
The basic idea behind this work is patented in a context of online recommendation systems.
arXiv Detail & Related papers (2023-02-03T03:35:28Z)
- Measuring "Why" in Recommender Systems: a Comprehensive Survey on the Evaluation of Explainable Recommendation [87.82664566721917]
This survey is based on more than 100 papers from top-tier conferences like IJCAI, AAAI, TheWebConf, Recsys, UMAP, and IUI.
arXiv Detail & Related papers (2022-02-14T02:58:55Z)
- From Intrinsic to Counterfactual: On the Explainability of Contextualized Recommender Systems [43.93801836660617]
We show that by utilizing the contextual features (e.g., item reviews from users), we can design a series of explainable recommender systems.
We propose three types of explainable recommendation strategies with gradual change of model transparency: whitebox, graybox, and blackbox.
Our model achieves highly competitive ranking performance, and generates accurate and effective explanations in terms of numerous quantitative metrics and qualitative visualizations.
arXiv Detail & Related papers (2021-10-28T01:54:04Z)
- SIFN: A Sentiment-aware Interactive Fusion Network for Review-based Item Recommendation [48.1799451277808]
We propose a Sentiment-aware Interactive Fusion Network (SIFN) for review-based item recommendation.
We first encode user/item reviews via BERT and propose a lightweight sentiment learner to extract semantic features of each review.
Then, we propose a sentiment prediction task that guides the sentiment learner to extract sentiment-aware features via explicit sentiment labels.
arXiv Detail & Related papers (2021-08-18T08:04:38Z)
- REAM$\sharp$: An Enhancement Approach to Reference-based Evaluation Metrics for Open-domain Dialog Generation [63.46331073232526]
We present an enhancement approach to Reference-based EvAluation Metrics for open-domain dialogue systems.
A prediction model is designed to estimate the reliability of the given reference set.
We show how its predicted results can be helpful to augment the reference set, and thus improve the reliability of the metric.
arXiv Detail & Related papers (2021-05-30T10:04:13Z)
- Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.