Related papers: Extending MovieLens-32M to Provide New Evaluation Objectives

URL: http://arxiv.org/abs/2504.01863v2
Date: Sat, 26 Apr 2025 15:12:03 GMT
Title: Extending MovieLens-32M to Provide New Evaluation Objectives
Authors: Mark D. Smucker, Houmaan Chamani
Abstract summary: We offer an extension to the MovieLens-32M dataset that provides for new evaluation objectives. Our primary objective is to predict the movies that a user would be interested in watching, i.e. predict their watchlist. It appears that by asking users to assess their personal recommendations, we can alleviate the issue of popularity bias in the evaluation of top-n recommendation.
Score: 2.984929040246293
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Offline evaluation of recommender systems has traditionally treated the problem as a machine learning problem. In the classic case of recommending movies, where the user has provided explicit ratings of which movies they like and don't like, each user's ratings are split into test and train sets, and the evaluation task becomes to predict the held out test data using the training data. This machine learning style of evaluation makes the objective to recommend the movies that a user has watched and rated highly, which is not the same task as helping the user find movies that they would enjoy if they watched them. This mismatch in objective between evaluation and task is a compromise to avoid the cost of asking a user to evaluate recommendations by watching each movie. We offer an extension to the MovieLens-32M dataset that provides for new evaluation objectives. Our primary objective is to predict the movies that a user would be interested in watching, i.e. predict their watchlist. To construct this extension, we recruited MovieLens users, collected their profiles, made recommendations with a diverse set of algorithms, pooled the recommendations, and had the users assess the pools. This paper demonstrates the feasibility of using pooling to construct a test collection for recommender systems. Notably, we found that the traditional machine learning style of evaluation ranks the Popular algorithm, which recommends movies based on total number of ratings in the system, in the middle of the twenty-two recommendation runs we used to build the pools. In contrast, when we rank the runs by users' interest in watching movies, we find that recommending popular movies as a recommendation algorithm becomes one of the worst performing runs. It appears that by asking users to assess their personal recommendations, we can alleviate the issue of popularity bias in the evaluation of top-n recommendation.
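The pooling procedure described in the abstract (several runs produce ranked lists, the top items are merged into a pool, and the user assesses the pool) can be illustrated with a minimal Python sketch. This is not the authors' code: the toy ratings, the run name hypothetical_cf, the pool depth, and the use of precision at k are all assumptions chosen for illustration. Only the definition of the Popular run (ranking by total number of ratings) comes from the abstract.

```python
from collections import Counter


def popular_run(ratings, k):
    """Rank movies by total number of ratings (the 'Popular' baseline)."""
    counts = Counter(movie for _user, movie, _rating in ratings)
    # Sort by rating count (descending), then by movie id for determinism.
    ranked = sorted(counts.items(), key=lambda mc: (-mc[1], mc[0]))
    return [movie for movie, _count in ranked][:k]


def build_pool(runs, depth):
    """Union of the top-`depth` items of every run, shown to one user."""
    pool = set()
    for ranked in runs.values():
        pool.update(ranked[:depth])
    return pool


def precision_at_k(ranked, interested, k):
    """Fraction of the top-k items the user said they want to watch."""
    return sum(1 for movie in ranked[:k] if movie in interested) / k


# Toy data (assumed): (user, movie, rating) triples.
ratings = [
    ("u1", "A", 5), ("u2", "A", 4), ("u3", "A", 3),
    ("u1", "B", 4), ("u2", "B", 2),
    ("u3", "C", 5),
]

# Two runs for one user; "hypothetical_cf" stands in for any personalized run.
runs = {
    "popular": popular_run(ratings, k=3),
    "hypothetical_cf": ["C", "D", "A"],
}

pool = build_pool(runs, depth=2)

# Assumed user assessment of the pool: movies the user is interested in watching.
interested = {"C", "D"}

scores = {name: precision_at_k(run, interested, k=2) for name, run in runs.items()}
print(sorted(pool), scores)
```

In this toy setup the Popular run scores zero against the user's stated interest, which mirrors the direction of the paper's finding but is an artifact of the invented data, not a reproduction of the result. A realistic Popular run would also exclude movies already in the user's profile.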

Related papers

Recommendation Algorithms: A Comparative Study in Movie Domain [0.0]
A regression model was built using novel properties extracted from the dataset and used as features in the model. An exploratory data analysis on the Netflix dataset was conducted to gain insights into user rating behaviour and movie characteristics. In addition to a feature in the XGBoost regression algorithm, the K-Nearest Neighbors and MF algorithms from Python's Surprise library are used for recommendations.
arXiv Detail & Related papers (2026-02-27T16:01:10Z)
MTRec: Learning to Align with User Preferences via Mental Reward Models [60.321038000806176]
We propose MTRec, a sequential recommendation framework designed to align with real user preferences. We introduce a mental reward model to quantify user satisfaction and propose a distributional inverse reinforcement learning approach to learn it. Experiments show that MTRec brings significant improvements to a variety of recommendation models.
arXiv Detail & Related papers (2025-09-26T18:10:48Z)
Towards a Real-World Aligned Benchmark for Unlearning in Recommender Systems [49.766845975588275]
We propose a set of design desiderata and research questions to guide the development of a more realistic benchmark for unlearning in recommender systems. We argue for an unlearning setup that reflects the sequential, time-sensitive nature of real-world deletion requests. We present a preliminary experiment in a next-basket recommendation setting based on our proposed desiderata and find that unlearning also works for sequential recommendation models.
arXiv Detail & Related papers (2025-08-23T16:05:40Z)
Interactive Visualization Recommendation with Hier-SUCB [52.11209329270573]
We propose an interactive personalized visualization recommendation (PVisRec) system that learns on user feedback from previous interactions. For more interactive and accurate recommendations, we propose Hier-SUCB, a contextual semi-bandit in the PVisRec setting.
arXiv Detail & Related papers (2025-02-05T17:14:45Z)
Can Large Language Models Understand Preferences in Personalized Recommendation? [32.2250928311146]
We introduce PerRecBench, disassociating evaluation from user rating bias and item quality. We find that the LLM-based recommendation techniques that are generally good at rating prediction fail to identify users' favored and disfavored items when the user rating bias and item quality are eliminated. Our findings reveal the superiority of pairwise and listwise ranking approaches over pointwise ranking, PerRecBench's low correlation with traditional regression metrics, the importance of user profiles, and the role of pretraining data distributions.
arXiv Detail & Related papers (2025-01-23T05:24:18Z)
Monolithic Hybrid Recommender System for Suggesting Relevant Movies [0.0]
We consider two approaches of collaborative filtering, by using sequences of watched movies and considering the related movies rating. Various weights would be set based on use cases. Discussion was made regarding the literature and methodological approach to solve the problem.
arXiv Detail & Related papers (2024-11-16T20:41:17Z)
Measuring Strategization in Recommendation: Users Adapt Their Behavior to Shape Future Content [66.71102704873185]
We test for user strategization by conducting a lab experiment and survey. We find strong evidence of strategization across outcome metrics, including participants' dwell time and use of "likes". Our findings suggest that platforms cannot ignore the effect of their algorithms on user behavior.
arXiv Detail & Related papers (2024-05-09T07:36:08Z)
Large Language Models as Conversational Movie Recommenders: A User Study [3.3636849604467]
Large language models (LLMs) offer strong recommendation explainability but lack overall personalization, diversity, and user trust. LLMs show a greater ability to recommend lesser-known or niche movies.
arXiv Detail & Related papers (2024-04-29T20:17:06Z)
Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback. The role of user feedback in annotators' assessment of turns in a conversational perception has been little studied. We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z)
PORE: Provably Robust Recommender Systems against Data Poisoning Attacks [58.26750515059222]
We propose PORE, the first framework to build provably robust recommender systems. PORE can transform any existing recommender system to be provably robust against untargeted data poisoning attacks. We prove that PORE still recommends at least $r$ of the $N$ items to the user under any data poisoning attack, where $r$ is a function of the number of fake users in the attack.
arXiv Detail & Related papers (2023-03-26T01:38:11Z)
Correcting the User Feedback-Loop Bias for Recommendation Systems [34.44834423714441]
We propose a systematic and dynamic way to correct user feedback-loop bias in recommendation systems. Our method includes a deep-learning component to learn each user's dynamic rating history embedding. We empirically validated the existence of such user feedback-loop bias in real world recommendation systems.
arXiv Detail & Related papers (2021-09-13T15:02:55Z)
PURS: Personalized Unexpected Recommender System for Improving User Satisfaction [76.98616102965023]
We describe a novel Personalized Unexpected Recommender System (PURS) model that incorporates unexpectedness into the recommendation process. Extensive offline experiments on three real-world datasets illustrate that the proposed PURS model significantly outperforms the state-of-the-art baseline approaches.
arXiv Detail & Related papers (2021-06-05T01:33:21Z)
Measuring Recommender System Effects with Simulated Users [19.09065424910035]
Popularity bias and filter bubbles are two of the most well-studied recommender system biases. We offer a simulation framework for measuring the impact of a recommender system under different types of user behavior.
arXiv Detail & Related papers (2021-01-12T14:51:11Z)
Data Poisoning Attacks to Deep Learning Based Recommender Systems [26.743631067729677]
We conduct the first systematic study of data poisoning attacks against deep learning based recommender systems. An attacker's goal is to manipulate a recommender system such that the attacker-chosen target items are recommended to many users. To achieve this goal, our attack injects fake users with carefully crafted ratings to a recommender system.
arXiv Detail & Related papers (2021-01-07T17:32:56Z)
Automating App Review Response Generation [67.58267006314415]
We propose a novel approach RRGen that automatically generates review responses by learning knowledge relations between reviews and their responses. Experiments on 58 apps and 309,246 review-response pairs highlight that RRGen outperforms the baselines by at least 67.4% in terms of BLEU-4.
arXiv Detail & Related papers (2020-02-10T05:23:38Z)
