Modelling the Recommender Alignment Problem
- URL: http://arxiv.org/abs/2208.12299v1
- Date: Thu, 25 Aug 2022 18:37:49 GMT
- Title: Modelling the Recommender Alignment Problem
- Authors: Francisco Carvalho
- Abstract summary: This work aims to shed light on how an end-to-end study of reward functions for recommender systems might be done.
We learn recommender policies that optimize reward functions by controlling graph dynamics on a toy environment.
Based on the effects that trained recommenders have on their environment, we conclude that engagement maximizers generally, though not always, lead to worse outcomes than aligned recommenders.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recommender systems (RS) mediate human experience online. Most RS act to
optimize metrics that are imperfectly aligned with the best interests of users
but are easy to measure, like ad-clicks and user engagement. This has resulted
in a host of hard-to-measure side-effects: political polarization, addiction,
fake news. RS design faces a recommender alignment problem: that of aligning
recommendations with the goals of users, system designers, and society as a
whole. But how do we test and compare potential solutions to align RS? Their
massive scale makes them costly and risky to test in deployment. We synthesized
a simple abstract modelling framework to guide future work.
To illustrate it, we construct a toy experiment where we ask: "How can we
evaluate the consequences of using user retention as a reward function?" To
answer the question, we learn recommender policies that optimize reward
functions by controlling graph dynamics on a toy environment. Based on the
effects that trained recommenders have on their environment, we conclude that
engagement maximizers generally, though not always, lead to worse outcomes than
aligned recommenders. After learning, we examine competition between RS
as a potential solution to RS alignment. We find that it generally makes our
toy society better off than it would be with no recommendation or with
engagement maximizers.
In this work, we aimed for a broad scope, touching superficially on many
different points to shed light on how an end-to-end study of reward functions
for recommender systems might be done. Recommender alignment is a pressing and
important problem. Attempted solutions are sure to have far-reaching impacts.
Here, we take a first step toward developing methods for evaluating and comparing
solutions with respect to their impacts on society.
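The setup the abstract describes, a recommender policy acting on a toy environment, scored under both an engagement reward and a welfare-based "aligned" reward, can be illustrated with a minimal sketch. This is not the paper's actual environment or code; the dynamics (user interests drifting toward shown content), the click model, and the welfare signal are all simplified assumptions chosen only to show the shape of such an experiment:

```python
import random

# Hypothetical toy environment (an assumption, not the paper's model): users
# hold a scalar "interest" that drifts toward whatever content they click on.
# A recommender policy maps a user's interest to one recommended item.
N_USERS, N_ITEMS, STEPS = 20, 5, 50

def click_prob(interest, item):
    # Assumed engagement model: clicks are likelier when the item
    # matches the user's current interest.
    return max(0.0, 1.0 - abs(interest - item) / N_ITEMS)

def welfare(interest):
    # Assumed "aligned" signal: users are best off near a moderate interest,
    # standing in for whatever ground-truth welfare measure one cares about.
    return 1.0 - abs(interest - N_ITEMS / 2) / N_ITEMS

def run(policy, seed=0):
    random.seed(seed)
    interests = [random.uniform(0, N_ITEMS) for _ in range(N_USERS)]
    total_clicks, welfare_sum = 0, 0.0
    for _ in range(STEPS):
        for u in range(N_USERS):
            item = policy(interests[u])
            if random.random() < click_prob(interests[u], item):
                total_clicks += 1
                # Shown content pulls the user's interest toward itself.
                interests[u] += 0.2 * (item - interests[u])
        welfare_sum += sum(welfare(x) for x in interests)
    return total_clicks, welfare_sum / (STEPS * N_USERS)

# An engagement maximizer recommends whatever is closest to current interest;
# an "aligned" policy steers interests toward the high-welfare region.
engagement_policy = lambda x: round(x) % N_ITEMS
aligned_policy = lambda x: N_ITEMS // 2

print("engagement maximizer:", run(engagement_policy))
print("aligned recommender: ", run(aligned_policy))
```

Even this crude version reproduces the qualitative tension the paper studies: the engagement maximizer keeps users where they are and harvests clicks, while the aligned policy sacrifices some early engagement to move users toward higher-welfare states.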
Related papers
- Harm Mitigation in Recommender Systems under User Preference Dynamics [16.213153879446796]
We consider a recommender system that takes into account the interplay between recommendations, user interests, and harmful content.
We seek recommendation policies that establish a tradeoff between maximizing click-through rate (CTR) and mitigating harm.
arXiv Detail & Related papers (2024-06-14T09:52:47Z) - REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - RAH! RecSys-Assistant-Human: A Human-Centered Recommendation Framework
with LLM Agents [30.250555783628762]
This research argues that addressing these issues is not solely the recommender systems' responsibility.
We introduce the RAH (Recommender system, Assistant, Human) framework, emphasizing alignment with user personalities.
Our contributions provide a human-centered recommendation framework that partners effectively with various recommendation models.
arXiv Detail & Related papers (2023-08-19T04:46:01Z) - Breaking Feedback Loops in Recommender Systems with Causal Inference [99.22185950608838]
Recent work has shown that feedback loops may compromise recommendation quality and homogenize user behavior.
We propose the Causal Adjustment for Feedback Loops (CAFL), an algorithm that provably breaks feedback loops using causal inference.
We show that CAFL improves recommendation quality when compared to prior correction methods.
arXiv Detail & Related papers (2022-07-04T17:58:39Z) - Recommendation Systems with Distribution-Free Reliability Guarantees [83.80644194980042]
We show how to return a set of items rigorously guaranteed to contain mostly good items.
Our procedure endows any ranking model with rigorous finite-sample control of the false discovery rate.
We evaluate our methods on the Yahoo! Learning to Rank and MSMarco datasets.
arXiv Detail & Related papers (2022-07-04T17:49:25Z) - Meta Policy Learning for Cold-Start Conversational Recommendation [71.13044166814186]
We study CRS policy learning for cold-start users via meta reinforcement learning.
To facilitate policy adaptation, we design three synergetic components.
arXiv Detail & Related papers (2022-05-24T05:06:52Z) - ELIXIR: Learning from User Feedback on Explanations to Improve
Recommender Models [26.11434743591804]
We devise a human-in-the-loop framework, called ELIXIR, where user feedback on explanations is leveraged for pairwise learning of user preferences.
ELIXIR leverages feedback on pairs of recommendations and explanations to learn user-specific latent preference vectors.
Our framework is instantiated using generalized graph recommendation via Random Walk with Restart.
arXiv Detail & Related papers (2021-02-15T13:43:49Z) - Measuring Recommender System Effects with Simulated Users [19.09065424910035]
Popularity bias and filter bubbles are two of the most well-studied recommender system biases.
We offer a simulation framework for measuring the impact of a recommender system under different types of user behavior.
arXiv Detail & Related papers (2021-01-12T14:51:11Z) - Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z) - Optimizing Long-term Social Welfare in Recommender Systems: A
Constrained Matching Approach [36.54379845220444]
We study settings in which content providers cannot remain viable unless they receive a certain level of user engagement.
Our model ensures the system reaches an equilibrium with maximal social welfare supported by a sufficiently diverse set of viable providers.
We draw connections to various notions of user regret and fairness, arguing that these outcomes are fairer in a utilitarian sense.
arXiv Detail & Related papers (2020-07-31T22:40:47Z) - Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on such an approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.