Reward Shaping for User Satisfaction in a REINFORCE Recommender
- URL: http://arxiv.org/abs/2209.15166v1
- Date: Fri, 30 Sep 2022 01:29:12 GMT
- Title: Reward Shaping for User Satisfaction in a REINFORCE Recommender
- Authors: Konstantina Christakopoulou, Can Xu, Sai Zhang, Sriraj Badam, Trevor Potter, Daniel Li, Hao Wan, Xinyang Yi, Ya Le, Chris Berg, Eric Bencomo Dixon, Ed H. Chi, Minmin Chen
- Abstract summary: We propose to jointly learn a policy network and a satisfaction imputation network.
The imputation network learns which actions are satisfying to the user, while the policy network, built on top of REINFORCE, decides which items to recommend.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: How might we design Reinforcement Learning (RL)-based recommenders that
encourage aligning user trajectories with the underlying user satisfaction?
Three research questions are key: (1) measuring user satisfaction, (2)
combatting sparsity of satisfaction signals, and (3) adapting the training of
the recommender agent to maximize satisfaction. For measurement, it has been
found that surveys explicitly asking users to rate their experience with
consumed items can provide valuable orthogonal information to the
engagement/interaction data, acting as a proxy to the underlying user
satisfaction. For sparsity, i.e., only being able to observe how satisfied users
are with a tiny fraction of user-item interactions, imputation models can be
useful in predicting satisfaction level for all items users have consumed. For
learning satisfying recommender policies, we postulate that reward shaping in
RL recommender agents is powerful for driving satisfying user experiences.
Putting everything together, we propose to jointly learn a policy network and a
satisfaction imputation network: The role of the imputation network is to learn
which actions are satisfying to the user; while the policy network, built on
top of REINFORCE, decides which items to recommend, with the reward utilizing
the imputed satisfaction. We use both offline analysis and live experiments in
an industrial large-scale recommendation platform to demonstrate the promise of
our approach for satisfying user experiences.
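The abstract combines three ingredients: sparse survey labels as a satisfaction signal, an imputation model that fills in satisfaction for unsurveyed interactions, and a REINFORCE policy whose reward uses the imputed satisfaction. The sketch below shows one way these pieces could fit together. It is not the authors' implementation, and it rests on several assumptions. The catalogue is a hypothetical 3-item toy with made-up engagement and satisfaction rates. The policy is a context-free softmax rather than a neural policy over user state. `SatisfactionImputer` is a one-hot logistic model standing in for the imputation network. The shaped reward `engagement + beta * imputed_satisfaction` is an illustrative form chosen here, not taken from the paper.

```python
import math
import random

# Hypothetical toy catalogue (assumption, not from the paper):
# per-item engagement and a hidden probability that a surveyed user is satisfied.
ENGAGEMENT = [1.0, 0.6, 0.0]      # item 0 is engaging but unsatisfying ("clickbait")
TRUE_SAT_PROB = [0.0, 1.0, 0.5]   # only ever observed through sparse surveys
SURVEY_RATE = 0.2                 # assumed fraction of interactions that get a survey


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SatisfactionImputer:
    """Hypothetical stand-in for the imputation network: p(satisfied | item).

    Trained only on surveyed interactions. One-hot item features keep the
    sketch small; a real model would use user/item features to generalize.
    """

    def __init__(self, n_items, lr=0.5):
        self.w = [0.0] * n_items  # sigmoid(0) = 0.5, an uninformed prior
        self.lr = lr

    def predict(self, item):
        return sigmoid(self.w[item])

    def update(self, item, label):
        # One gradient-ascent step on the Bernoulli log-likelihood of the label.
        self.w[item] += self.lr * (label - self.predict(item))


def shaped_reward(engagement, imputed_sat, beta):
    # Assumed shaping: engagement plus a bonus weighted by imputed satisfaction.
    return engagement + beta * imputed_sat


def train(beta, steps=4000, lr=0.05, seed=0):
    rng = random.Random(seed)
    n = len(ENGAGEMENT)
    logits = [0.0] * n                 # softmax policy parameters
    imputer = SatisfactionImputer(n)
    baseline = 0.0                     # running-average reward baseline
    for _ in range(steps):
        probs = softmax(logits)
        item = rng.choices(range(n), weights=probs)[0]
        # Sparse satisfaction signal: only some interactions are surveyed.
        if rng.random() < SURVEY_RATE:
            label = 1.0 if rng.random() < TRUE_SAT_PROB[item] else 0.0
            imputer.update(item, label)
        # Reward uses the imputed (not observed) satisfaction for every interaction.
        r = shaped_reward(ENGAGEMENT[item], imputer.predict(item), beta)
        advantage = r - baseline
        # REINFORCE: gradient of log softmax(logits)[item] is onehot(item) - probs.
        for i in range(n):
            grad = (1.0 if i == item else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        baseline = 0.9 * baseline + 0.1 * r
    return softmax(logits), imputer


if __name__ == "__main__":
    for beta in (0.0, 1.0):
        probs, imputer = train(beta)
        imputed = [round(imputer.predict(i), 2) for i in range(len(probs))]
        print(f"beta={beta}: policy={[round(p, 3) for p in probs]} imputed_sat={imputed}")
```

In this toy setting, `train(beta=0.0)` optimizes engagement alone and should concentrate on item 0. `train(beta=1.0)` should shift toward item 1, the item users report as satisfying. Here the imputer and the policy are updated in the same loop as a simple approximation of joint learning. The paper's actual architecture, reward definition, and training schedule may differ.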
Related papers
- Interactive Garment Recommendation with User in the Loop
We propose to build a user profile on the fly by integrating user reactions as we recommend complementary items to compose an outfit.
We present a reinforcement learning agent capable of suggesting appropriate garments and ingesting user feedback to improve its recommendations.
Date: 2024-02-18T16:01:28Z
- PIE: Personalized Interest Exploration for Large-Scale Recommender Systems
We present a framework for exploration in large-scale recommender systems.
Our methodology can be easily integrated into an existing large-scale recommender system with minimal modifications.
Our work has been deployed in production on Facebook Watch, a popular video discovery and sharing platform serving billions of users.
Date: 2023-04-13T22:25:09Z
- Editable User Profiles for Controllable Text Recommendation
We propose LACE, a novel concept value bottleneck model for controllable text recommendations.
LACE represents each user with a succinct set of human-readable concepts.
It learns personalized representations of the concepts based on user documents.
Date: 2023-04-09T14:52:18Z
- Recommending to Strategic Users
We show that users strategically choose content to influence the types of content they get recommended in the future.
We propose three interventions that may improve recommendation quality when taking into account strategic consumption.
Date: 2023-02-13T17:57:30Z
- Personalizing Intervened Network for Long-tailed Sequential User Behavior Modeling
After joint training, tail users receive significantly lower-quality recommendations than head users.
A model trained separately on tail users still achieves inferior results due to limited data.
We propose a novel approach that significantly improves the recommendation performance of the tail users.
Date: 2022-08-19T02:50:19Z
- Causal Disentanglement with Network Information for Debiased Recommendations
Recent research proposes to debias recommender systems by modeling them from a causal perspective.
The critical challenge in this setting is accounting for the hidden confounders.
We propose to leverage network information (i.e., user-social and user-item networks) to better approximate hidden confounders.
Date: 2022-04-14T20:55:11Z
- FEBR: Expert-Based Recommendation Framework for beneficial and personalized content
We propose FEBR (Expert-Based Recommendation Framework), an apprenticeship learning framework to assess the quality of the recommended content.
The framework exploits the demonstrated trajectories of an expert (assumed to be reliable) in a recommendation evaluation environment, to recover an unknown utility function.
We evaluate the performance of our solution in a user interest simulation environment built with RecSim.
Date: 2021-07-17T18:21:31Z
- Towards Content Provider Aware Recommender Systems: A Simulation Study on the Interplay between User and Provider Utilities
We build a REINFORCE recommender agent, coined EcoAgent, to optimize a joint objective of user utility and the counterfactual utility lift of the provider associated with the recommended content.
We offer a number of simulated experiments that shed light on both the benefits and the limitations of our approach.
Date: 2021-05-06T00:02:58Z
- Generative Inverse Deep Reinforcement Learning for Online Recommendation
We propose a novel inverse reinforcement learning approach, namely InvRec, for online recommendation.
InvRec automatically extracts the reward function from users' behaviors for online recommendation.
Date: 2020-11-04T12:12:25Z
- Empowering Active Learning to Jointly Optimize System and User Demands
We propose a new active learning approach that jointly optimizes for the active learning system (efficient training) and the user (receiving useful instances).
We study our approach in an educational application, which particularly benefits from this technique as the system needs to rapidly learn to predict the appropriateness of an exercise to a particular user.
We evaluate multiple learning strategies and user types with data from real users and find that our joint approach better satisfies both objectives, whereas alternative methods lead to many unsuitable exercises for end users.
Date: 2020-05-09T16:02:52Z