Learning from a Learning User for Optimal Recommendations
- URL: http://arxiv.org/abs/2202.01879v1
- Date: Thu, 3 Feb 2022 22:45:12 GMT
- Title: Learning from a Learning User for Optimal Recommendations
- Authors: Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang and Haifeng Xu
- Abstract summary: We formalize a model to capture "learning users" and design an efficient system-side learning solution.
We prove that the regret of RAES deteriorates gracefully as the convergence rate of user learning becomes worse.
Our study provides a novel perspective on modeling the feedback loop in recommendation problems.
- Score: 43.2268992294178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In real-world recommendation problems, especially those with a formidably
large item space, users have to gradually learn to estimate the utility of
fresh recommendations from their experience with previously consumed items.
This in turn affects their interaction dynamics with the system and can
invalidate previous algorithms built on the omniscient user assumption. In this
paper, we formalize a model to capture such "learning users" and design an
efficient system-side learning solution, coined Noise-Robust Active Ellipsoid
Search (RAES), to confront the challenges brought by the non-stationary
feedback from such a learning user. Interestingly, we prove that the regret of
RAES deteriorates gracefully as the convergence rate of user learning becomes
worse, until reaching linear regret when the user's learning fails to converge.
Experiments on synthetic datasets demonstrate the strength of RAES for such a
contemporaneous system-user learning problem. Our study provides a novel
perspective on modeling the feedback loop in recommendation problems.
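To make the search mechanism concrete, the following is a minimal sketch of the ellipsoid-search idea the paper builds on. It is not RAES itself: the noisy comparison oracle, the majority-vote denoising, and all constants are illustrative assumptions.

```python
import numpy as np

# Minimal sketch, NOT the paper's RAES algorithm: shrink an ellipsoid of
# candidate preference vectors around a hidden theta_star using noisy
# directional feedback. Majority voting stands in (crudely) for RAES's
# noise-robustness machinery; the oracle below is an idealized assumption.

rng = np.random.default_rng(0)
d = 5                                      # dimension of the preference space
theta_star = rng.normal(size=d)            # hidden user preference vector

def noisy_side(a, center, noise=0.1):
    """Which side of the hyperplane through `center` with normal `a` does
    theta_star lie on? The answer is flipped with probability `noise`."""
    s = 1.0 if a @ (theta_star - center) > 0 else -1.0
    return -s if rng.random() < noise else s

center = np.zeros(d)                       # ellipsoid center
shape = 25.0 * np.eye(d)                   # ellipsoid shape matrix P

for _ in range(200):
    a = rng.normal(size=d)
    a /= np.linalg.norm(a)                 # random query direction
    # Repeat the query and take a majority vote to suppress feedback noise.
    vote = sum(noisy_side(a, center) for _ in range(15))
    g = -a if vote > 0 else a              # normal of the halfspace to cut away
    # Standard central-cut ellipsoid update.
    b = shape @ g / np.sqrt(g @ shape @ g)
    center = center - b / (d + 1)
    shape = (d * d / (d * d - 1.0)) * (shape - (2.0 / (d + 1)) * np.outer(b, b))

print("estimate:", np.round(center, 3))
print("truth:   ", np.round(theta_star, 3))
```

Each cut discards a constant fraction of the ellipsoid's volume, so the candidate region shrinks geometrically as long as the feedback can be denoised; when the user's own estimates converge slowly, denoising gets harder, which is the regime where the paper shows RAES's regret degrades gracefully.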
Related papers
- Interactive Counterfactual Exploration of Algorithmic Harms in Recommender Systems [3.990406494980651]
This study introduces an interactive tool designed to help users comprehend and explore the impacts of algorithmic harms in recommender systems.
By leveraging visualizations, counterfactual explanations, and interactive modules, the tool allows users to investigate how biases such as miscalibration affect their recommendations.
arXiv Detail & Related papers (2024-09-10T23:58:27Z)
- CURE4Rec: A Benchmark for Recommendation Unlearning with Deeper Influence [55.21518669075263]
CURE4Rec is the first comprehensive benchmark for recommendation unlearning evaluation.
We consider the deeper influence of unlearning on recommendation fairness and robustness towards data with varying impact levels.
arXiv Detail & Related papers (2024-08-26T16:21:50Z)
- Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL).
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
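As a point of reference, here is a minimal sketch of the standard DPO objective such a setup would optimize; the tensors and values are made up, and this is not the paper's code.

```python
import torch
import torch.nn.functional as F

# Standard DPO loss on (preferred y_w, rejected y_l) feedback pairs.
# Inputs are summed token log-probabilities under the policy being trained
# and under a frozen reference model; all numbers below are toy values.

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Implicit reward margin: how much more the policy prefers y_w over y_l,
    # measured relative to the reference model.
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()

loss = dpo_loss(torch.tensor([-5.0, -6.1]), torch.tensor([-7.3, -9.0]),
                torch.tensor([-5.5, -6.5]), torch.tensor([-6.8, -8.7]))
print(float(loss))
```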
arXiv Detail & Related papers (2024-03-02T20:25:50Z)
- Learning from Negative User Feedback and Measuring Responsiveness for Sequential Recommenders [13.762960304406016]
We introduce explicit and implicit negative user feedback into the training objective of sequential recommenders.
We demonstrate the effectiveness of this approach using live experiments on a large-scale industrial recommender system.
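A hedged sketch of what folding negative feedback into the objective might look like; the weighting scheme and values are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Positive engagements are trained toward 1, explicit/implicit negatives
# (dislikes, skips) toward 0, with a weight controlling how strongly the
# negative signal is enforced. Illustrative only.

def feedback_loss(logits, labels, neg_weight=0.5):
    """logits: predicted engagement scores; labels: 1 = positive, 0 = negative."""
    per_item = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
    weights = torch.where(labels > 0.5,
                          torch.ones_like(labels),
                          torch.full_like(labels, neg_weight))
    return (weights * per_item).mean()

logits = torch.tensor([2.1, -0.3, 0.8])
labels = torch.tensor([1.0, 0.0, 0.0])     # one click, one dislike, one skip
print(float(feedback_loss(logits, labels)))
```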
arXiv Detail & Related papers (2023-08-23T17:16:07Z)
- Simulating Bandit Learning from User Feedback for Extractive Question Answering [51.97943858898579]
We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
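A minimal sketch of the simulation's core step, under assumed details: feedback is synthesized by comparing the model's predicted span with the gold span, and the policy is updated with a REINFORCE-style loss.

```python
import torch

# The "user" reward is simulated from supervised data: positive when the
# sampled answer span matches the gold span, mildly negative otherwise.
# Reward values and the span encoding are illustrative assumptions.

def simulated_bandit_loss(span_log_prob, predicted_span, gold_span):
    """span_log_prob: log-probability of the sampled span (requires grad)."""
    reward = 1.0 if predicted_span == gold_span else -0.1
    return -reward * span_log_prob           # REINFORCE-style objective

log_prob = torch.tensor(-1.2, requires_grad=True)  # stand-in for a model output
loss = simulated_bandit_loss(log_prob, "in 1889", "in 1889")
loss.backward()
print(float(loss), float(log_prob.grad))
```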
arXiv Detail & Related papers (2022-03-18T17:47:58Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
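The confidence-based pseudo-labeling step can be sketched as follows; the threshold and predictor outputs are assumptions.

```python
# Unlabeled segment pairs receive a preference pseudo-label only when the
# current preference predictor is sufficiently confident; borderline pairs
# stay unlabeled. Threshold and probabilities are illustrative.

def pseudo_label(pref_prob, threshold=0.9):
    """pref_prob: predictor's P(segment_0 is preferred over segment_1)."""
    if pref_prob > threshold:
        return 0            # pseudo-label: segment_0 preferred
    if pref_prob < 1.0 - threshold:
        return 1            # pseudo-label: segment_1 preferred
    return None             # too uncertain: leave unlabeled

print([pseudo_label(p) for p in (0.97, 0.55, 0.03, 0.88)])  # [0, None, 1, None]
```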
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- WSLRec: Weakly Supervised Learning for Neural Sequential Recommendation Models [24.455665093145818]
We propose a novel model-agnostic training approach called WSLRec, which adopts a three-stage framework: pre-training, top-$k$ mining, and fine-tuning.
WSLRec resolves the incompleteness problem by pre-training models on extra weak supervision from model-free methods like BR and ItemCF, and the inaccuracy problem by leveraging top-$k$ mining to screen out reliable user-item relevance from the weak supervision for fine-tuning.
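A rough sketch of the top-$k$ mining step, with assumed data structures (this is not the paper's code):

```python
# Weak labels from a model-free method (e.g., ItemCF) are kept for
# fine-tuning only when the pre-trained model itself ranks the item inside
# its top-k, screening out unreliable user-item pairs.

def mine_reliable(weak_items, model_scores, k=50):
    """weak_items: items a weak method calls relevant for this user;
    model_scores: {item_id: score} from the pre-trained model."""
    topk = set(sorted(model_scores, key=model_scores.get, reverse=True)[:k])
    return [item for item in weak_items if item in topk]

scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.1}
print(mine_reliable(["a", "b", "c"], scores, k=2))   # ['a', 'c']
```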
arXiv Detail & Related papers (2022-02-28T08:55:12Z)
- Context Uncertainty in Contextual Bandits with Applications to Recommender Systems [16.597836265345634]
We propose a new type of recurrent neural networks, dubbed recurrent exploration networks (REN), to jointly perform representation learning and effective exploration in the latent space.
Our theoretical analysis shows that REN can preserve the rate-optimal sublinear regret even when there exists uncertainty in the learned representations.
Our empirical study demonstrates that REN can achieve satisfactory long-term rewards on both synthetic and real-world recommendation datasets, outperforming state-of-the-art models.
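REN's exploration can be loosely illustrated with a UCB-style score on latent embeddings; the actual model is a recurrent network, so everything below is a simplified assumption.

```python
import numpy as np

# Score each item by estimated reward plus an uncertainty bonus computed
# from its latent embedding, so items whose representations are still
# uncertain keep getting explored. Purely illustrative.

rng = np.random.default_rng(1)
d = 4
theta = rng.normal(size=d)                 # current reward estimate
A_inv = np.linalg.inv(np.eye(d))           # inverse covariance of embeddings
                                           # (identity here; updated as data arrives)

def score(z, alpha=1.0):
    return theta @ z + alpha * np.sqrt(z @ A_inv @ z)

items = rng.normal(size=(10, d))           # latent embeddings of 10 items
print("recommend item", int(np.argmax([score(z) for z in items])))
```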
arXiv Detail & Related papers (2022-02-01T23:23:50Z)
- Recency Dropout for Recurrent Recommender Systems [23.210278548403185]
We introduce the recency dropout technique, a simple yet effective data augmentation technique to alleviate the recency bias in recommender systems.
We demonstrate the effectiveness of recency dropout in various experimental settings including a simulation study, offline experiments, as well as live experiments on a large-scale industrial recommendation platform.
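A minimal sketch of what the augmentation might look like; drop counts and probability are assumptions.

```python
import random

# During training, randomly drop some of the most recent items from the
# user's interaction history so the model does not over-rely on the very
# latest actions. Parameters are illustrative.

def recency_dropout(history, max_drop=3, p=0.5):
    """history: the user's interactions, oldest first."""
    if len(history) > 1 and random.random() < p:
        k = random.randint(1, min(max_drop, len(history) - 1))
        return history[:-k]
    return history

print(recency_dropout(["i1", "i2", "i3", "i4", "i5"]))
```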
arXiv Detail & Related papers (2022-01-26T15:50:20Z)
- Generative Inverse Deep Reinforcement Learning for Online Recommendation [62.09946317831129]
We propose a novel inverse reinforcement learning approach, namely InvRec, for online recommendation.
InvRec automatically extracts the reward function from users' behaviors for online recommendation.
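A loose sketch of the reward-extraction idea, not the paper's method: train a reward network so that items the user actually chose outscore sampled alternatives, then use that reward to drive the recommendation policy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Discriminator-style surrogate for inverse RL on user behavior: the learned
# reward should rank the user's chosen items above random alternatives.
# Architecture, feature size, and loss are illustrative assumptions.

reward_net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

def irl_step(chosen, alternative):
    """chosen/alternative: item feature tensors of shape (batch, 8)."""
    margin = reward_net(chosen) - reward_net(alternative)
    loss = F.softplus(-margin).mean()      # chosen items should score higher
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)

print(irl_step(torch.randn(16, 8), torch.randn(16, 8)))
```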
arXiv Detail & Related papers (2020-11-04T12:12:25Z)