UOEP: User-Oriented Exploration Policy for Enhancing Long-Term User Experiences in Recommender Systems
- URL: http://arxiv.org/abs/2401.09034v2
- Date: Wed, 22 May 2024 01:01:11 GMT
- Title: UOEP: User-Oriented Exploration Policy for Enhancing Long-Term User Experiences in Recommender Systems
- Authors: Changshuo Zhang, Sirui Chen, Xiao Zhang, Sunhao Dai, Weijie Yu, Jun Xu,
- Abstract summary: Reinforcement learning (RL) has gained traction for enhancing user long-term experiences in recommender systems.
Modern recommender systems exhibit distinct user behavioral patterns among tens of millions of items, which increases the difficulty of exploration.
We propose User-Oriented Exploration Policy (UOEP), a novel approach facilitating fine-grained exploration among user groups.
- Score: 7.635117537731915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has gained traction for enhancing user long-term experiences in recommender systems by effectively exploring users' interests. However, modern recommender systems exhibit distinct user behavioral patterns among tens of millions of items, which increases the difficulty of exploration. For example, user behaviors with different activity levels require varying intensity of exploration, while previous studies often overlook this aspect and apply a uniform exploration strategy to all users, which ultimately hurts user experiences in the long run. To address these challenges, we propose User-Oriented Exploration Policy (UOEP), a novel approach facilitating fine-grained exploration among user groups. We first construct a distributional critic which allows policy optimization under varying quantile levels of cumulative reward feedbacks from users, representing user groups with varying activity levels. Guided by this critic, we devise a population of distinct actors aimed at effective and fine-grained exploration within its respective user group. To simultaneously enhance diversity and stability during the exploration process, we further introduce a population-level diversity regularization term and a supervision module. Experimental results on public recommendation datasets demonstrate that our approach outperforms all other baselines in terms of long-term performance, validating its user-oriented exploration effectiveness. Meanwhile, further analyses reveal our approach's benefits of improved performance for low-activity users as well as increased fairness among users.
Related papers
- Unveiling User Satisfaction and Creator Productivity Trade-Offs in Recommendation Platforms [68.51708490104687]
We show that a purely relevance-driven policy with low exploration strength boosts short-term user satisfaction but undermines the long-term richness of the content pool.
Our findings reveal a fundamental trade-off between immediate user satisfaction and overall content production on platforms.
arXiv Detail & Related papers (2024-10-31T07:19:22Z) - Quantifying User Coherence: A Unified Framework for Cross-Domain Recommendation Analysis [69.37718774071793]
This paper introduces novel information-theoretic measures for understanding recommender systems.
We evaluate 7 recommendation algorithms across 9 datasets, revealing the relationships between our measures and standard performance metrics.
arXiv Detail & Related papers (2024-10-03T13:02:07Z) - Negative Sampling in Recommendation: A Survey and Future Directions [43.11318243903388]
Negative sampling is proficients in revealing the genuine negative aspect inherent in user behaviors.
We conduct an extensive literature review on the existing negative sampling strategies in recommendation.
We detail the insights of the tailored negative sampling strategies in diverse recommendation scenarios.
arXiv Detail & Related papers (2024-09-11T12:48:52Z) - Measuring Strategization in Recommendation: Users Adapt Their Behavior to Shape Future Content [66.71102704873185]
We test for user strategization by conducting a lab experiment and survey.
We find strong evidence of strategization across outcome metrics, including participants' dwell time and use of "likes"
Our findings suggest that platforms cannot ignore the effect of their algorithms on user behavior.
arXiv Detail & Related papers (2024-05-09T07:36:08Z) - PIE: Personalized Interest Exploration for Large-Scale Recommender
Systems [0.0]
We present a framework for exploration in large-scale recommender systems to address these challenges.
Our methodology can be easily integrated into an existing large-scale recommender system with minimal modifications.
Our work has been deployed in production on Facebook Watch, a popular video discovery and sharing platform serving billions of users.
arXiv Detail & Related papers (2023-04-13T22:25:09Z) - PARSRec: Explainable Personalized Attention-fused Recurrent Sequential
Recommendation Using Session Partial Actions [0.5801044612920815]
We propose an architecture that relies on common patterns as well as individual behaviors to tailor its recommendations for each person.
Our empirical results on Nielsen Consumer Panel dataset indicate that the proposed approach achieves up to 27.9% performance improvement.
arXiv Detail & Related papers (2022-09-16T12:07:43Z) - Personalizing Intervened Network for Long-tailed Sequential User
Behavior Modeling [66.02953670238647]
Tail users suffer from significantly lower-quality recommendation than the head users after joint training.
A model trained on tail users separately still achieve inferior results due to limited data.
We propose a novel approach that significantly improves the recommendation performance of the tail users.
arXiv Detail & Related papers (2022-08-19T02:50:19Z) - SURF: Semi-supervised Reward Learning with Data Augmentation for
Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z) - An Empirical analysis on Transparent Algorithmic Exploration in
Recommender Systems [17.91522677924348]
We propose a new approach for feedback elicitation without any deception and compare our approach to the conventional mix-in approach for evaluation.
Our results indicated that users left significantly more feedback on items chosen for exploration with our interface.
arXiv Detail & Related papers (2021-07-31T05:08:29Z) - Exploration-Exploitation Motivated Variational Auto-Encoder for
Recommender Systems [1.52292571922932]
We introduce an exploitation-exploration motivated variational auto-encoder (XploVAE) to collaborative filtering.
To facilitate personalized recommendations, we construct user-specific subgraphs, which contain the first-order proximity capturing observed user-item interactions.
A hierarchical latent space model is utilized to learn the personalized item embedding for a given user, along with the population distribution of all user subgraphs.
arXiv Detail & Related papers (2020-06-05T17:37:46Z) - Empowering Active Learning to Jointly Optimize System and User Demands [70.66168547821019]
We propose a new active learning approach that jointly optimize the active learning system (training efficiently) and the user (receiving useful instances)
We study our approach in an educational application, which particularly benefits from this technique as the system needs to rapidly learn to predict the appropriateness of an exercise to a particular user.
We evaluate multiple learning strategies and user types with data from real users and find that our joint approach better satisfies both objectives when alternative methods lead to many unsuitable exercises for end users.
arXiv Detail & Related papers (2020-05-09T16:02:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.