Breaking the Cold-Start Barrier: Reinforcement Learning with Double and Dueling DQNs
- URL: http://arxiv.org/abs/2508.21259v1
- Date: Thu, 28 Aug 2025 23:14:07 GMT
- Title: Breaking the Cold-Start Barrier: Reinforcement Learning with Double and Dueling DQNs
- Authors: Minda Zhao
- Abstract summary: This paper proposes a reinforcement learning approach using Double and Dueling Deep Q-Networks (DQN) to dynamically learn user preferences from sparse feedback. By integrating these advanced DQN variants with a matrix factorization model, we achieve superior performance on a large e-commerce dataset.
- Score: 4.031998949939877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recommender systems struggle to provide accurate suggestions to new users with limited interaction history, a challenge known as the cold-user problem. This paper proposes a reinforcement learning approach using Double and Dueling Deep Q-Networks (DQN) to dynamically learn user preferences from sparse feedback, enhancing recommendation accuracy without relying on sensitive demographic data. By integrating these advanced DQN variants with a matrix factorization model, we achieve superior performance on a large e-commerce dataset compared to traditional methods like popularity-based and active learning strategies. Experimental results show that our method, particularly Dueling DQN, reduces Root Mean Square Error (RMSE) for cold users, offering an effective solution for privacy-constrained environments.
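The abstract names the two DQN refinements but does not spell them out. The sketch below shows only their standard textbook formulations in pure Python: the Double DQN bootstrap target, the dueling aggregation of a state value and per-action advantages, and the RMSE metric used for evaluation. It is not the author's implementation. How the paper encodes states and actions, and how it couples the Q-network to matrix factorization, are not described here, so both are omitted. All function names are hypothetical.

```python
"""Textbook Double DQN target, dueling aggregation, and RMSE (illustrative sketch only)."""
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value; the first index wins on ties."""
    return max(range(len(values)), key=lambda i: values[i])


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Double DQN target: the online network selects the next action and the
    target network evaluates it, which reduces the overestimation caused by
    taking a max over noisy Q-estimates."""
    if done:
        return reward                                # no bootstrap at terminal states
    a_star = argmax(q_online_next)                   # selection by the online network
    return reward + gamma * q_target_next[a_star]    # evaluation by the target network


def dueling_q_values(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    Subtracting the mean advantage makes V and A identifiable."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between predicted and observed ratings."""
    n = len(actual)
    return (sum((p - y) ** 2 for p, y in zip(predicted, actual)) / n) ** 0.5


if __name__ == "__main__":
    # Online net prefers action 1; the target net values action 1 at 2.0.
    print(double_dqn_target(1.0, False, 0.5, [1.0, 3.0, 2.0], [0.5, 2.0, 4.0]))  # 2.0
    print(dueling_q_values(2.0, [1.0, -1.0, 0.0]))  # [3.0, 1.0, 2.0]
    print(rmse([3.0, 5.0], [4.0, 4.0]))  # 1.0
```

In the first call, a vanilla DQN target would take the target network's own maximum (4.0) and return 1.0 + 0.5 * 4.0 = 3.0. Decoupling selection from evaluation yields 2.0 instead.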
Related papers
- Customized Retrieval-Augmented Generation with LLM for Debiasing Recommendation Unlearning [11.187910465178078]
CRAGRU is a novel framework for efficient, user-specific unlearning.
It mitigates unlearning bias while preserving recommendation quality.
Our work highlights the promise of RAG-based architectures for building robust and privacy-preserving recommender systems.
(2025-09-10T08:49:58Z)
- Pre-training for Recommendation Unlearning [14.514770044236375]
UnlearnRec is a model-agnostic pre-training paradigm that prepares systems for efficient unlearning operations.
Our method delivers exceptional unlearning effectiveness while providing more than 10x speedup compared to retraining approaches.
(2025-05-28T17:57:11Z)
- PAUSE: Low-Latency and Privacy-Aware Active User Selection for Federated Learning [49.02872047060618]
Federated learning (FL) enables edge devices to collaboratively train a machine learning model without the need to share potentially private data.
FL poses two key challenges: first, the accumulation of privacy leakage over time, and second, communication latency.
We propose a method that jointly addresses the accumulation of privacy leakage and communication latency via active user selection.
(2025-03-17T13:50:35Z)
- Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback.
We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB), which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB), which uses a neural network to model complex, non-linear user reward functions.
(2025-02-04T07:55:41Z)
- Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning [70.22819290458581]
Reinforcement learning with human feedback (RLHF) is a widely adopted approach in current large language model pipelines.
Our approach introduces two key innovations: (1) on-policy query to avoid OOD and imbalance issues in seed data, and (2) active learning to select the most informative data for preference queries.
(2024-07-02T10:09:19Z)
- Interactive Graph Convolutional Filtering [79.34979767405979]
Interactive Recommender Systems (IRS) have been increasingly used in various domains, including personalized article recommendation, social media, and online advertising.
These problems are exacerbated by the cold start problem and data sparsity problem.
Existing Multi-Armed Bandit methods, despite their carefully designed exploration strategies, often struggle to provide satisfactory results in the early stages.
Our proposed method extends interactive collaborative filtering into the graph model to enhance the performance of collaborative filtering between users and items.
(2023-09-04T09:02:31Z)
- RESUS: Warm-Up Cold Users via Meta-Learning Residual User Preferences in CTR Prediction [14.807495564177252]
Click-Through Rate (CTR) prediction on cold users is a challenging task in recommender systems.
We propose a novel and efficient approach named RESUS, which decouples the learning of global preference knowledge contributed by collective users from the learning of residual preferences for individual users.
Our approach is efficient and effective in improving CTR prediction accuracy on cold users, compared with various state-of-the-art methods.
(2022-10-28T11:57:58Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
(2022-03-18T16:50:38Z)
- Sparsity Regularization For Cold-Start Recommendation [7.848143873095096]
We introduce a novel representation for user-vectors by combining user demographics and user preferences.
We develop a novel sparse adversarial model, SRLGAN, for Cold-Start Recommendation leveraging the sparse user-purchase behavior.
We evaluate the SRLGAN on two popular datasets and demonstrate state-of-the-art results.
(2022-01-26T02:28:08Z)
- Learning to Learn a Cold-start Sequential Recommender [70.5692886883067]
Cold-start recommendation is an urgent problem in contemporary online applications.
We propose a meta-learning based cold-start sequential recommendation framework called metaCSR.
metaCSR can learn common patterns from regular users' behaviors.
(2021-10-18T08:11:24Z)
- Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
(2020-06-10T11:18:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.