Compressive Features in Offline Reinforcement Learning for Recommender Systems
- URL: http://arxiv.org/abs/2111.08817v1
- Date: Tue, 16 Nov 2021 22:43:16 GMT
- Title: Compressive Features in Offline Reinforcement Learning for Recommender Systems
- Authors: Hung Nguyen, Minh Nguyen, Long Pham, Jennifer Adorno Nieves
- Abstract summary: We develop a recommender system for a game that suggests potential items to players based on their interactive behaviors to maximize revenue for the game provider.
Our approach is built on a reinforcement learning-based technique and is trained on an offline data set publicly available from an IEEE Big Data Cup challenge.
- Score: 2.3513645401551333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we develop a recommender system for a game that suggests
potential items to players based on their interactive behaviors to maximize
revenue for the game provider. Our approach is built on a reinforcement
learning-based technique and is trained on an offline data set that is publicly
available from an IEEE Big Data Cup challenge. The limitations of the offline
data set and the curse of dimensionality pose significant obstacles to solving
this problem. Our proposed method focuses on improving the total rewards and
performance by tackling these main difficulties. More specifically, we utilize
sparse PCA to extract important features of user behaviors. Our
Q-learning-based system is then trained on the processed offline data set. To
exploit all available information in the provided data set, we cluster user
features into groups and build an independent Q-table for each group.
Furthermore, to tackle the challenge of an unknown evaluation formula, we
design a metric to self-evaluate our system's performance based on the
potential value the game provider might achieve and a small collection of
actual evaluation scores obtained from the live scoring environment. Our
experiments show that our proposed metric is consistent with the results
published by the challenge organizers. We have implemented the proposed
training pipeline, and the results show that our method outperforms current
state-of-the-art methods in terms of both total rewards and training speed. By
addressing the main challenges and leveraging the state-of-the-art techniques,
we have achieved the best public leaderboard result in the challenge.
Furthermore, our proposed method achieves an estimated score approximately 20%
higher than the best current state-of-the-art method and can be trained roughly
30 times faster.
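A minimal sketch of the pipeline described above (sparse-PCA compression of user-behavior features, clustering of the compressed features, and one tabular Q-learning agent per cluster) is given below. It assumes scikit-learn's SparsePCA and KMeans, a discretized state and action space, and a generic (user, state, action, reward, next state) offline log; the feature dimensions, state encoding, hyperparameters, and reward definition used in the paper are not specified here, so all sizes are placeholders.

```python
import numpy as np
from sklearn.decomposition import SparsePCA
from sklearn.cluster import KMeans

# --- 1. Compress high-dimensional user-behavior features with sparse PCA ---
# X: (num_users, num_raw_features) matrix of logged user behaviors (assumed layout).
rng = np.random.default_rng(0)
X = rng.random((1000, 50))                      # placeholder offline feature matrix

spca = SparsePCA(n_components=8, random_state=0)
Z = spca.fit_transform(X)                       # compressed user features

# --- 2. Cluster users in the compressed space; one Q-table per cluster ---
n_clusters, n_states, n_actions = 4, 20, 10     # illustrative sizes only
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
cluster_of_user = kmeans.fit_predict(Z)
q_tables = [np.zeros((n_states, n_actions)) for _ in range(n_clusters)]

# --- 3. Offline Q-learning over logged transitions ---
def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-learning backup on a single logged transition."""
    q[s, a] += alpha * (r + gamma * np.max(q[s_next]) - q[s, a])

# Placeholder log; in the real setting these transitions come from the challenge data.
log = [(int(rng.integers(1000)), int(rng.integers(n_states)),
        int(rng.integers(n_actions)), float(rng.random()),
        int(rng.integers(n_states))) for _ in range(10000)]

for user, s, a, r, s_next in log:
    q_update(q_tables[cluster_of_user[user]], s, a, r, s_next)

# --- 4. Recommend: greedy action from the Q-table of the user's cluster ---
def recommend(user, state):
    return int(np.argmax(q_tables[cluster_of_user[user]][state]))
```

Keeping an independent, small Q-table per user cluster is one plausible reason a tabular approach of this kind can train far faster than function-approximation baselines while still specializing recommendations to distinct behavioral groups.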
Related papers
- Pay More Attention to the Robustness of Prompt for Instruction Data Mining [15.350709684929116]
This paper proposes a pioneering framework of high-quality online instruction data mining for instruction tuning.
Our notable innovation is to generate adversarial instruction data by attacking the prompts of online instruction data.
We conduct extensive experiments on two benchmark datasets to assess the performance.
arXiv Detail & Related papers (2025-03-31T12:53:08Z) - Robust Offline Imitation Learning Through State-level Trajectory Stitching [37.281554320048755]
Imitation learning (IL) has proven effective for enabling robots to acquire visuomotor skills through expert demonstrations.
Recent advances in offline IL have incorporated suboptimal, unlabeled datasets into training.
We propose a novel approach to enhance policy learning from mixed-quality offline datasets by leveraging task-relevant trajectory fragments and rich environmental dynamics.
arXiv Detail & Related papers (2025-03-28T15:28:36Z) - Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF [67.48004037550064]
We propose an active learning approach to efficiently select prompt and preference pairs.
Our method evaluates the gradients of all potential preference annotations to assess their impact on model updates.
Experimental results demonstrate that our method outperforms the baseline by up to 5% in win rates against the chosen completion.
arXiv Detail & Related papers (2025-03-28T04:22:53Z) - KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [75.78948575957081]
Large language models (LLMs) usually rely on retrieval-augmented generation to exploit knowledge materials on the fly.
We propose KBAlign, an approach designed for efficient adaptation to downstream tasks involving knowledge bases.
Our method utilizes iterative training with self-annotated data such as Q&A pairs and revision suggestions, enabling the model to grasp the knowledge content efficiently.
arXiv Detail & Related papers (2024-11-22T08:21:03Z) - A Human-Centered Approach for Improving Supervised Learning [0.44378250612683995]
This paper shows how we can strike a balance between performance, time, and resource constraints.
Another goal of this research is to make Ensembles more explainable and intelligible using the Human-Centered approach.
arXiv Detail & Related papers (2024-10-14T10:27:14Z) - Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling [0.9831489366502301]
The Job Shop Scheduling Problem (JSSP) is a complex optimization problem.
Online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP.
We introduce Offline Reinforcement Learning for Learning to Dispatch (Offline-LD)
arXiv Detail & Related papers (2024-09-16T15:18:10Z) - Active Learning to Guide Labeling Efforts for Question Difficulty Estimation [1.0514231683620516]
Transformer-based neural networks achieve state-of-the-art performance, primarily through supervised methods, with only an isolated study exploring unsupervised learning.
This work bridges the research gap by exploring active learning for QDE, a supervised human-in-the-loop approach.
Experiments demonstrate that active learning with PowerVariance acquisition achieves a performance close to fully supervised models after labeling only 10% of the training data.
arXiv Detail & Related papers (2024-09-14T02:02:42Z) - One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z) - Embedding in Recommender Systems: A Survey [67.67966158305603]
A crucial aspect is embedding techniques that convert high-dimensional discrete features, such as user and item IDs, into low-dimensional continuous vectors.
Applying embedding techniques captures complex entity relationships and has spurred substantial research.
This survey covers embedding methods like collaborative filtering, self-supervised learning, and graph-based techniques.
arXiv Detail & Related papers (2023-10-28T06:31:06Z) - Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a 2.5x improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z) - Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling [11.751910133386254]
Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data.
We propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data.
Our experiments show that this approach is more sample efficient when compared to a naive way of combining expert data with data collected from a sub-optimal agent.
arXiv Detail & Related papers (2022-12-16T01:41:59Z) - Evaluating Membership Inference Through Adversarial Robustness [6.983991370116041]
We propose an enhanced methodology for membership inference attacks based on adversarial robustness.
We evaluate our proposed method on three datasets: Fashion-MNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-05-14T06:48:47Z) - SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z) - Online Coreset Selection for Rehearsal-based Continual Learning [65.85595842458882]
In continual learning, we store a subset of training examples (coreset) to be replayed later to alleviate catastrophic forgetting.
We propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration.
Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting.
arXiv Detail & Related papers (2021-06-02T11:39:25Z)