Compressive Features in Offline Reinforcement Learning for Recommender Systems
- URL: http://arxiv.org/abs/2111.08817v1
- Date: Tue, 16 Nov 2021 22:43:16 GMT
- Title: Compressive Features in Offline Reinforcement Learning for Recommender Systems
- Authors: Hung Nguyen, Minh Nguyen, Long Pham, Jennifer Adorno Nieves
- Abstract summary: We develop a recommender system for a game that suggests potential items to players based on their interactive behaviors to maximize revenue for the game provider.
Our approach is built on a reinforcement learning-based technique and is trained on an offline data set publicly available from an IEEE Big Data Cup challenge.
- Score: 2.3513645401551333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we develop a recommender system for a game that suggests
potential items to players based on their interactive behaviors to maximize
revenue for the game provider. Our approach is built on a reinforcement
learning-based technique and is trained on an offline data set that is publicly
available from an IEEE Big Data Cup challenge. The limitations of the offline
data set and the curse of dimensionality pose significant obstacles to solving
this problem. Our proposed method focuses on improving the total rewards and
performance by tackling these main difficulties. More specifically, we utilize
sparse PCA to extract important features of user behaviors. Our
Q-learning-based system is then trained on the processed offline data set. To
exploit all available information in the provided data set, we cluster user
features into groups and build an independent Q-table for each group.
Furthermore, to tackle the challenge of an unknown evaluation formula, we
design a metric to self-evaluate our system's performance based on the
potential value the game provider might achieve and a small collection of
actual evaluation scores obtained from the live scoring environment. Our
experiments show that our proposed metric is consistent with the results
published by the challenge organizers. We have implemented the proposed
training pipeline, and the results show that our method outperforms current
state-of-the-art methods in terms of both total rewards and training speed. By
addressing the main challenges and leveraging the state-of-the-art techniques,
we have achieved the best public leaderboard result in the challenge.
Furthermore, our proposed method achieves an estimated score approximately 20%
higher than the best current state-of-the-art method and can be trained roughly
30 times faster.
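A minimal sketch of the pipeline described above (sparse-PCA compression of user-behavior features, clustering of the compressed features, and one tabular Q-learning agent per cluster) is given below. It assumes scikit-learn's SparsePCA and KMeans, a discretized state and action space, and a generic (user, state, action, reward, next state) offline log; the feature dimensions, state encoding, hyperparameters, and reward definition used in the paper are not specified here, so all sizes are placeholders.

```python
import numpy as np
from sklearn.decomposition import SparsePCA
from sklearn.cluster import KMeans

# --- 1. Compress high-dimensional user-behavior features with sparse PCA ---
# X: (num_users, num_raw_features) matrix of logged user behaviors (assumed layout).
rng = np.random.default_rng(0)
X = rng.random((1000, 50))                      # placeholder offline feature matrix

spca = SparsePCA(n_components=8, random_state=0)
Z = spca.fit_transform(X)                       # compressed user features

# --- 2. Cluster users in the compressed space; one Q-table per cluster ---
n_clusters, n_states, n_actions = 4, 20, 10     # illustrative sizes only
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
cluster_of_user = kmeans.fit_predict(Z)
q_tables = [np.zeros((n_states, n_actions)) for _ in range(n_clusters)]

# --- 3. Offline Q-learning over logged transitions ---
def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-learning backup on a single logged transition."""
    q[s, a] += alpha * (r + gamma * np.max(q[s_next]) - q[s, a])

# Placeholder log; in the real setting these transitions come from the challenge data.
log = [(int(rng.integers(1000)), int(rng.integers(n_states)),
        int(rng.integers(n_actions)), float(rng.random()),
        int(rng.integers(n_states))) for _ in range(10000)]

for user, s, a, r, s_next in log:
    q_update(q_tables[cluster_of_user[user]], s, a, r, s_next)

# --- 4. Recommend: greedy action from the Q-table of the user's cluster ---
def recommend(user, state):
    return int(np.argmax(q_tables[cluster_of_user[user]][state]))
```

Keeping an independent, small Q-table per user cluster is one plausible reason a tabular approach of this kind can train far faster than function-approximation baselines while still specializing recommendations to distinct behavioral groups.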
Related papers
- Pay More Attention to the Robustness of Prompt for Instruction Data Mining [15.350709684929116]
This paper proposes a pioneering framework of high-quality online instruction data mining for instruction tuning.
Our notable innovation is to generate adversarial instruction data by attacking the prompts of online instruction data.
We conduct extensive experiments on two benchmark datasets to assess the performance.
arXiv Detail & Related papers (2025-03-31T12:53:08Z) - Robust Offline Imitation Learning Through State-level Trajectory Stitching [37.281554320048755]
Imitation learning (IL) has proven effective for enabling robots to acquire visuomotor skills through expert demonstrations.
Recent advances in offline IL have incorporated suboptimal, unlabeled datasets into training.
We propose a novel approach to enhance policy learning from mixed-quality offline datasets by leveraging task-relevant trajectory fragments and rich environmental dynamics.
arXiv Detail & Related papers (2025-03-28T15:28:36Z) - Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF [67.48004037550064]
We propose an active learning approach to efficiently select prompt and preference pairs.
Our method evaluates the gradients of all potential preference annotations to assess their impact on model updates.
Experimental results demonstrate that our method outperforms the baseline by up to 5% in win rates against the chosen completion.
arXiv Detail & Related papers (2025-03-28T04:22:53Z) - KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [75.78948575957081]
Large language models (LLMs) usually rely on retrieval-augmented generation to exploit knowledge materials on the fly.
We propose KBAlign, an approach designed for efficient adaptation to downstream tasks involving knowledge bases.
Our method utilizes iterative training with self-annotated data such as Q&A pairs and revision suggestions, enabling the model to grasp the knowledge content efficiently.
arXiv Detail & Related papers (2024-11-22T08:21:03Z) - A Human-Centered Approach for Improving Supervised Learning [0.44378250612683995]
This paper shows how we can strike a balance between performance, time, and resource constraints.
Another goal of this research is to make Ensembles more explainable and intelligible using the Human-Centered approach.
arXiv Detail & Related papers (2024-10-14T10:27:14Z) - Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling [0.9831489366502301]
The Job Shop Scheduling Problem (JSSP) is a complex optimization problem.
Online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP.
We introduce Offline Reinforcement Learning for Learning to Dispatch (Offline-LD)
arXiv Detail & Related papers (2024-09-16T15:18:10Z) - Active Learning to Guide Labeling Efforts for Question Difficulty Estimation [1.0514231683620516]
Transformer-based neural networks achieve state-of-the-art performance, primarily through supervised methods, with only an isolated study exploring unsupervised learning.
This work bridges the research gap by exploring active learning for QDE, a supervised human-in-the-loop approach.
Experiments demonstrate that active learning with PowerVariance acquisition achieves a performance close to fully supervised models after labeling only 10% of the training data.
arXiv Detail & Related papers (2024-09-14T02:02:42Z) - One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z) - Embedding in Recommender Systems: A Survey [67.67966158305603]
A crucial aspect is embedding techniques that convert high-dimensional discrete features, such as user and item IDs, into low-dimensional continuous vectors.
Applying embedding techniques captures complex entity relationships and has spurred substantial research.
This survey covers embedding methods like collaborative filtering, self-supervised learning, and graph-based techniques.
arXiv Detail & Related papers (2023-10-28T06:31:06Z) - Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a 2.5x improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z) - Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling [11.751910133386254]
Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data.
We propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data.
Our experiments show that this approach is more sample efficient when compared to a naive way of combining expert data with data collected from a sub-optimal agent.
arXiv Detail & Related papers (2022-12-16T01:41:59Z) - Evaluating Membership Inference Through Adversarial Robustness [6.983991370116041]
We propose an enhanced methodology for membership inference attacks based on adversarial robustness.
We evaluate our proposed method on three datasets: Fashion-MNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-05-14T06:48:47Z) - SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z) - Online Coreset Selection for Rehearsal-based Continual Learning [65.85595842458882]
In continual learning, we store a subset of training examples (coreset) to be replayed later to alleviate catastrophic forgetting.
We propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration.
Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting.
arXiv Detail & Related papers (2021-06-02T11:39:25Z)