RL4RS: A Real-World Dataset for Reinforcement Learning based Recommender
System
- URL: http://arxiv.org/abs/2110.11073v5
- Date: Mon, 17 Apr 2023 10:37:38 GMT
- Title: RL4RS: A Real-World Dataset for Reinforcement Learning based Recommender
System
- Authors: Kai Wang, Zhene Zou, Minghao Zhao, Qilin Deng, Yue Shang, Yile Liang,
Runze Wu, Xudong Shen, Tangjie Lyu, Changjie Fan
- Abstract summary: Reinforcement learning based recommender systems (RL-based RS) aim at learning a good policy from a batch of collected data.
Current RL-based RS research commonly has a large reality gap.
We introduce the first open-source real-world dataset, RL4RS, intended to replace the artificial and semi-simulated RS datasets used in previous studies.
- Score: 26.097154801770245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning based recommender systems (RL-based RS) aim
to learn a good policy from a batch of collected data by casting recommendation
as a multi-step decision-making task. However, current RL-based RS research
commonly has a large reality gap. In this paper, we introduce the first
open-source real-world dataset, RL4RS, intended to replace the artificial and
semi-simulated RS datasets that previous studies relied on because of resource
limitations in the RL-based RS domain. Unlike academic RL research, RL-based RS
is difficult to validate thoroughly before deployment. We propose a new
systematic evaluation framework covering evaluation of environment simulation,
evaluation on environments, counterfactual policy evaluation, and evaluation on
environments built from the test set. In summary, RL4RS (Reinforcement Learning
for Recommender Systems), a new resource with a particular focus on reality
gaps, contains two real-world datasets, data understanding tools, tuned
simulation environments, related advanced RL baselines, batch RL baselines, and
counterfactual policy evaluation algorithms. The RL4RS suite can be found at
https://github.com/fuxiAIlab/RL4RS. Beyond RL-based recommender systems, we
expect the resource to contribute to research in applied reinforcement
learning.
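As an illustration of the counterfactual policy evaluation component listed in the evaluation framework above, the sketch below shows a generic (clipped) per-step importance-sampling estimator over logged recommendation episodes. This is a minimal sketch of the general technique, not the implementation shipped in the RL4RS suite; the data schema and function names are assumptions.

```python
import numpy as np

def ips_policy_value(logged_episodes, target_action_prob, gamma=1.0, clip=10.0):
    """Estimate the value of a target policy from logged data using
    (clipped) per-step importance sampling.

    logged_episodes: list of episodes; each episode is a list of dicts with
        keys 'state', 'action', 'reward', and 'behavior_prob' (the logging
        policy's probability of the logged action).  Hypothetical schema.
    target_action_prob: function (state, action) -> probability of that
        action under the policy being evaluated.
    """
    episode_values = []
    for episode in logged_episodes:
        weight, value, discount = 1.0, 0.0, 1.0
        for step in episode:
            # Cumulative importance weight of the trajectory prefix.
            weight *= target_action_prob(step["state"], step["action"]) / step["behavior_prob"]
            # Clipping the weight keeps the estimator's variance bounded.
            value += discount * min(weight, clip) * step["reward"]
            discount *= gamma
        episode_values.append(value)
    return float(np.mean(episode_values))
```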
Related papers
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
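As a rough illustration of the idea summarized above of using LLM guidance as a regularizer in value-based RL, the sketch below adds a KL-style penalty toward an LLM-suggested action distribution to a standard TD loss. This is a hedged reconstruction of the general recipe, not the paper's LINVIT algorithm; the network names and the `llm_prior` callable are assumptions.

```python
import torch
import torch.nn.functional as F

def regularized_td_loss(q_net, target_q_net, batch, llm_prior, beta=0.1, gamma=0.99):
    """TD loss plus a penalty that keeps the Q-induced policy close to an
    LLM-suggested action distribution.  `llm_prior(states)` is assumed to
    return a [batch, n_actions] probability tensor."""
    states, actions, rewards, next_states, dones = batch

    # Standard one-step TD target.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_q_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    td_loss = F.mse_loss(q_values, target)

    # Regularizer: the softmax policy induced by Q should stay close to the LLM prior.
    policy_log_probs = F.log_softmax(q_net(states), dim=1)
    kl_to_prior = F.kl_div(policy_log_probs, llm_prior(states), reduction="batchmean")

    return td_loss + beta * kl_to_prior
```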
- EasyRL4Rec: An Easy-to-use Library for Reinforcement Learning Based Recommender Systems [18.22130279210423]
We introduce EasyRL4Rec, an easy-to-use code library designed specifically for RL-based RSs.
This library provides lightweight and diverse RL environments based on five public datasets.
EasyRL4Rec seeks to facilitate the model development and experimental process in the domain of RL-based RSs.
arXiv Detail & Related papers (2024-02-23T07:54:26Z)
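RL environments for recommendation, of the kind the EasyRL4Rec entry above describes, typically follow a gym-style reset/step interface. The toy loop below illustrates that interaction pattern in generic form; it does not use EasyRL4Rec's actual API, and the environment dynamics and reward model are placeholders.

```python
import numpy as np

class ToyRecEnv:
    """A deliberately simple recommendation environment: the agent picks one
    of `n_items` to show, and a hidden user-preference vector determines the
    click probability (the reward)."""

    def __init__(self, n_items=50, episode_len=10, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_items = n_items
        self.episode_len = episode_len

    def reset(self):
        self.user_pref = self.rng.random(self.n_items)  # hidden user taste
        self.t = 0
        self.history = np.zeros(self.n_items)           # items shown so far
        return self.history.copy()                      # observation

    def step(self, item):
        click = self.rng.random() < self.user_pref[item]
        self.history[item] += 1
        self.t += 1
        done = self.t >= self.episode_len
        return self.history.copy(), float(click), done, {}

env = ToyRecEnv()
obs, total, done = env.reset(), 0.0, False
while not done:
    action = np.random.randint(env.n_items)  # random policy as a stand-in
    obs, reward, done, info = env.step(action)
    total += reward
print("episode reward:", total)
```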
- B2RL: An open-source Dataset for Building Batch Reinforcement Learning [0.0]
Batch reinforcement learning (BRL) is an emerging research area in the RL community.
We are the first to open-source building datasets for the purpose of BRL research.
arXiv Detail & Related papers (2022-09-30T17:54:42Z)
- When does return-conditioned supervised learning work for offline reinforcement learning? [51.899892382786526]
We study the capabilities and limitations of return-conditioned supervised learning.
We find that RCSL returns the optimal policy under a set of assumptions stronger than those needed for the more traditional dynamic programming-based algorithms.
arXiv Detail & Related papers (2022-06-02T15:05:42Z)
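Return-conditioned supervised learning (RCSL), as summarized above, trains a policy to predict the logged action given the state and the return-to-go, then conditions on a high target return at test time. A minimal sketch under assumed tensor shapes and a discrete action space:

```python
import torch
import torch.nn as nn

class ReturnConditionedPolicy(nn.Module):
    """pi(a | state, return-to-go): a small MLP over [state, target_return]."""

    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, states, returns_to_go):
        x = torch.cat([states, returns_to_go.unsqueeze(-1)], dim=-1)
        return self.net(x)  # action logits

def rcsl_loss(policy, states, actions, returns_to_go):
    # Plain supervised learning: predict the logged action, conditioned on
    # the return that was actually achieved from this state onward.
    logits = policy(states, returns_to_go)
    return nn.functional.cross_entropy(logits, actions)

# At evaluation time one conditions on a desired (high) return, e.g.:
#   logits = policy(state.unsqueeze(0), torch.tensor([target_return]))
#   action = logits.argmax(dim=-1)
```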
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
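The two-policy idea in the Jump-Start RL entry above can be illustrated as a roll-in schedule: a guide policy controls the first steps of each episode and the learning policy takes over afterwards, with the jump-start length annealed toward zero as the learning policy improves. The sketch below shows only that data-collection loop; the environment and both policies are assumed to exist, and names are hypothetical.

```python
def collect_episode(env, guide_policy, explore_policy, guide_steps):
    """Roll in with the guide policy for `guide_steps` steps, then hand
    control to the learning (exploration) policy for the rest of the episode."""
    obs, done, t, trajectory = env.reset(), False, 0, []
    while not done:
        policy = guide_policy if t < guide_steps else explore_policy
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, next_obs, done))
        obs, t = next_obs, t + 1
    return trajectory

def jump_start_schedule(horizon, n_stages):
    # Curriculum over the jump-start length: start with the guide policy
    # controlling almost the whole episode, end with it controlling nothing.
    return [int(horizon * (1 - k / n_stages)) for k in range(n_stages + 1)]
```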
- Value Penalized Q-Learning for Recommender Systems [30.704083806571074]
Scaling reinforcement learning to recommender systems (RS) is promising since maximizing the expected cumulative rewards for RL agents meets the objective of RS.
A key approach to this goal is offline RL, which aims to learn policies from logged data.
We propose Value Penalized Q-learning (VPQ), an uncertainty-based offline RL algorithm.
arXiv Detail & Related papers (2021-10-15T08:08:28Z)
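One common way to realize an uncertainty-based value penalty like the one described in the VPQ entry above is to train an ensemble of Q-functions and subtract a multiple of the ensemble's standard deviation from the target value, so that poorly supported actions are valued pessimistically. This is a generic sketch of that idea, not the paper's exact formulation; all names are assumptions.

```python
import torch

def penalized_td_target(q_ensemble, rewards, next_states, next_actions,
                        dones, gamma=0.99, penalty_weight=1.0):
    """Pessimistic TD target: ensemble mean minus a multiple of the ensemble
    standard deviation, so (state, action) pairs the data does not support
    get low values.  `q_ensemble` is assumed to be a list of Q-networks
    mapping (states, actions) -> values."""
    with torch.no_grad():
        next_qs = torch.stack(
            [q(next_states, next_actions) for q in q_ensemble], dim=0
        )  # [ensemble, batch]
        mean_q = next_qs.mean(dim=0)
        std_q = next_qs.std(dim=0)  # ensemble disagreement as an uncertainty proxy
        pessimistic_next_q = mean_q - penalty_weight * std_q
        return rewards + gamma * (1.0 - dones) * pessimistic_next_q
```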
- S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning [28.947071041811586]
Offline reinforcement learning proposes to learn policies from large collected datasets without interaction.
Current algorithms overfit to the dataset they are trained on and generalize poorly out of distribution when deployed in the environment.
We propose a Surprisingly Simple Self-Supervision algorithm (S4RL) which utilizes data augmentations from states to learn value functions that are better at generalizing and extrapolating when deployed in the environment.
arXiv Detail & Related papers (2021-03-10T20:13:21Z)
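The state-augmentation idea in the S4RL entry above can be sketched as perturbing states (for example with small Gaussian noise) when computing value-function losses, so the learned Q-function is smooth in a neighborhood of the dataset. The snippet below shows one such augmentation applied to a TD loss; the noise scale, network names, and `policy` callable are assumptions, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def augmented_td_loss(q_net, target_q_net, policy, batch,
                      noise_std=0.003, gamma=0.99):
    """TD loss on Gaussian-perturbed copies of the logged states, encouraging
    the Q-function to generalize around the data.  `policy(states)` is assumed
    to return the actions used in the bootstrap target."""
    states, actions, rewards, next_states, dones = batch

    # Zero-mean Gaussian perturbation of the states only.
    noisy_states = states + noise_std * torch.randn_like(states)
    noisy_next_states = next_states + noise_std * torch.randn_like(next_states)

    q_values = q_net(noisy_states, actions)
    with torch.no_grad():
        next_q = target_q_net(noisy_next_states, policy(noisy_next_states))
        target = rewards + gamma * (1.0 - dones) * next_q
    return F.mse_loss(q_values, target)
```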
- Near Real-World Benchmarks for Offline Reinforcement Learning [26.642722521820467]
We present a suite of near real-world benchmarks, NewRL.
NewRL contains datasets from various domains with controlled sizes and extra test datasets for the purpose of policy validation.
We argue that the performance of a policy should also be compared with the deterministic version of the behavior policy, instead of the dataset reward.
arXiv Detail & Related papers (2021-02-01T09:19:10Z)
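The evaluation protocol argued for in the NewRL entry above, comparing a learned policy against the deterministic (argmax) version of the behavior policy rather than against the average dataset reward, can be expressed as a small helper. The environment and policy interfaces here are assumptions for illustration only.

```python
import numpy as np

def average_return(env, select_action, n_episodes=20):
    """Monte-Carlo estimate of a policy's undiscounted return."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(select_action(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns))

def improvement_over_behavior(env, learned_policy, behavior_policy_probs):
    """Compare the learned policy against the deterministic (argmax) version
    of the behavior policy, as advocated above."""
    deterministic_behavior = lambda obs: int(np.argmax(behavior_policy_probs(obs)))
    return average_return(env, learned_policy) - average_return(env, deterministic_behavior)
```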
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
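Critic-regularized regression, as summarized above, weights a behavior-cloning loss by a function of the critic's advantage, so the policy imitates only the logged actions the critic considers good. A minimal sketch with an exponential advantage weight (one of the variants commonly associated with CRR); shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F

def crr_policy_loss(policy_logits, actions, q_values, value_baseline, beta=1.0):
    """Advantage-weighted behavior cloning.

    policy_logits:  [batch, n_actions] logits of the policy being trained.
    actions:        [batch] logged actions from the dataset.
    q_values:       [batch] critic estimates Q(s, a) for the logged actions.
    value_baseline: [batch] state-value estimates V(s), e.g. mean Q over actions.
    """
    advantage = q_values - value_baseline
    # Exponentiated, clipped advantage weights; the indicator variant would
    # use (advantage > 0).float() instead.
    weights = torch.clamp(torch.exp(advantage / beta), max=20.0).detach()
    log_probs = F.log_softmax(policy_logits, dim=-1)
    chosen_log_probs = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    return -(weights * chosen_log_probs).mean()
```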
- RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
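For reference, D4RL datasets are typically loaded through the gym interface, roughly as shown in its README; exact environment names and availability may differ across versions, so treat the snippet as indicative rather than definitive.

```python
import gym
import d4rl  # importing d4rl registers the offline-RL environments with gym

env = gym.make("hopper-medium-v2")  # one of the standard D4RL tasks

# Raw dataset: a dict of numpy arrays (observations, actions, rewards, terminals, ...).
dataset = env.get_dataset()
print(dataset["observations"].shape)

# Convenience view with (s, a, r, s', done) aligned for Q-learning-style training.
transitions = d4rl.qlearning_dataset(env)
print(transitions.keys())
```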