RL4RS: A Real-World Dataset for Reinforcement Learning based Recommender
System
- URL: http://arxiv.org/abs/2110.11073v5
- Date: Mon, 17 Apr 2023 10:37:38 GMT
- Title: RL4RS: A Real-World Dataset for Reinforcement Learning based Recommender
System
- Authors: Kai Wang, Zhene Zou, Minghao Zhao, Qilin Deng, Yue Shang, Yile Liang,
Runze Wu, Xudong Shen, Tangjie Lyu, Changjie Fan
- Abstract summary: Reinforcement learning based recommender systems (RL-based RS) aim at learning a good policy from a batch of collected data.
Current RL-based RS research commonly has a large reality gap.
We introduce the first open-source real-world dataset, RL4RS, intended to replace the artificial and semi-simulated RS datasets used in previous studies.
- Score: 26.097154801770245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning based recommender systems (RL-based RS) aim
to learn a good policy from a batch of collected data by casting recommendation
as a multi-step decision-making task. However, current RL-based RS research
commonly has a large reality gap. In this paper, we introduce the first
open-source real-world dataset, RL4RS, intended to replace the artificial and
semi-simulated RS datasets that previous studies relied on because of resource
limitations in the RL-based RS domain. Unlike academic RL research, RL-based RS
is difficult to validate thoroughly before deployment. We propose a new
systematic evaluation framework covering evaluation of environment simulation,
evaluation on environments, counterfactual policy evaluation, and evaluation on
environments built from the test set. In summary, RL4RS (Reinforcement Learning
for Recommender Systems), a new resource with a particular focus on reality
gaps, contains two real-world datasets, data understanding tools, tuned
simulation environments, related advanced RL baselines, batch RL baselines, and
counterfactual policy evaluation algorithms. The RL4RS suite can be found at
https://github.com/fuxiAIlab/RL4RS. Beyond RL-based recommender systems, we
expect the resource to contribute to research in applied reinforcement
learning.
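As an illustration of the counterfactual policy evaluation component listed in the evaluation framework above, the sketch below shows a generic (clipped) per-step importance-sampling estimator over logged recommendation episodes. This is a minimal sketch of the general technique, not the implementation shipped in the RL4RS suite; the data schema and function names are assumptions.

```python
import numpy as np

def ips_policy_value(logged_episodes, target_action_prob, gamma=1.0, clip=10.0):
    """Estimate the value of a target policy from logged data using
    (clipped) per-step importance sampling.

    logged_episodes: list of episodes; each episode is a list of dicts with
        keys 'state', 'action', 'reward', and 'behavior_prob' (the logging
        policy's probability of the logged action).  Hypothetical schema.
    target_action_prob: function (state, action) -> probability of that
        action under the policy being evaluated.
    """
    episode_values = []
    for episode in logged_episodes:
        weight, value, discount = 1.0, 0.0, 1.0
        for step in episode:
            # Cumulative importance weight of the trajectory prefix.
            weight *= target_action_prob(step["state"], step["action"]) / step["behavior_prob"]
            # Clipping the weight keeps the estimator's variance bounded.
            value += discount * min(weight, clip) * step["reward"]
            discount *= gamma
        episode_values.append(value)
    return float(np.mean(episode_values))
```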
Related papers
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
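As a rough illustration of the idea summarized above of using LLM guidance as a regularizer in value-based RL, the sketch below adds a KL-style penalty toward an LLM-suggested action distribution to a standard TD loss. This is a hedged reconstruction of the general recipe, not the paper's LINVIT algorithm; the network names and the `llm_prior` callable are assumptions.

```python
import torch
import torch.nn.functional as F

def regularized_td_loss(q_net, target_q_net, batch, llm_prior, beta=0.1, gamma=0.99):
    """TD loss plus a penalty that keeps the Q-induced policy close to an
    LLM-suggested action distribution.  `llm_prior(states)` is assumed to
    return a [batch, n_actions] probability tensor."""
    states, actions, rewards, next_states, dones = batch

    # Standard one-step TD target.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_q_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    td_loss = F.mse_loss(q_values, target)

    # Regularizer: the softmax policy induced by Q should stay close to the LLM prior.
    policy_log_probs = F.log_softmax(q_net(states), dim=1)
    kl_to_prior = F.kl_div(policy_log_probs, llm_prior(states), reduction="batchmean")

    return td_loss + beta * kl_to_prior
```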
- EasyRL4Rec: An Easy-to-use Library for Reinforcement Learning Based Recommender Systems [18.22130279210423]
We introduce EasyRL4Rec, an easy-to-use code library designed specifically for RL-based RSs.
This library provides lightweight and diverse RL environments based on five public datasets.
EasyRL4Rec seeks to facilitate the model development and experimental process in the domain of RL-based RSs.
arXiv Detail & Related papers (2024-02-23T07:54:26Z)
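RL environments for recommendation, of the kind the EasyRL4Rec entry above describes, typically follow a gym-style reset/step interface. The toy loop below illustrates that interaction pattern in generic form; it does not use EasyRL4Rec's actual API, and the environment dynamics and reward model are placeholders.

```python
import numpy as np

class ToyRecEnv:
    """A deliberately simple recommendation environment: the agent picks one
    of `n_items` to show, and a hidden user-preference vector determines the
    click probability (the reward)."""

    def __init__(self, n_items=50, episode_len=10, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_items = n_items
        self.episode_len = episode_len

    def reset(self):
        self.user_pref = self.rng.random(self.n_items)  # hidden user taste
        self.t = 0
        self.history = np.zeros(self.n_items)           # items shown so far
        return self.history.copy()                      # observation

    def step(self, item):
        click = self.rng.random() < self.user_pref[item]
        self.history[item] += 1
        self.t += 1
        done = self.t >= self.episode_len
        return self.history.copy(), float(click), done, {}

env = ToyRecEnv()
obs, total, done = env.reset(), 0.0, False
while not done:
    action = np.random.randint(env.n_items)  # random policy as a stand-in
    obs, reward, done, info = env.step(action)
    total += reward
print("episode reward:", total)
```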
- B2RL: An open-source Dataset for Building Batch Reinforcement Learning [0.0]
Batch reinforcement learning (BRL) is an emerging research area in the RL community.
We are the first to open-source building datasets for the purpose of BRL research.
arXiv Detail & Related papers (2022-09-30T17:54:42Z)
- When does return-conditioned supervised learning work for offline reinforcement learning? [51.899892382786526]
We study the capabilities and limitations of return-conditioned supervised learning.
We find that RCSL returns the optimal policy under a set of assumptions stronger than those needed for the more traditional dynamic programming-based algorithms.
arXiv Detail & Related papers (2022-06-02T15:05:42Z)
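Return-conditioned supervised learning (RCSL), as summarized above, trains a policy to predict the logged action given the state and the return-to-go, then conditions on a high target return at test time. A minimal sketch under assumed tensor shapes and a discrete action space:

```python
import torch
import torch.nn as nn

class ReturnConditionedPolicy(nn.Module):
    """pi(a | state, return-to-go): a small MLP over [state, target_return]."""

    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, states, returns_to_go):
        x = torch.cat([states, returns_to_go.unsqueeze(-1)], dim=-1)
        return self.net(x)  # action logits

def rcsl_loss(policy, states, actions, returns_to_go):
    # Plain supervised learning: predict the logged action, conditioned on
    # the return that was actually achieved from this state onward.
    logits = policy(states, returns_to_go)
    return nn.functional.cross_entropy(logits, actions)

# At evaluation time one conditions on a desired (high) return, e.g.:
#   logits = policy(state.unsqueeze(0), torch.tensor([target_return]))
#   action = logits.argmax(dim=-1)
```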
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
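The two-policy idea in the Jump-Start RL entry above can be illustrated as a roll-in schedule: a guide policy controls the first steps of each episode and the learning policy takes over afterwards, with the jump-start length annealed toward zero as the learning policy improves. The sketch below shows only that data-collection loop; the environment and both policies are assumed to exist, and names are hypothetical.

```python
def collect_episode(env, guide_policy, explore_policy, guide_steps):
    """Roll in with the guide policy for `guide_steps` steps, then hand
    control to the learning (exploration) policy for the rest of the episode."""
    obs, done, t, trajectory = env.reset(), False, 0, []
    while not done:
        policy = guide_policy if t < guide_steps else explore_policy
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, next_obs, done))
        obs, t = next_obs, t + 1
    return trajectory

def jump_start_schedule(horizon, n_stages):
    # Curriculum over the jump-start length: start with the guide policy
    # controlling almost the whole episode, end with it controlling nothing.
    return [int(horizon * (1 - k / n_stages)) for k in range(n_stages + 1)]
```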
- Value Penalized Q-Learning for Recommender Systems [30.704083806571074]
Scaling reinforcement learning to recommender systems (RS) is promising since maximizing the expected cumulative rewards for RL agents meets the objective of RS.
A key approach to this goal is offline RL, which aims to learn policies from logged data.
We propose Value Penalized Q-learning (VPQ), an uncertainty-based offline RL algorithm.
arXiv Detail & Related papers (2021-10-15T08:08:28Z)
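One common way to realize an uncertainty-based value penalty like the one described in the VPQ entry above is to train an ensemble of Q-functions and subtract a multiple of the ensemble's standard deviation from the target value, so that poorly supported actions are valued pessimistically. This is a generic sketch of that idea, not the paper's exact formulation; all names are assumptions.

```python
import torch

def penalized_td_target(q_ensemble, rewards, next_states, next_actions,
                        dones, gamma=0.99, penalty_weight=1.0):
    """Pessimistic TD target: ensemble mean minus a multiple of the ensemble
    standard deviation, so (state, action) pairs the data does not support
    get low values.  `q_ensemble` is assumed to be a list of Q-networks
    mapping (states, actions) -> values."""
    with torch.no_grad():
        next_qs = torch.stack(
            [q(next_states, next_actions) for q in q_ensemble], dim=0
        )  # [ensemble, batch]
        mean_q = next_qs.mean(dim=0)
        std_q = next_qs.std(dim=0)  # ensemble disagreement as an uncertainty proxy
        pessimistic_next_q = mean_q - penalty_weight * std_q
        return rewards + gamma * (1.0 - dones) * pessimistic_next_q
```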
- S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning [28.947071041811586]
Offline reinforcement learning proposes to learn policies from large collected datasets without interaction.
Current algorithms overfit to the dataset they are trained on and generalize poorly out of distribution when deployed in the environment.
We propose a Surprisingly Simple Self-Supervision algorithm (S4RL) which utilizes data augmentations from states to learn value functions that are better at generalizing and extrapolating when deployed in the environment.
arXiv Detail & Related papers (2021-03-10T20:13:21Z)
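The state-augmentation idea in the S4RL entry above can be sketched as perturbing states (for example with small Gaussian noise) when computing value-function losses, so the learned Q-function is smooth in a neighborhood of the dataset. The snippet below shows one such augmentation applied to a TD loss; the noise scale, network names, and `policy` callable are assumptions, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def augmented_td_loss(q_net, target_q_net, policy, batch,
                      noise_std=0.003, gamma=0.99):
    """TD loss on Gaussian-perturbed copies of the logged states, encouraging
    the Q-function to generalize around the data.  `policy(states)` is assumed
    to return the actions used in the bootstrap target."""
    states, actions, rewards, next_states, dones = batch

    # Zero-mean Gaussian perturbation of the states only.
    noisy_states = states + noise_std * torch.randn_like(states)
    noisy_next_states = next_states + noise_std * torch.randn_like(next_states)

    q_values = q_net(noisy_states, actions)
    with torch.no_grad():
        next_q = target_q_net(noisy_next_states, policy(noisy_next_states))
        target = rewards + gamma * (1.0 - dones) * next_q
    return F.mse_loss(q_values, target)
```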
- Near Real-World Benchmarks for Offline Reinforcement Learning [26.642722521820467]
We present a suite of near real-world benchmarks, NewRL.
NewRL contains datasets from various domains with controlled sizes and extra test datasets for the purpose of policy validation.
We argue that the performance of a policy should also be compared with the deterministic version of the behavior policy, instead of the dataset reward.
arXiv Detail & Related papers (2021-02-01T09:19:10Z)
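The evaluation protocol argued for in the NewRL entry above, comparing a learned policy against the deterministic (argmax) version of the behavior policy rather than against the average dataset reward, can be expressed as a small helper. The environment and policy interfaces here are assumptions for illustration only.

```python
import numpy as np

def average_return(env, select_action, n_episodes=20):
    """Monte-Carlo estimate of a policy's undiscounted return."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(select_action(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns))

def improvement_over_behavior(env, learned_policy, behavior_policy_probs):
    """Compare the learned policy against the deterministic (argmax) version
    of the behavior policy, as advocated above."""
    deterministic_behavior = lambda obs: int(np.argmax(behavior_policy_probs(obs)))
    return average_return(env, learned_policy) - average_return(env, deterministic_behavior)
```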
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
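Critic-regularized regression, as summarized above, weights a behavior-cloning loss by a function of the critic's advantage, so the policy imitates only the logged actions the critic considers good. A minimal sketch with an exponential advantage weight (one of the variants commonly associated with CRR); shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F

def crr_policy_loss(policy_logits, actions, q_values, value_baseline, beta=1.0):
    """Advantage-weighted behavior cloning.

    policy_logits:  [batch, n_actions] logits of the policy being trained.
    actions:        [batch] logged actions from the dataset.
    q_values:       [batch] critic estimates Q(s, a) for the logged actions.
    value_baseline: [batch] state-value estimates V(s), e.g. mean Q over actions.
    """
    advantage = q_values - value_baseline
    # Exponentiated, clipped advantage weights; the indicator variant would
    # use (advantage > 0).float() instead.
    weights = torch.clamp(torch.exp(advantage / beta), max=20.0).detach()
    log_probs = F.log_softmax(policy_logits, dim=-1)
    chosen_log_probs = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    return -(weights * chosen_log_probs).mean()
```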
- RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
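For reference, D4RL datasets are typically loaded through the gym interface, roughly as shown in its README; exact environment names and availability may differ across versions, so treat the snippet as indicative rather than definitive.

```python
import gym
import d4rl  # importing d4rl registers the offline-RL environments with gym

env = gym.make("hopper-medium-v2")  # one of the standard D4RL tasks

# Raw dataset: a dict of numpy arrays (observations, actions, rewards, terminals, ...).
dataset = env.get_dataset()
print(dataset["observations"].shape)

# Convenience view with (s, a, r, s', done) aligned for Q-learning-style training.
transitions = d4rl.qlearning_dataset(env)
print(transitions.keys())
```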