Deep Exploration for Recommendation Systems
- URL: http://arxiv.org/abs/2109.12509v4
- Date: Sun, 30 Jul 2023 08:39:53 GMT
- Title: Deep Exploration for Recommendation Systems
- Authors: Zheqing Zhu, Benjamin Van Roy
- Abstract summary: We develop deep exploration methods for recommendation systems.
In particular, we formulate recommendation as a sequential decision problem.
Our experiments are carried out with high-fidelity industrial-grade simulators.
- Score: 14.937000494745861
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern recommendation systems ought to benefit by probing for and learning
from delayed feedback. Research has tended to focus on learning from a user's
response to a single recommendation. Such work, which leverages methods of
supervised and bandit learning, forgoes learning from the user's subsequent
behavior. Where past work has aimed to learn from subsequent behavior, there
has been a lack of effective methods for probing to elicit informative delayed
feedback. Effective exploration through probing for delayed feedback becomes
particularly challenging when rewards are sparse. To address this, we develop
deep exploration methods for recommendation systems. In particular, we
formulate recommendation as a sequential decision problem and demonstrate
benefits of deep exploration over single-step exploration. Our experiments are
carried out with high-fidelity industrial-grade simulators and establish large
improvements over existing algorithms.
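For intuition, here is a minimal Python sketch of deep exploration in the bootstrapped-ensemble style: an ensemble of Q-networks is maintained, and each user session commits to one randomly sampled member, which produces multi-step (rather than single-step) exploration. The class names, architecture, and hyperparameters are illustrative assumptions, not details from the paper.
```python
# Minimal sketch of deep exploration via an ensemble of Q-networks
# (bootstrapped-DQN style). Architecture and hyperparameters are
# illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim, num_items, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_items),
        )

    def forward(self, state):
        return self.net(state)  # one Q-value per candidate item

class EnsembleRecommender:
    """At the start of each user session, sample one Q-network from the
    ensemble and act greedily with it for the whole session. Committing
    to one hypothesis across a session is what makes the exploration
    'deep' (multi-step) rather than single-step dithering."""

    def __init__(self, state_dim, num_items, ensemble_size=10):
        self.members = [QNet(state_dim, num_items) for _ in range(ensemble_size)]
        self.active = None  # set by begin_session() before recommending

    def begin_session(self):
        idx = torch.randint(len(self.members), (1,)).item()
        self.active = self.members[idx]

    def recommend(self, state):
        with torch.no_grad():
            q = self.active(torch.as_tensor(state, dtype=torch.float32))
        return int(q.argmax())
```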
Related papers
- Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback [22.89046164459011]
We present a technique called Human Guided Exploration (HuGE), which uses low-quality feedback from non-expert users.
HuGE guides exploration for reinforcement learning not only in simulation but also in the real world, all without meticulous reward specification.
arXiv Detail & Related papers (2023-07-20T17:30:37Z)
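A hedged sketch of the core HuGE ingredient described above: scoring candidate exploration goals from cheap, noisy pairwise human labels. The Bradley-Terry scorer and the toy data are illustrative assumptions, not the authors' exact method.
```python
# Sketch: score candidate exploration goals from noisy pairwise human
# labels ("which state looks closer to the goal?"). Illustrative only.
import numpy as np

def bradley_terry_scores(num_goals, comparisons, lr=0.1, steps=200):
    """comparisons: list of (winner, loser) goal-index pairs from humans.
    Fits one score per goal so that P(i beats j) = sigmoid(s_i - s_j)."""
    s = np.zeros(num_goals)
    for _ in range(steps):
        for i, j in comparisons:
            p = 1.0 / (1.0 + np.exp(-(s[i] - s[j])))
            s[i] += lr * (1 - p)  # push winner's score up
            s[j] -= lr * (1 - p)  # and loser's score down
    return s

# Direct goal-conditioned exploration toward the frontier state that
# the noisy human feedback ranks as most promising.
scores = bradley_terry_scores(4, [(2, 0), (2, 1), (3, 2), (3, 1)])
next_exploration_goal = int(np.argmax(scores))
```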
- Towards Improving Exploration in Self-Imitation Learning using Intrinsic Motivation [7.489793155793319]
Reinforcement learning has emerged as a strong alternative for solving optimization tasks efficiently.
The effectiveness of these algorithms depends heavily on the feedback signals the environment provides to indicate how good (or bad) the agent's decisions are.
In this work, intrinsic motivation encourages the agent to explore the environment out of curiosity, while self-imitation learning replays the most promising experiences to accelerate learning.
arXiv Detail & Related papers (2022-11-30T09:18:59Z)
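A minimal sketch of the two ingredients this summary names: a curiosity bonus from a learned forward model's prediction error, and a self-imitation buffer that keeps only the highest-return episodes for replay. The linear predictor and buffer policy are illustrative assumptions.
```python
# Sketch: curiosity bonus (forward-model prediction error) plus
# self-imitation (replay of the best episodes). Illustrative only.
import numpy as np

class CuriosityBonus:
    """Intrinsic reward = squared error of a learned next-state
    predictor; states the model predicts poorly are 'novel'."""
    def __init__(self, state_dim, lr=1e-2):
        self.W = np.zeros((state_dim, state_dim))
        self.lr = lr

    def bonus(self, s, s_next):
        err = s_next - self.W @ s
        self.W += self.lr * np.outer(err, s)  # online predictor update
        return float(err @ err)

class SelfImitationBuffer:
    """Keep only the best episodes by return for the agent to re-train on."""
    def __init__(self, capacity=10):
        self.episodes = []  # list of (return, trajectory)
        self.capacity = capacity

    def add(self, ep_return, trajectory):
        self.episodes.append((ep_return, trajectory))
        self.episodes.sort(key=lambda e: e[0], reverse=True)
        del self.episodes[self.capacity:]
```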
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method designed specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty through uncertainty in the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback-efficiency and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
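A short sketch of the stated idea: an exploration bonus from disagreement among an ensemble of learned reward models, added on top of the mean reward estimate. The linear models and bonus weight are illustrative assumptions.
```python
# Sketch: exploration bonus = disagreement (std) across an ensemble of
# reward models learned from preferences. Illustrative only.
import numpy as np

def uncertainty_bonus(state_action, reward_models, beta=1.0):
    """reward_models: callables mapping a feature vector to a scalar
    reward estimate. High disagreement = uncertain learned reward."""
    preds = np.array([m(state_action) for m in reward_models])
    return beta * preds.std()

# Example with a small ensemble of random linear reward estimates.
rng = np.random.default_rng(0)
models = [(lambda w: (lambda x: float(w @ x)))(rng.normal(size=4))
          for _ in range(5)]
x = np.ones(4)
total_reward = np.mean([m(x) for m in models]) + uncertainty_bonus(x, models)
```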
- Real-Time Learning from An Expert in Deep Recommendation Systems with Marginal Distance Probability Distribution [1.3535770763481902]
We develop a recommendation system that suggests daily exercise activities to users based on their history, profiles, and similar users.
The developed recommendation system uses a deep recurrent neural network with user-profile attention and temporal attention mechanisms.
We propose a real-time, expert-in-the-loop active learning procedure.
arXiv Detail & Related papers (2021-10-12T19:20:18Z)
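A hedged sketch of the architecture the previous summary describes: a recurrent network over activity history with temporal attention, combined with attention over user-profile features. All dimensions and layer choices are illustrative assumptions.
```python
# Sketch: recurrent recommender with temporal attention over history
# and attention over profile features. Dimensions are illustrative.
import torch
import torch.nn as nn

class AttentiveRecommender(nn.Module):
    def __init__(self, feat_dim, profile_dim, hidden, num_activities):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.temporal_attn = nn.Linear(hidden, 1)  # scores each time step
        self.profile_attn = nn.Linear(profile_dim, profile_dim)
        self.out = nn.Linear(hidden + profile_dim, num_activities)

    def forward(self, history, profile):
        h, _ = self.gru(history)                         # (B, T, hidden)
        w = torch.softmax(self.temporal_attn(h), dim=1)  # (B, T, 1)
        context = (w * h).sum(dim=1)                     # weighted history
        p = torch.softmax(self.profile_attn(profile), dim=-1) * profile
        return self.out(torch.cat([context, p], dim=-1))  # activity scores

model = AttentiveRecommender(feat_dim=8, profile_dim=5, hidden=16,
                             num_activities=10)
scores = model(torch.randn(2, 7, 8), torch.randn(2, 5))  # 2 users, 7 steps
```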
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
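A minimal sketch of the relabeling trick PEBBLE's summary refers to: when the preference-learned reward model is updated, rewards stored in the replay buffer are recomputed so off-policy learning stays consistent. The buffer layout and stand-in reward model are illustrative assumptions.
```python
# Sketch: relabel stored transitions with the current learned reward
# model so off-policy updates stay consistent. Illustrative only.
import numpy as np

def relabel_replay_buffer(buffer, reward_model):
    """buffer: list of dicts with 'state' and 'action' arrays.
    Overwrites each stored reward with the current model's estimate."""
    for transition in buffer:
        x = np.concatenate([transition["state"], transition["action"]])
        transition["reward"] = float(reward_model(x))
    return buffer

buffer = [{"state": np.ones(3), "action": np.zeros(2), "reward": 0.0}]
relabel_replay_buffer(buffer, lambda x: x.sum())  # stand-in reward model
```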
- Generative Inverse Deep Reinforcement Learning for Online Recommendation [62.09946317831129]
We propose a novel inverse reinforcement learning approach, namely InvRec, for online recommendation.
InvRec automatically extracts the reward function from users' behaviors for online recommendation.
arXiv Detail & Related papers (2020-11-04T12:12:25Z)
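A hedged sketch in the spirit of reward extraction from user behavior: a discriminator is trained to separate logged user transitions from the policy's, and its output serves as a learned reward. This is a generic adversarial-IRL recipe, not necessarily InvRec's exact formulation.
```python
# Sketch: discriminator-based reward extraction from logged behavior
# (generic adversarial-IRL recipe). Feature sizes are illustrative.
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(user_batch, policy_batch):
    """user_batch / policy_batch: (B, 6) state-action features."""
    logits = torch.cat([disc(user_batch), disc(policy_batch)])
    labels = torch.cat([torch.ones(len(user_batch), 1),
                        torch.zeros(len(policy_batch), 1)])
    loss = bce(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

def learned_reward(state_action):
    # Higher when the transition looks like real user behavior.
    with torch.no_grad():
        return torch.sigmoid(disc(state_action)).item()

train_step(torch.randn(8, 6), torch.randn(8, 6))  # one adversarial step
```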
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that encourages exploration only when it is needed.
We perform an illustrative case study showing that it has the potential to both accelerate training and yield a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
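A minimal sketch of reannealing as summarized above: epsilon decays as usual but is reset upward when a heuristic detects stalled progress. The moving-best heuristic and constants are illustrative assumptions.
```python
# Sketch: epsilon decays normally but is "reannealed" (reset upward)
# when a heuristic signals stalled learning. Constants are illustrative.
class ReannealedEpsilon:
    def __init__(self, eps=1.0, decay=0.995, floor=0.05,
                 reheat=0.5, patience=50):
        self.eps, self.decay, self.floor = eps, decay, floor
        self.reheat, self.patience = reheat, patience
        self.best, self.stale = float("-inf"), 0

    def step(self, episode_return):
        self.eps = max(self.floor, self.eps * self.decay)  # usual decay
        if episode_return > self.best:
            self.best, self.stale = episode_return, 0
        else:
            self.stale += 1
        if self.stale >= self.patience:  # heuristic: progress stalled
            self.eps = max(self.eps, self.reheat)  # reanneal exploration
            self.stale = 0
        return self.eps
```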
- Knowledge Transfer via Pre-training for Recommendation: A Review and Prospect [89.91745908462417]
We demonstrate the benefits of pre-training for recommender systems through experiments.
We discuss several promising directions for future research on recommender systems with pre-training.
arXiv Detail & Related papers (2020-09-19T13:06:27Z)
- A Survey on Knowledge Graph-Based Recommender Systems [65.50486149662564]
We conduct a systematic survey of knowledge graph-based recommender systems.
We focus on how the papers utilize the knowledge graph for accurate and explainable recommendation.
We also introduce the datasets used in these works.
arXiv Detail & Related papers (2020-02-28T02:26:30Z)