Reinforcement Learning with a Disentangled Universal Value Function for
Item Recommendation
- URL: http://arxiv.org/abs/2104.02981v1
- Date: Wed, 7 Apr 2021 08:13:32 GMT
- Title: Reinforcement Learning with a Disentangled Universal Value Function for
Item Recommendation
- Authors: Kai Wang, Zhene Zou, Qilin Deng, Runze Wu, Jianrong Tao, Changjie Fan,
Liang Chen, Peng Cui
- Abstract summary: We develop a model-based reinforcement learning framework with a disentangled universal value function, called GoalRec.
We demonstrate the superiority of GoalRec over previous approaches on three key practical challenges in a series of simulations and a real application.
- Score: 35.79993074465577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, there has been great interest, as well as great
challenge, in applying reinforcement learning (RL) to recommender systems
(RS). In this paper, we summarize three key practical challenges of
large-scale RL-based recommender systems: massive state and action spaces, a
high-variance environment, and an under-specified reward setting. All three
problems remain largely unexplored in the existing literature and make the
application of RL difficult. We develop a model-based reinforcement learning
framework with a disentangled universal value function, called GoalRec.
Combining the ideas of world models (model-based), value function estimation
(model-free), and goal-based RL, GoalRec proposes a novel model-based
formalization of the value function. This formalization generalizes to the
various goals a recommender may have and, accordingly, disentangles the
stochastic environmental dynamics from the high-variance reward signals. As
part of the value function, a high-capacity, reward-irrelevant world model,
insulated from the sparse and high-variance reward signals, is trained to
simulate the complex environmental dynamics under a given goal. Based on the
predicted dynamics, the disentangled universal value function is conditioned
on the user's future trajectory rather than on a monolithic state and a
scalar reward. We demonstrate, in a series of simulations and a real
application, that GoalRec outperforms previous approaches on the three
practical challenges above.
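The decomposition described in the abstract lends itself to a compact sketch. Below is a minimal, illustrative PyTorch rendering of the idea, assuming dense vector states and goals; the class name, layer sizes, and flat trajectory encoding are our assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DisentangledUniversalValue(nn.Module):
    """Illustrative GoalRec-style decomposition (names and sizes are our
    assumptions): a reward-irrelevant world model predicts the user's
    future trajectory, and a goal-conditioned head scores that trajectory,
    so reward noise never reaches the dynamics model."""

    def __init__(self, state_dim: int, goal_dim: int, horizon: int, hidden: int = 128):
        super().__init__()
        # World model: trained only with a dynamics (trajectory prediction)
        # loss, never with the sparse, high-variance reward signal.
        self.world_model = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * state_dim),
        )
        # Value head: maps the predicted trajectory plus the goal to a
        # scalar value, absorbing the reward-related variance.
        self.value_head = nn.Sequential(
            nn.Linear(horizon * state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        # Value is a function of the predicted future trajectory and the
        # goal, not of a monolithic state and a scalar reward.
        trajectory = self.world_model(torch.cat([state, goal], dim=-1))
        return self.value_head(torch.cat([trajectory, goal], dim=-1))
```

Because only the value head ever sees reward-derived targets, the high-capacity world model can be trained on plentiful interaction logs alone, which is the disentanglement the abstract argues for.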
Related papers
- Exploring the limits of Hierarchical World Models in Reinforcement Learning [0.7499722271664147]
We describe a novel HMBRL framework and evaluate it thoroughly.
We construct hierarchical world models that simulate environment dynamics at various levels of temporal abstraction.
Unlike most goal-conditioned H(MB)RL approaches, it also leads to comparatively low-dimensional abstract actions.
arXiv Detail & Related papers (2024-06-01T16:29:03Z)
- Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with large discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches that, rather than optimizing over the entire set of $n$ actions, consider only a variable subset of actions, possibly as small as $\mathcal{O}(\log(n))$; a sketch of this stochastic maximization follows below.
The presented value-based RL methods include, among others, Stochastic Q-learning, StochDQN, and StochDDQN, all of which integrate this approach for both value-function updates and action selection.
arXiv Detail & Related papers (2024-05-16T17:58:44Z)
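As a rough illustration of the sampling idea in the entry above, the following sketch approximates the max over $n$ actions by scoring only a logarithmic-size candidate set; the helper name and the small memory of previously good actions are our assumptions, not the authors' exact procedure.

```python
import math
import random

def stochastic_max(q_values, n_actions, memory=()):
    """Approximate max_a Q(s, a) by scoring O(log n) sampled actions plus a
    few previously good ones, instead of scanning all n actions (a hedged
    sketch of stochastic maximization)."""
    k = max(1, math.ceil(math.log2(n_actions)))
    candidates = set(random.sample(range(n_actions), k)) | set(memory)
    best = max(candidates, key=lambda a: q_values[a])
    return best, q_values[best]

# Both action selection and the bootstrapped target of the Q-update would
# call stochastic_max in place of an exact max over all actions.
```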
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks; a sketch of a token-level objective follows below.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
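A minimal sketch of what a token-level, entropy-regularized objective of this kind could look like is given below; the exact objective, weighting, and advantage estimation in ETPO may differ, and the function name is ours.

```python
import torch
import torch.nn.functional as F

def token_level_entropy_reg_loss(logits, actions, advantages, beta=0.01):
    """Treat every generated token as an action: weight each token's
    log-probability by a per-token advantage and add an entropy bonus
    (a hedged sketch of an ETPO-style objective, not the paper's code)."""
    log_probs = F.log_softmax(logits, dim=-1)                        # (B, T, V)
    taken = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # (B, T)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)             # (B, T)
    return -(taken * advantages + beta * entropy).mean()
```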
- Goal-conditioned Offline Planning from Curious Exploration [28.953718733443143]
We consider the challenge of extracting goal-conditioned behavior from the products of unsupervised exploration techniques.
We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting.
To mitigate these failure modes, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme, sketched below.
arXiv Detail & Related papers (2023-11-28T17:48:18Z)
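One plausible, heavily simplified reading of that aggregation scheme is sketched below: explored states form a graph, and repeated sweeps cap each state's value by what its graph successors can actually support, suppressing artifact peaks in the learned value landscape. The update rule and data structures here are our assumptions.

```python
def aggregate_values(successors, v_learned, goal_states, gamma=0.99, sweeps=10):
    """Graph-based value aggregation sketch: `successors` maps each state to
    states reached from it during exploration, `v_learned` holds the value
    network's estimates.  A state's value is capped by the best discounted
    value among its successors, so an estimate survives only when an actual
    path in the data supports it."""
    v = dict(v_learned)
    for _ in range(sweeps):
        for s, succs in successors.items():
            if s in goal_states or not succs:
                continue  # goals and frontier states keep their estimates
            backed_up = gamma * max(v[s2] for s2 in succs)
            v[s] = min(v[s], backed_up)  # suppress unsupported value peaks
    return v
```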
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method that leverages the multimodal policy parameterization and a learned world model; a sketch of the parameterization follows below.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
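A minimal sketch of such a generative policy parameterization is shown below, assuming a continuous latent variable that selects among trajectory modes; the architecture and dimensions are our assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class LatentTrajectoryPolicy(nn.Module):
    """Reparameterized multimodal policy sketch: a latent z picks a
    trajectory mode and a decoder maps (state, z) to an action, so distinct
    optima stay distinct instead of being averaged into one unimodal
    policy (our illustration)."""

    def __init__(self, state_dim: int, action_dim: int, z_dim: int = 8, hidden: int = 128):
        super().__init__()
        self.z_dim = z_dim
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        z = torch.randn(state.shape[0], self.z_dim)  # sample a mode
        return self.decoder(torch.cat([state, z], dim=-1))
```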
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks; the mixture structure is sketched below.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
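A simplified sketch of a mixture world model in this spirit appears below: K Gaussian dynamics heads gated by a categorical distribution, so a new task can recruit or reuse a component instead of overwriting shared weights. The gating, head count, and moment-matched output are our assumptions.

```python
import torch
import torch.nn as nn

class MixtureWorldModel(nn.Module):
    """Mixture-of-Gaussians world model sketch: each head predicts the mean
    and log-variance of the next state for one dynamics mode, and a gate
    mixes them (a hedged illustration, not the paper's model)."""

    def __init__(self, state_dim: int, action_dim: int, k: int = 4, hidden: int = 128):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 2 * state_dim))  # mean, log-variance
            for _ in range(k)
        ])
        self.gate = nn.Linear(state_dim + action_dim, k)

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        x = torch.cat([state, action], dim=-1)
        weights = torch.softmax(self.gate(x), dim=-1)           # (B, K)
        outs = torch.stack([h(x) for h in self.heads], dim=1)   # (B, K, 2*state_dim)
        # Moment-matched mixture parameters (a simplification).
        return (weights.unsqueeze(-1) * outs).sum(dim=1)
```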
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of the rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- CostNet: An End-to-End Framework for Goal-Directed Reinforcement Learning [9.432068833600884]
Reinforcement Learning (RL) is a general framework concerned with an agent that seeks to maximize rewards in an environment.
Two approaches, model-based and model-free reinforcement learning, have shown concrete results in several disciplines.
This paper introduces a novel reinforcement learning algorithm for predicting the distance between two states in a Markov Decision Process; a sketch of such a distance predictor follows below.
arXiv Detail & Related papers (2022-10-03T21:16:14Z)
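A sketch of a state-distance predictor in this spirit is given below; the architecture and the Softplus output (distances are non-negative) are our assumptions, not CostNet's actual design.

```python
import torch
import torch.nn as nn

class StateDistanceNet(nn.Module):
    """Predict the expected number of steps between two MDP states from
    their concatenated features (an illustrative sketch of the idea the
    CostNet abstract describes)."""

    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # distances are non-negative
        )

    def forward(self, state: torch.Tensor, goal_state: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, goal_state], dim=-1))
```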
- Imaginary Hindsight Experience Replay: Curious Model-based Learning for Sparse Reward Tasks [9.078290260836706]
We propose a model-based method tailored for sparse-reward tasks that forgoes the need for complicated reward engineering.
This approach, termed Imaginary Hindsight Experience Replay, minimises real-world interactions by incorporating imaginary data into policy updates; the data mix is sketched below.
Upon evaluation, this approach provides an order-of-magnitude increase in data efficiency on average versus the state-of-the-art model-free method on the benchmark OpenAI Gym Fetch robotics tasks.
arXiv Detail & Related papers (2021-10-05T23:38:31Z)
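The data mix we read into that entry can be sketched as follows, assuming a list-like replay buffer and callable `policy` and `world_model`; the half-and-half split and the one-step imagination are our simplifications, not the authors' implementation.

```python
import random
from dataclasses import dataclass

@dataclass
class Transition:
    state: tuple
    action: int
    next_state: tuple
    goal: tuple

def imaginary_her_batch(real_buffer, world_model, policy, batch_size=64):
    """Blend real transitions with world-model rollouts relabelled in
    hindsight (a sketch of the Imaginary Hindsight Experience Replay
    idea)."""
    real = random.sample(real_buffer, batch_size // 2)
    imagined = []
    for t in real:
        a = policy(t.state, t.goal)
        s_next = world_model(t.state, a)  # one imagined step
        # Hindsight relabel: the imagined outcome becomes the goal, giving
        # a guaranteed "success" without any reward engineering.
        imagined.append(Transition(t.state, a, s_next, goal=s_next))
    return real + imagined
```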