Generative Adversarial Reward Learning for Generalized Behavior Tendency
Inference
- URL: http://arxiv.org/abs/2105.00822v2
- Date: Wed, 5 May 2021 13:01:28 GMT
- Title: Generative Adversarial Reward Learning for Generalized Behavior Tendency
Inference
- Authors: Xiaocong Chen, Lina Yao, Xianzhi Wang, Aixin Sun, Wenjie Zhang and
Quan Z. Sheng
- Abstract summary: We propose a generative inverse reinforcement learning approach for user behavioral preference modelling.
Our model automatically learns rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN.
- Score: 71.11416263370823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in reinforcement learning have inspired increasing
interest in learning user models adaptively through dynamic interactions, e.g.,
in reinforcement-learning-based recommender systems. The reward function is
crucial for most reinforcement learning applications, as it guides the
optimization. However, current reinforcement-learning-based methods rely on
manually defined reward functions, which cannot adapt to dynamic and noisy
environments. Moreover, they generally use task-specific reward functions that
sacrifice generalization ability. To address these issues, we propose a
generative inverse reinforcement learning approach for user behavioral
preference modelling. Instead of using predefined reward functions, our model
automatically learns rewards from users' actions based on a discriminative
actor-critic network and a Wasserstein GAN. Our model provides a general way
of characterizing and explaining underlying behavioral tendencies, and our
experiments show that our method outperforms state-of-the-art methods in a
variety of scenarios, namely traffic signal control, online recommender
systems, and scanpath prediction.
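To make the mechanism described in the abstract concrete, below is a minimal sketch of this style of adversarial reward learning: a Wasserstein-GAN critic scores logged user (state, action) pairs against pairs produced by the current policy, and its score is used as the learned reward for the policy update. This is not the authors' implementation; the toy random data, network sizes, discrete action space, WGAN weight clipping, and the plain REINFORCE-style update standing in for the actor-critic are all illustrative assumptions.

```python
# Sketch: learn a reward adversarially (WGAN critic over (state, action) pairs)
# and use it to update a policy. Illustrative only; not the paper's code.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 4

class Discriminator(nn.Module):
    """WGAN critic: higher output = more 'expert-like' (state, action) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_ACTIONS, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, s, a_onehot):
        return self.net(torch.cat([s, a_onehot], dim=-1)).squeeze(-1)

class Actor(nn.Module):
    """Policy producing a categorical distribution over actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
    def forward(self, s):
        return torch.distributions.Categorical(logits=self.net(s))

disc, actor = Discriminator(), Actor()
opt_d = torch.optim.RMSprop(disc.parameters(), lr=5e-5)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
onehot = lambda a: torch.nn.functional.one_hot(a, N_ACTIONS).float()

for step in range(200):
    # Toy "expert" batch standing in for logged user (state, action) pairs.
    s_exp = torch.randn(32, STATE_DIM)
    a_exp = torch.randint(0, N_ACTIONS, (32,))

    # Policy actions on fresh states (a real system would roll out in its env).
    s_pol = torch.randn(32, STATE_DIM)
    dist = actor(s_pol)
    a_pol = dist.sample()

    # Wasserstein critic update: push expert scores up, policy scores down.
    d_loss = disc(s_pol, onehot(a_pol)).mean() - disc(s_exp, onehot(a_exp)).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    for p in disc.parameters():          # weight clipping as in the original WGAN
        p.data.clamp_(-0.01, 0.01)

    # Policy update: treat the critic's score as the learned reward.
    with torch.no_grad():
        reward = disc(s_pol, onehot(a_pol))
    a_loss = -(dist.log_prob(a_pol) * reward).mean()
    opt_a.zero_grad(); a_loss.backward(); opt_a.step()
```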
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Self-Supervised Reinforcement Learning that Transfers using Random
Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards.
Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
arXiv Detail & Related papers (2023-05-26T20:37:06Z) - SURF: Semi-supervised Reward Learning with Data Augmentation for
Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels for the unlabeled samples based on the confidence of the preference predictor (see the sketch after this list).
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Model-free Policy Learning with Reward Gradients [9.847875182113137]
We develop the Reward Policy Gradient estimator, a novel approach that integrates reward gradients without learning a model.
Our method also boosts the performance of Proximal Policy Optimization on different MuJoCo control tasks.
arXiv Detail & Related papers (2021-03-09T00:14:13Z) - Generative Inverse Deep Reinforcement Learning for Online Recommendation [62.09946317831129]
We propose a novel inverse reinforcement learning approach, namely InvRec, for online recommendation.
InvRec automatically extracts the reward function from users' behaviors for online recommendation.
arXiv Detail & Related papers (2020-11-04T12:12:25Z) - Trajectory-wise Multiple Choice Learning for Dynamics Generalization in
Reinforcement Learning [137.39196753245105]
We present a new model-based reinforcement learning algorithm that learns a multi-headed dynamics model for dynamics generalization.
We incorporate context learning, which encodes dynamics-specific information from past experiences into the context latent vector.
Our method exhibits superior zero-shot generalization performance across a variety of control tasks, compared to state-of-the-art RL methods.
arXiv Detail & Related papers (2020-10-26T03:20:42Z)