Dual Behavior Regularized Reinforcement Learning
- URL: http://arxiv.org/abs/2109.09037v1
- Date: Sun, 19 Sep 2021 00:47:18 GMT
- Title: Dual Behavior Regularized Reinforcement Learning
- Authors: Chapman Siu, Jason Traish, Richard Yi Da Xu
- Abstract summary: Reinforcement learning has been shown to perform a range of complex tasks through interaction with an environment or by leveraging collected experience.
We propose a dual, advantage-based behavior policy based on counterfactual regret minimization.
- Score: 8.883885464358737
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reinforcement learning has been shown to perform a range of complex
tasks through interaction with an environment or by leveraging collected
experience. However, many of these approaches presume optimal or near-optimal
experiences or the presence of a consistent environment. In this work we
propose a dual, advantage-based behavior policy based on counterfactual regret
minimization. We demonstrate the flexibility of this approach and how it can be
adapted to online contexts, where the environment is available for collecting
experiences, as well as a variety of other contexts. We demonstrate that this
new algorithm can outperform several strong baseline models in different
contexts across a range of continuous environments. Additional ablations
provide insights into how our dual behavior regularized reinforcement learning
approach is designed compared with other plausible modifications and
demonstrate its ability to generalize.
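The abstract does not spell out the mechanism, but the sketch below illustrates one plausible reading of an "advantage-based behavior policy based on counterfactual regret minimization" for a discrete action space: per-action advantages are turned into a regret-matching distribution and mixed with the current policy. This is a minimal, assumption-laden sketch, not the authors' algorithm; the function names, the mixture form, and the `beta` parameter are all hypothetical.

```python
# Illustrative sketch only: a regret-matching-style behavior distribution built
# from advantage estimates, mixed with the current policy. Names and the exact
# mixture rule are assumptions, not taken from the paper.
import numpy as np


def regret_matching_policy(advantages: np.ndarray) -> np.ndarray:
    """Weight actions by their clipped positive advantage (regret matching);
    fall back to uniform when no action has positive advantage."""
    positive = np.maximum(advantages, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full_like(advantages, 1.0 / len(advantages))


def dual_behavior_policy(pi: np.ndarray, advantages: np.ndarray,
                         beta: float = 0.5) -> np.ndarray:
    """Mix the current policy pi with the regret-matching distribution to form
    a regularized behavior policy; beta controls the mixture weight."""
    rm = regret_matching_policy(advantages)
    return (1.0 - beta) * pi + beta * rm


# Example with three actions: current policy and critic advantage estimates.
pi = np.array([0.2, 0.5, 0.3])
adv = np.array([-0.1, 0.4, 0.1])
print(dual_behavior_policy(pi, adv))  # a valid probability distribution
```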
Related papers
- Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z) - Environment Design for Inverse Reinforcement Learning [3.085995273374333]
Current inverse reinforcement learning methods that focus on learning from a single environment can fail to handle slight changes in the environment dynamics.
In our framework, the learner repeatedly interacts with the expert, with the former selecting environments to identify the reward function.
This results in improvements in both sample-efficiency and robustness, as we show experimentally, for both exact and approximate inference.
arXiv Detail & Related papers (2022-10-26T18:31:17Z) - Open-Ended Diverse Solution Discovery with Regulated Behavior Patterns
for Cross-Domain Adaptation [5.090135391530077]
Policies with diverse behavior characteristics can generalize to downstream environments with various discrepancies.
However, such policies might cause catastrophic damage when deployed in practical scenarios such as real-world systems.
We propose Diversity in Regulation (DiR), which trains diverse policies with regulated behaviors to discover desired patterns.
arXiv Detail & Related papers (2022-09-24T15:13:51Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and an ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z) - One Solution is Not All You Need: Few-Shot Extrapolation via Structured
MaxEnt RL [142.36621929739707]
We show that learning diverse behaviors for accomplishing a task can lead to behavior that generalizes to varying environments.
By identifying multiple solutions for the task in a single environment during training, our approach can generalize to new situations.
arXiv Detail & Related papers (2020-10-27T17:41:57Z) - Characterizing Policy Divergence for Personalized Meta-Reinforcement
Learning [4.716565301427257]
We consider the problem of recommending optimal policies to multiple entities, each with potentially different characteristics.
Inspired by existing literature in meta-learning, we propose a model-free meta-learning algorithm that prioritizes past experiences by relevance during gradient-based adaptation.
Our algorithm characterizes past policy divergence using methods from inverse reinforcement learning, and we illustrate how such metrics can effectively distinguish past policy parameters by the environment in which they were deployed.
arXiv Detail & Related papers (2020-10-09T21:31:53Z) - Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z) - Generalization Guarantees for Imitation Learning [6.542289202349586]
Control policies from imitation learning can often fail to generalize to novel environments.
We present rigorous generalization guarantees for imitation learning by leveraging the Probably Approximately Correct (PAC)-Bayes framework.
arXiv Detail & Related papers (2020-08-05T03:04:13Z) - Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z) - Provably Efficient Model-based Policy Adaptation [22.752774605277555]
A promising approach is to quickly adapt pre-trained policies to new environments.
Existing methods for this policy adaptation problem typically rely on domain randomization and meta-learning.
We propose new model-based mechanisms that are able to make online adaptation in unseen target environments.
arXiv Detail & Related papers (2020-06-14T23:16:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.