Open-Ended Diverse Solution Discovery with Regulated Behavior Patterns for Cross-Domain Adaptation
- URL: http://arxiv.org/abs/2209.12029v2
- Date: Sat, 20 May 2023 08:23:42 GMT
- Title: Open-Ended Diverse Solution Discovery with Regulated Behavior Patterns for Cross-Domain Adaptation
- Authors: Kang Xu, Yan Ma, Bingsheng Wei, Wei Li
- Abstract summary: Policies with diverse behavior characteristics can generalize to downstream environments with various discrepancies.
Such policies might cause catastrophic damage during deployment in practical scenarios such as real-world systems.
We propose Diversity in Regulation (DiR), which trains diverse policies with regulated behaviors to discover desired patterns.
- Score: 5.090135391530077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Reinforcement Learning can achieve impressive results for complex
tasks, the learned policies are generally prone to fail in downstream tasks
with even minor model mismatch or unexpected perturbations. Recent works have
demonstrated that a policy population with diverse behavior characteristics can
generalize to downstream environments with various discrepancies. However, such
policies might cause catastrophic damage when deployed in practical scenarios
such as real-world systems, because the behaviors of the trained policies are
unrestricted. Furthermore, training diverse policies without regulating their
behavior can yield too few feasible policies to extrapolate to a wide range of
test conditions with dynamics shifts. In this work, we aim to train diverse
policies under regularization of their behavior patterns. We motivate our
paradigm by observing the inverse dynamics in environments with partial state
information, and propose Diversity in Regulation (DiR), which trains diverse
policies with regulated behaviors to discover desired patterns that benefit
generalization. Extensive empirical results on a variety of environment
variations indicate that our method improves over other diversity-driven
counterparts.
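To make the objective concrete, below is a minimal, hypothetical sketch of regulated-diversity training: a small population of toy linear policies is tuned by naive hill climbing to maximize pairwise behavioral diversity while penalizing actions that leave an allowed band. The toy policy class, the probe-state diversity measure, and the penalty weight `lam` are illustrative assumptions, not the DiR algorithm itself.

```python
"""Minimal sketch: a population of toy policies optimised for behavioral
diversity under a behavior-regulation penalty.  All names, the linear
policy class, and the hill-climbing loop are illustrative assumptions."""
import numpy as np

rng = np.random.default_rng(0)

N_POLICIES, STATE_DIM, N_PROBE_STATES = 4, 3, 32
probe_states = rng.normal(size=(N_PROBE_STATES, STATE_DIM))  # fixed probe set
ACTION_LOW, ACTION_HIGH = -0.8, 0.8                          # regulated band


def actions(theta):
    """Deterministic linear policy: a = tanh(s . theta) on the probe states."""
    return np.tanh(probe_states @ theta)


def diversity(population):
    """Mean pairwise L2 distance between the policies' probe-state behaviors."""
    behaviors = np.stack([actions(theta) for theta in population])
    dists = [np.linalg.norm(behaviors[i] - behaviors[j])
             for i in range(len(population)) for j in range(i + 1, len(population))]
    return float(np.mean(dists))


def regulation_penalty(population):
    """Average violation of the allowed action band [ACTION_LOW, ACTION_HIGH]."""
    behaviors = np.stack([actions(theta) for theta in population])
    violation = (np.maximum(behaviors - ACTION_HIGH, 0.0)
                 + np.maximum(ACTION_LOW - behaviors, 0.0))
    return float(violation.mean())


def objective(population, lam=5.0):
    """Regulated-diversity objective: diversity reward minus behavior penalty."""
    return diversity(population) - lam * regulation_penalty(population)


# Naive population hill climbing on the regulated-diversity objective.
population = [rng.normal(scale=0.1, size=STATE_DIM) for _ in range(N_POLICIES)]
for step in range(200):
    i = rng.integers(N_POLICIES)
    candidate = [theta.copy() for theta in population]
    candidate[i] = candidate[i] + rng.normal(scale=0.05, size=STATE_DIM)
    if objective(candidate) > objective(population):
        population = candidate

print(f"final regulated-diversity objective: {objective(population):.3f}")
```

In DiR-style training, the penalty term would encode the desired behavior patterns (for instance, keeping a real-world system inside safe operating regions), while the diversity term keeps the population's behaviors distinct enough to cover dynamics shifts.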
Related papers
- Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z)
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper learns diverse policies from histories of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix that induces a set of diverse policies (see the sketch after this list).
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
- Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks [0.40964539027092917]
This effort is focused on examining the behavior of reinforcement learning systems in personalization environments.
We provide a wide range of numerical experiments, as well as theoretical justification, to show that differences in policy entropy are due to the type of learning being employed.
arXiv Detail & Related papers (2022-11-21T21:42:50Z)
- CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies [62.39667564455059]
We study a distribution of optimal policies.
In experimental simulations we show that CAMEO indeed obtains policies that all solve classic control problems.
We further show that the different policies we sample present different risk profiles, corresponding to interesting practical applications in interpretability.
arXiv Detail & Related papers (2022-05-19T09:48:56Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Dual Behavior Regularized Reinforcement Learning [8.883885464358737]
Reinforcement learning has been shown to perform a range of complex tasks through interaction with an environment or by leveraging collected experience.
We propose a dual, advantage-based behavior policy based on counterfactual regret minimization.
arXiv Detail & Related papers (2021-09-19T00:47:18Z)
- One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL [142.36621929739707]
We show that learning diverse behaviors for accomplishing a task can lead to behavior that generalizes to varying environments.
By identifying multiple solutions for the task in a single environment during training, our approach can generalize to new situations.
arXiv Detail & Related papers (2020-10-27T17:41:57Z)
- Provably Efficient Model-based Policy Adaptation [22.752774605277555]
A promising approach is to quickly adapt pre-trained policies to new environments.
Existing methods for this policy adaptation problem typically rely on domain randomization and meta-learning.
We propose new model-based mechanisms that are able to perform online adaptation in unseen target environments.
arXiv Detail & Related papers (2020-06-14T23:16:20Z)
- Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through the interactions over agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
arXiv Detail & Related papers (2020-04-19T15:42:55Z)
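The "Policy Dispersion in Non-Markovian Environment" entry above mentions stacking policy embeddings into a dispersion matrix to induce diverse policies. Below is a minimal, hypothetical sketch of that idea: one embedding per policy is stacked into a matrix and scored with a log-determinant (DPP-style) dispersion measure. The random stand-in embeddings and the RBF-kernel score are illustrative assumptions, not the paper's exact construction (which learns embeddings with a transformer over state-action histories).

```python
"""Minimal sketch: stack one embedding per policy into a matrix and score
how dispersed the policies are.  The random stand-in embeddings and the
log-determinant (DPP-style) score are illustrative assumptions."""
import numpy as np

rng = np.random.default_rng(1)
N_POLICIES, EMBED_DIM = 5, 8

# In the paper the embeddings come from a transformer over state-action
# histories; here we simply draw stand-in vectors.
policy_embeddings = rng.normal(size=(N_POLICIES, EMBED_DIM))


def dispersion_score(embeddings, bandwidth=1.0):
    """Log-determinant of an RBF Gram matrix over the stacked embeddings.

    Larger values mean the embeddings (and hence the policies) are more
    spread out, so the score can serve as a diversity bonus.
    """
    sq_dists = np.sum((embeddings[:, None, :] - embeddings[None, :, :]) ** 2, axis=-1)
    gram = np.exp(-sq_dists / (2.0 * bandwidth ** 2))          # dispersion matrix
    sign, logdet = np.linalg.slogdet(gram + 1e-6 * np.eye(len(embeddings)))
    return float(logdet)


# Identical policies collapse the score; distinct policies raise it.
identical = np.repeat(policy_embeddings[:1], N_POLICIES, axis=0)
print("identical embeddings:", round(dispersion_score(identical), 3))
print("distinct embeddings :", round(dispersion_score(policy_embeddings), 3))
```

A score of this kind can be added to a training objective as a diversity bonus, so that policies whose embeddings collapse onto one another are discouraged.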