Composable Energy Policies for Reactive Motion Generation and
Reinforcement Learning
- URL: http://arxiv.org/abs/2105.04962v1
- Date: Tue, 11 May 2021 11:59:13 GMT
- Title: Composable Energy Policies for Reactive Motion Generation and
Reinforcement Learning
- Authors: Julen Urain, Anqi Li, Puze Liu, Carlo D'Eramo, Jan Peters
- Abstract summary: We introduce Composable Energy Policies (CEP), a novel framework for modular reactive motion generation.
CEP computes the control action by optimizing over the product of a set of stochastic policies.
CEP naturally adapts to the Reinforcement Learning problem, allowing us to integrate, in a hierarchical fashion, any distribution as a prior.
- Score: 25.498555742173323
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reactive motion generation problems are usually solved by computing the
action as a sum of policies. However, these policies are independent of each
other and can therefore produce conflicting behaviors when their contributions
are summed. We introduce Composable Energy Policies (CEP), a novel framework
for modular reactive motion generation. CEP computes the control action by
optimizing over the product of a set of stochastic policies. This product of
policies assigns high probability to actions that satisfy all the components
and low probability to the rest. Optimizing over the product of policies avoids
the detrimental effect of conflicting behaviors by choosing an action that
satisfies all the objectives. In addition, we show that CEP naturally adapts to
the Reinforcement Learning problem, allowing us to integrate, in a hierarchical
fashion, any distribution as a prior, from multimodal to non-smooth
distributions, and to learn a new policy given them.
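The product-of-policies construction can be made concrete with a toy example. The sketch below (Python, not the authors' implementation) expresses two hand-crafted stochastic components, a goal attractor and an obstacle avoider, as energies, composes them multiplicatively by summing the energies, and picks the control action with a simple sampling-based optimizer. The component definitions, weights, and the optimizer are illustrative assumptions.

```python
# Toy sketch of the product-of-policies idea behind CEP (not the authors' code).
# Each component defines an energy E_i(a | x); the composed policy is proportional
# to exp(-sum_i beta_i * E_i(a | x)), and the control action is obtained by
# optimizing over that product. Components, weights, and the sampling-based
# optimizer are illustrative assumptions.

import numpy as np

def goal_energy(a, x, goal, k=1.0):
    # Low energy for accelerations that drive the state toward the goal (PD-like attractor).
    desired = k * (goal - x[:2]) - x[2:]
    return np.sum((a - desired) ** 2, axis=-1)

def obstacle_energy(a, x, obstacle, radius=0.5, weight=5.0):
    # High energy for accelerations whose short-horizon predicted position enters the obstacle.
    next_pos = x[:2] + 0.1 * x[2:] + 0.005 * a
    dist = np.linalg.norm(next_pos - obstacle, axis=-1)
    return weight * np.maximum(0.0, radius - dist) ** 2

def composed_log_prob(a, x, goal, obstacle, betas=(1.0, 1.0)):
    # log pi(a | x) = -(beta_1 * E_goal + beta_2 * E_obstacle) + const: a product of experts.
    return -(betas[0] * goal_energy(a, x, goal) + betas[1] * obstacle_energy(a, x, obstacle))

def cep_action(x, goal, obstacle, n_samples=1024, scale=2.0):
    # Sampling-based optimization over the composed distribution: draw candidate
    # accelerations and keep the most probable one under the product of policies.
    candidates = np.random.randn(n_samples, 2) * scale
    scores = composed_log_prob(candidates, x, goal, obstacle)
    return candidates[np.argmax(scores)]

x = np.array([0.0, 0.0, 0.1, 0.0])  # state: 2D position + 2D velocity
action = cep_action(x, goal=np.array([2.0, 2.0]), obstacle=np.array([1.0, 1.0]))
print("selected acceleration:", action)
```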
Related papers
- Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation [1.079960007119637]
OPS-DeMo is an online algorithm that employs dynamic error decay to detect changes in opponents' policies.
Our approach outperforms PPO-trained models in dynamic scenarios like the Predator-Prey setting.
arXiv Detail & Related papers (2024-06-10T17:34:44Z) - Constraint-Generation Policy Optimization (CGPO): Nonlinear Programming
for Policy Optimization in Mixed Discrete-Continuous MDPs [23.87856533426793]
CGPO provides bounded policy error guarantees over an infinite range of initial states for many DC-MDPs with expressive nonlinear dynamics.
CGPO can generate worst-case state trajectories to diagnose policy deficiencies and provide counterfactual explanations of optimal actions.
We experimentally demonstrate the applicability of CGPO in diverse domains, including inventory control and the management of a system of water reservoirs.
arXiv Detail & Related papers (2024-01-20T07:12:57Z) - Personalized Reinforcement Learning with a Budget of Policies [9.846353643883443]
Personalization in machine learning (ML) tailors models' decisions to the individual characteristics of users.
We propose a novel framework termed represented Markov Decision Processes (r-MDPs) that is designed to balance the need for personalization with regulatory constraints.
In an r-MDP, we cater to a diverse user population, each with unique preferences, through interaction with a small set of representative policies.
We develop two deep reinforcement learning algorithms that efficiently solve r-MDPs.
arXiv Detail & Related papers (2024-01-12T11:27:55Z) - Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper aims to learn diverse policies from the history of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
arXiv Detail & Related papers (2023-02-28T11:58:39Z) - Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees [8.610425739792284]
We revisit the domain of off-policy policy optimization in RL.
One commonly-used approach is to leverage the off-policy policy gradient to optimize a surrogate objective.
This approach has been shown to suffer from the distribution mismatch issue.
arXiv Detail & Related papers (2022-12-10T07:47:04Z) - CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies [62.39667564455059]
We consider and study a distribution of optimal policies.
In experimental simulations we show that CAMEO indeed obtains policies that all solve classic control problems.
We further show that the different policies we sample present different risk profiles, corresponding to interesting practical applications in interpretability.
arXiv Detail & Related papers (2022-05-19T09:48:56Z) - Constructing a Good Behavior Basis for Transfer using Generalized Policy
Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z) - DisCo RL: Distribution-Conditioned Reinforcement Learning for
General-Purpose Policies [116.12670064963625]
We develop an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn contextual policies.
We evaluate DisCo RL on a variety of robot manipulation tasks and find that it significantly outperforms prior methods on tasks that require generalization to new goal distributions.
arXiv Detail & Related papers (2021-04-23T16:51:58Z) - Implicit Distributional Reinforcement Learning [61.166030238490634]
IDAC is an implicit distributional actor-critic built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z) - Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through the interactions over agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable (a toy illustration of this construction is sketched after this list).
arXiv Detail & Related papers (2020-04-19T15:42:55Z)
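As referenced in the VPP entry above, the following is a minimal sketch of treating a joint multi-agent policy as a pairwise Markov Random Field and unrolling a few mean-field variational-inference steps as differentiable operations. The potentials, shapes, and update rule are assumptions for illustration, not the paper's algorithm.

```python
# Toy sketch: a joint multi-agent policy factorized as a pairwise Markov Random
# Field, with mean-field variational inference unrolled as differentiable layers
# (inspired by the VPP entry above; all shapes, potentials, and the update rule
# are illustrative assumptions).

import torch
import torch.nn.functional as F

n_agents, n_actions, obs_dim = 3, 4, 8

# Unary potentials theta_i(a_i | o_i) and pairwise potentials psi_ij(a_i, a_j).
unary_net = torch.nn.Linear(obs_dim, n_actions)
pairwise = torch.nn.Parameter(0.1 * torch.randn(n_agents, n_agents, n_actions, n_actions))
off_diag = 1.0 - torch.eye(n_agents).view(n_agents, n_agents, 1, 1)  # no self-messages

def mean_field_policy(obs, n_iters=5):
    """obs: (n_agents, obs_dim) -> per-agent action probabilities (n_agents, n_actions)."""
    unary = unary_net(obs)                                   # (n_agents, n_actions)
    q = F.softmax(unary, dim=-1)                             # initial factorized guess
    for _ in range(n_iters):
        # Expected pairwise potential under the neighbors' current marginals.
        msg = torch.einsum('jb,ijab->ia', q, pairwise * off_diag)
        q = F.softmax(unary + msg, dim=-1)                   # differentiable update
    return q

obs = torch.randn(n_agents, obs_dim)
q = mean_field_policy(obs)
actions = torch.distributions.Categorical(probs=q).sample()  # one action per agent
print(q.shape, actions)
```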
This list is automatically generated from the titles and abstracts of the papers on this site.