Hierarchical Policy Blending as Inference for Reactive Robot Control
- URL: http://arxiv.org/abs/2210.07890v3
- Date: Mon, 29 Jul 2024 07:13:42 GMT
- Title: Hierarchical Policy Blending as Inference for Reactive Robot Control
- Authors: Kay Hansel, Julen Urain, Jan Peters, Georgia Chalvatzaki
- Abstract summary: Motion generation in cluttered, dense, and dynamic environments is a central topic in robotics.
We propose a hierarchical motion generation method that combines the benefits of reactive policies and planning.
Our experimental study in planar navigation and 6DoF manipulation shows that our proposed hierarchical motion generation method outperforms both myopic reactive controllers and online re-planning methods.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Motion generation in cluttered, dense, and dynamic environments is a central topic in robotics, rendered as a multi-objective decision-making problem. Current approaches trade off between safety and performance. On the one hand, reactive policies guarantee a fast response to environmental changes at the risk of suboptimal behavior. On the other hand, planning-based motion generation provides feasible trajectories, but the high computational cost may limit the control frequency and thus safety. To combine the benefits of reactive policies and planning, we propose a hierarchical motion generation method. Moreover, we adopt probabilistic inference methods to formalize the hierarchical model and stochastic optimization. We realize this approach as a weighted product of stochastic, reactive expert policies, where planning is used to adaptively compute the optimal weights over the task horizon. This stochastic optimization avoids local optima and proposes feasible reactive plans that find paths in cluttered and dense environments. Our extensive experimental study in planar navigation and 6DoF manipulation shows that our proposed hierarchical motion generation method outperforms both myopic reactive controllers and online re-planning methods.
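To make the blending scheme concrete, here is a minimal sketch of the core construction as we read it: each reactive expert is a Gaussian policy, the blended policy is their weighted product (which is again Gaussian), and an upper planning layer searches over the weights by sampling. The function names, the Dirichlet proposal, and the best-of-N search are our own illustration, not the authors' implementation.

```python
import numpy as np

def blend_gaussian_experts(means, covs, weights):
    """Weighted product of Gaussian expert policies.

    A weighted product of Gaussians is Gaussian again:
    precision = sum_i w_i * Lambda_i,
    mean = cov @ sum_i w_i * Lambda_i @ mu_i.
    """
    precisions = [np.linalg.inv(C) for C in covs]
    prec = sum(w * P for w, P in zip(weights, precisions))
    cov = np.linalg.inv(prec)
    mean = cov @ sum(w * P @ m for w, P, m in zip(weights, precisions, means))
    return mean, cov

def plan_weights(n_experts, rollout_cost, n_samples=64, horizon=20, rng=None):
    """Upper layer: sample candidate weight sequences over the task horizon
    and keep the cheapest rollout (a simple stand-in for the paper's
    stochastic optimization; `rollout_cost` simulates the blended policy)."""
    rng = rng or np.random.default_rng()
    best_w, best_c = None, np.inf
    for _ in range(n_samples):
        w = rng.dirichlet(np.ones(n_experts), size=horizon)  # (horizon, n_experts)
        c = rollout_cost(w)
        if c < best_c:
            best_w, best_c = w, c
    return best_w
```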
Related papers
- Extremum-Seeking Action Selection for Accelerating Policy Optimization [18.162794442835413]
Reinforcement learning for control over continuous spaces typically uses high-entropy policies, such as Gaussian distributions, for local exploration and for estimating policy gradients to optimize performance.
We propose to improve action selection in this model-free RL setting by introducing additional adaptive control steps based on Extremum-Seeking Control (ESC).
Our methods can be easily added to standard policy optimization to improve learning efficiency, which we demonstrate in various control learning environments.
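As a rough illustration of the extremum-seeking idea (our sketch under simplifying assumptions, not the paper's algorithm): perturb an action with a sinusoidal dither, demodulate the observed return against the same dither to estimate a local gradient, and take an ascent step. Real ESC loops add high- and low-pass filtering, which is omitted here.

```python
import numpy as np

def esc_step(a, reward_fn, t, dt=0.01, amp=0.1, freq=5.0, gain=0.5):
    """One extremum-seeking update on a scalar action (illustrative).

    The dither probes the reward landscape around `a`; multiplying the
    measured reward by the dither signal approximates the local gradient.
    """
    dither = amp * np.sin(freq * t)
    r = reward_fn(a + dither)            # probe the perturbed action
    grad_est = r * np.sin(freq * t)      # demodulation (no filtering here)
    return a + gain * grad_est * dt      # gradient-ascent step
```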
arXiv Detail & Related papers (2024-04-02T02:39:17Z)
- A Unifying Variational Framework for Gaussian Process Motion Planning [44.332875416815384]
We introduce a framework for robot motion planning based on variational Gaussian processes.
Our framework provides a principled and flexible way to incorporate equality-based, inequality-based, and soft motion-planning constraints.
Results show that our proposed approach yields a good balance between success rates and path quality.
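For intuition, a bare-bones stand-in for the GP view of motion planning (our sketch; the paper's variational treatment and constraint handling go well beyond this): place a GP prior over trajectories and condition it on waypoint constraints such as the start and goal.

```python
import numpy as np

def gp_trajectory_posterior(ts, obs_t, obs_x, length_scale=0.5, jitter=1e-6):
    """Condition a squared-exponential GP prior on waypoints (e.g. start/goal),
    giving a mean trajectory and its uncertainty over query times `ts`."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)
    K_oo = k(obs_t, obs_t) + jitter * np.eye(len(obs_t))
    K_qo = k(ts, obs_t)
    mean = K_qo @ np.linalg.solve(K_oo, obs_x)
    cov = k(ts, ts) - K_qo @ np.linalg.solve(K_oo, K_qo.T)
    return mean, cov

# Pin a 1-D trajectory at t=0 (x=0) and t=1 (x=1); query 50 points between.
mean, cov = gp_trajectory_posterior(np.linspace(0, 1, 50),
                                    np.array([0.0, 1.0]),
                                    np.array([0.0, 1.0]))
```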
arXiv Detail & Related papers (2023-09-02T07:51:29Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Diverse Policy Optimization for Structured Action Space [59.361076277997704]
We propose Diverse Policy Optimization (DPO) to model policies in structured action spaces as energy-based models (EBMs).
A novel and powerful generative model, GFlowNet, is introduced as an efficient, diverse EBM-based policy sampler.
Experiments on ATSC and Battle benchmarks demonstrate that DPO can efficiently discover surprisingly diverse policies.
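A toy sketch of an energy-based policy over an enumerable structured action set (our illustration; GFlowNet amortizes this sampling and does not require enumeration): score each action with an energy and sample from the induced Boltzmann distribution.

```python
import numpy as np

def ebm_policy_sample(actions, energy_fn, temperature=1.0, rng=None):
    """Sample an action from p(a) proportional to exp(-E(a) / T)."""
    rng = rng or np.random.default_rng()
    energies = np.array([energy_fn(a) for a in actions])
    logits = -energies / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return actions[rng.choice(len(actions), p=probs)]
```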
arXiv Detail & Related papers (2023-02-23T10:48:09Z)
- Non-Parametric Stochastic Policy Gradient with Strategic Retreat for Non-Stationary Environment [1.5229257192293197]
We propose a systematic methodology to learn a sequence of optimal control policies non-parametrically.
Our methodology outperforms the well-established DDPG and TD3 methods by a sizeable margin in terms of learning performance.
arXiv Detail & Related papers (2022-03-24T21:41:13Z)
- RAPTOR: End-to-end Risk-Aware MDP Planning and Policy Learning by Backpropagation [12.600828753197204]
We introduce Risk-Aware Planning using PyTorch (RAPTOR), a novel framework for risk-sensitive planning through end-to-end optimization of the entropic utility objective.
We evaluate and compare the two forms of RAPTOR (planning and policy learning) on three highly stochastic domains, including nonlinear navigation, HVAC control, and linear reservoir control.
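The entropic utility named above has a standard closed form, U_beta(R) = -(1/beta) log E[exp(-beta R)], which is easy to estimate from Monte Carlo returns; a small sketch (the log-sum-exp stabilization and the default beta are ours):

```python
import numpy as np

def entropic_utility(returns, beta=1.0):
    """Estimate U_beta(R) = -(1/beta) * log E[exp(-beta * R)] from samples.

    beta > 0 is risk-averse (penalizes variance); beta -> 0 recovers E[R].
    Computed via log-sum-exp for numerical stability.
    """
    x = -beta * np.asarray(returns, dtype=float)
    m = x.max()
    log_mean_exp = m + np.log(np.mean(np.exp(x - m)))
    return -log_mean_exp / beta
```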
arXiv Detail & Related papers (2021-06-14T09:27:19Z)
- Risk-Sensitive Sequential Action Control with Multi-Modal Human Trajectory Forecasting for Safe Crowd-Robot Interaction [55.569050872780224]
We present an online framework for safe crowd-robot interaction based on risk-sensitive optimal control, wherein the risk is modeled by the entropic risk measure.
Our modular approach decouples the crowd-robot interaction into learning-based prediction and model-based control.
A simulation study and a real-world experiment show that the proposed framework can accomplish safe and efficient navigation while avoiding collisions with more than 50 humans in the scene.
arXiv Detail & Related papers (2020-09-12T02:02:52Z)
- Reinforcement Learning for Low-Thrust Trajectory Design of Interplanetary Missions [77.34726150561087]
This paper investigates the use of reinforcement learning for the robust design of interplanetary trajectories in the presence of severe disturbances.
An open-source implementation of the state-of-the-art Proximal Policy Optimization algorithm is adopted.
The resulting Guidance and Control Network provides both a robust nominal trajectory and the associated closed-loop guidance law.
arXiv Detail & Related papers (2020-08-19T15:22:15Z)
- Jump Operator Planning: Goal-Conditioned Policy Ensembles and Zero-Shot Transfer [71.44215606325005]
We propose a novel framework called Jump-Operator Dynamic Programming for quickly computing solutions within a super-exponential space of sequential sub-goal tasks.
This approach involves controlling an ensemble of reusable goal-conditioned policies functioning as temporally extended actions.
We then identify classes of objective functions on this subspace whose solutions are invariant to the grounding, resulting in optimal zero-shot transfer.
arXiv Detail & Related papers (2020-07-06T05:13:20Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward-function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
- Path Planning in Dynamic Environments using Generative RNNs and Monte Carlo Tree Search [11.412720572948086]
State-of-the-art methods for robotic path planning in dynamic environments, such as crowds or traffic, rely on hand-crafted motion models for agents.
This paper proposes an integrated path planning framework using generative Recurrent Neural Networks within a Monte Carlo Tree Search (MCTS).
We show that the proposed framework can considerably improve motion prediction accuracy during interactions.
arXiv Detail & Related papers (2020-01-30T22:46:37Z)