Hierarchical Policy Blending as Inference for Reactive Robot Control
- URL: http://arxiv.org/abs/2210.07890v3
- Date: Mon, 29 Jul 2024 07:13:42 GMT
- Title: Hierarchical Policy Blending as Inference for Reactive Robot Control
- Authors: Kay Hansel, Julen Urain, Jan Peters, Georgia Chalvatzaki
- Abstract summary: Motion generation in cluttered, dense, and dynamic environments is a central topic in robotics.
We propose a hierarchical motion generation method that combines the benefits of reactive policies and planning.
Our experimental study in planar navigation and 6DoF manipulation shows that our proposed hierarchical motion generation method outperforms both myopic reactive controllers and online re-planning methods.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Motion generation in cluttered, dense, and dynamic environments is a central topic in robotics, rendered as a multi-objective decision-making problem. Current approaches trade off between safety and performance. On the one hand, reactive policies guarantee a fast response to environmental changes at the risk of suboptimal behavior. On the other hand, planning-based motion generation provides feasible trajectories, but the high computational cost may limit the control frequency and thus safety. To combine the benefits of reactive policies and planning, we propose a hierarchical motion generation method. Moreover, we adopt probabilistic inference methods to formalize the hierarchical model and stochastic optimization. We realize this approach as a weighted product of stochastic, reactive expert policies, where planning is used to adaptively compute the optimal weights over the task horizon. This stochastic optimization avoids local optima and proposes feasible reactive plans that find paths in cluttered and dense environments. Our extensive experimental study in planar navigation and 6DoF manipulation shows that our proposed hierarchical motion generation method outperforms both myopic reactive controllers and online re-planning methods.
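To make the blending scheme concrete, here is a minimal sketch of the core construction as we read it: each reactive expert is a Gaussian policy, the blended policy is their weighted product (which is again Gaussian), and an upper planning layer searches over the weights by sampling. The function names, the Dirichlet proposal, and the best-of-N search are our own illustration, not the authors' implementation.

```python
import numpy as np

def blend_gaussian_experts(means, covs, weights):
    """Weighted product of Gaussian expert policies.

    A weighted product of Gaussians is Gaussian again:
    precision = sum_i w_i * Lambda_i,
    mean = cov @ sum_i w_i * Lambda_i @ mu_i.
    """
    precisions = [np.linalg.inv(C) for C in covs]
    prec = sum(w * P for w, P in zip(weights, precisions))
    cov = np.linalg.inv(prec)
    mean = cov @ sum(w * P @ m for w, P, m in zip(weights, precisions, means))
    return mean, cov

def plan_weights(n_experts, rollout_cost, n_samples=64, horizon=20, rng=None):
    """Upper layer: sample candidate weight sequences over the task horizon
    and keep the cheapest rollout (a simple stand-in for the paper's
    stochastic optimization; `rollout_cost` simulates the blended policy)."""
    rng = rng or np.random.default_rng()
    best_w, best_c = None, np.inf
    for _ in range(n_samples):
        w = rng.dirichlet(np.ones(n_experts), size=horizon)  # (horizon, n_experts)
        c = rollout_cost(w)
        if c < best_c:
            best_w, best_c = w, c
    return best_w
```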
Related papers
- Extremum-Seeking Action Selection for Accelerating Policy Optimization [18.162794442835413]
Reinforcement learning for control over continuous spaces typically uses high-entropy policies, such as Gaussian distributions, for local exploration and for estimating policy gradients to optimize performance.
We propose to improve action selection in this model-free RL setting by introducing additional adaptive control steps based on Extremum-Seeking Control (ESC).
Our methods can be easily added to standard policy optimization to improve learning efficiency, which we demonstrate in various control learning environments.
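As a rough illustration of the extremum-seeking idea (our sketch under simplifying assumptions, not the paper's algorithm): perturb an action with a sinusoidal dither, demodulate the observed return against the same dither to estimate a local gradient, and take an ascent step. Real ESC loops add high- and low-pass filtering, which is omitted here.

```python
import numpy as np

def esc_step(a, reward_fn, t, dt=0.01, amp=0.1, freq=5.0, gain=0.5):
    """One extremum-seeking update on a scalar action (illustrative).

    The dither probes the reward landscape around `a`; multiplying the
    measured reward by the dither signal approximates the local gradient.
    """
    dither = amp * np.sin(freq * t)
    r = reward_fn(a + dither)            # probe the perturbed action
    grad_est = r * np.sin(freq * t)      # demodulation (no filtering here)
    return a + gain * grad_est * dt      # gradient-ascent step
```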
arXiv Detail & Related papers (2024-04-02T02:39:17Z)
- A Unifying Variational Framework for Gaussian Process Motion Planning [44.332875416815384]
We introduce a framework for robot motion planning based on variational Gaussian processes.
Our framework provides a principled and flexible way to incorporate equality-based, inequality-based, and soft motion-planning constraints.
Results show that our proposed approach yields a good balance between success rates and path quality.
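For intuition, a bare-bones stand-in for the GP view of motion planning (our sketch; the paper's variational treatment and constraint handling go well beyond this): place a GP prior over trajectories and condition it on waypoint constraints such as the start and goal.

```python
import numpy as np

def gp_trajectory_posterior(ts, obs_t, obs_x, length_scale=0.5, jitter=1e-6):
    """Condition a squared-exponential GP prior on waypoints (e.g. start/goal),
    giving a mean trajectory and its uncertainty over query times `ts`."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)
    K_oo = k(obs_t, obs_t) + jitter * np.eye(len(obs_t))
    K_qo = k(ts, obs_t)
    mean = K_qo @ np.linalg.solve(K_oo, obs_x)
    cov = k(ts, ts) - K_qo @ np.linalg.solve(K_oo, K_qo.T)
    return mean, cov

# Pin a 1-D trajectory at t=0 (x=0) and t=1 (x=1); query 50 points between.
mean, cov = gp_trajectory_posterior(np.linspace(0, 1, 50),
                                    np.array([0.0, 1.0]),
                                    np.array([0.0, 1.0]))
```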
arXiv Detail & Related papers (2023-09-02T07:51:29Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Diverse Policy Optimization for Structured Action Space [59.361076277997704]
We propose Diverse Policy Optimization (DPO) to model policies in structured action spaces as energy-based models (EBMs).
A novel and powerful generative model, GFlowNet, is introduced as an efficient, diverse EBM-based policy sampler.
Experiments on ATSC and Battle benchmarks demonstrate that DPO can efficiently discover surprisingly diverse policies.
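A toy sketch of an energy-based policy over an enumerable structured action set (our illustration; GFlowNet amortizes this sampling and does not require enumeration): score each action with an energy and sample from the induced Boltzmann distribution.

```python
import numpy as np

def ebm_policy_sample(actions, energy_fn, temperature=1.0, rng=None):
    """Sample an action from p(a) proportional to exp(-E(a) / T)."""
    rng = rng or np.random.default_rng()
    energies = np.array([energy_fn(a) for a in actions])
    logits = -energies / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return actions[rng.choice(len(actions), p=probs)]
```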
arXiv Detail & Related papers (2023-02-23T10:48:09Z)
- Non-Parametric Stochastic Policy Gradient with Strategic Retreat for Non-Stationary Environment [1.5229257192293197]
We propose a systematic methodology to learn a sequence of optimal control policies non-parametrically.
Our methodology outperforms the well-established DDPG and TD3 methods by a sizeable margin in terms of learning performance.
arXiv Detail & Related papers (2022-03-24T21:41:13Z)
- RAPTOR: End-to-end Risk-Aware MDP Planning and Policy Learning by Backpropagation [12.600828753197204]
We introduce Risk-Aware Planning using PyTorch (RAPTOR), a novel framework for risk-sensitive planning through end-to-end optimization of the entropic utility objective.
We evaluate and compare the two forms of RAPTOR (planning and policy learning) on three highly stochastic domains, including nonlinear navigation, HVAC control, and linear reservoir control.
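The entropic utility named above has a standard closed form, U_beta(R) = -(1/beta) log E[exp(-beta R)], which is easy to estimate from Monte Carlo returns; a small sketch (the log-sum-exp stabilization and the default beta are ours):

```python
import numpy as np

def entropic_utility(returns, beta=1.0):
    """Estimate U_beta(R) = -(1/beta) * log E[exp(-beta * R)] from samples.

    beta > 0 is risk-averse (penalizes variance); beta -> 0 recovers E[R].
    Computed via log-sum-exp for numerical stability.
    """
    x = -beta * np.asarray(returns, dtype=float)
    m = x.max()
    log_mean_exp = m + np.log(np.mean(np.exp(x - m)))
    return -log_mean_exp / beta
```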
arXiv Detail & Related papers (2021-06-14T09:27:19Z)
- Risk-Sensitive Sequential Action Control with Multi-Modal Human Trajectory Forecasting for Safe Crowd-Robot Interaction [55.569050872780224]
We present an online framework for safe crowd-robot interaction based on risk-sensitive optimal control, wherein the risk is modeled by the entropic risk measure.
Our modular approach decouples the crowd-robot interaction into learning-based prediction and model-based control.
A simulation study and a real-world experiment show that the proposed framework can accomplish safe and efficient navigation while avoiding collisions with more than 50 humans in the scene.
arXiv Detail & Related papers (2020-09-12T02:02:52Z)
- Reinforcement Learning for Low-Thrust Trajectory Design of Interplanetary Missions [77.34726150561087]
This paper investigates the use of reinforcement learning for the robust design of interplanetary trajectories in the presence of severe disturbances.
An open-source implementation of the state-of-the-art Proximal Policy Optimization algorithm is adopted.
The resulting Guidance and Control Network provides both a robust nominal trajectory and the associated closed-loop guidance law.
arXiv Detail & Related papers (2020-08-19T15:22:15Z)
- Jump Operator Planning: Goal-Conditioned Policy Ensembles and Zero-Shot Transfer [71.44215606325005]
We propose a novel framework called Jump-Operator Dynamic Programming for quickly computing solutions within a super-exponential space of sequential sub-goal tasks.
This approach involves controlling an ensemble of reusable goal-conditioned policies functioning as temporally extended actions.
We then identify classes of objective functions on this subspace whose solutions are invariant to the grounding, resulting in optimal zero-shot transfer.
arXiv Detail & Related papers (2020-07-06T05:13:20Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward-function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
- Path Planning in Dynamic Environments using Generative RNNs and Monte Carlo Tree Search [11.412720572948086]
State-of-the-art methods for robotic path planning in dynamic environments, such as crowds or traffic, rely on hand-crafted motion models for agents.
This paper proposes an integrated path planning framework using generative Recurrent Neural Networks within a Monte Carlo Tree Search (MCTS).
We show that the proposed framework can considerably improve motion prediction accuracy during interactions.
arXiv Detail & Related papers (2020-01-30T22:46:37Z)