Preference-Based Learning for User-Guided HZD Gait Generation on Bipedal
Walking Robots
- URL: http://arxiv.org/abs/2011.05424v2
- Date: Mon, 29 Mar 2021 18:31:09 GMT
- Title: Preference-Based Learning for User-Guided HZD Gait Generation on Bipedal
Walking Robots
- Authors: Maegan Tucker, Noel Csomay-Shanklin, Wen-Loong Ma, and Aaron D. Ames
- Abstract summary: This paper presents a framework that leverages both control theory and machine learning to obtain stable and robust bipedal locomotion.
Results show that the framework achieves stable, robust, efficient, and natural walking in fewer than 50 iterations with no reliance on a simulation environment.
- Score: 31.994815173888806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a framework that leverages both control theory and
machine learning to obtain stable and robust bipedal locomotion without the
need for manual parameter tuning. Traditionally, gaits are generated through
trajectory optimization methods and then realized experimentally -- a process
that often requires extensive tuning due to differences between the models and
hardware. In this work, the process of gait realization via hybrid zero
dynamics (HZD) based optimization is formally combined with preference-based
learning to systematically realize dynamically stable walking. Importantly,
this learning approach does not require a carefully constructed reward
function, but instead utilizes human pairwise preferences. The power of the
proposed approach is demonstrated through two experiments on a planar biped
AMBER-3M: the first with rigid point-feet, and the second with induced model
uncertainty through the addition of springs where the added compliance was not
accounted for in the gait generation or in the controller. In both experiments,
the framework achieves stable, robust, efficient, and natural walking in fewer
than 50 iterations with no reliance on a simulation environment. These results
demonstrate a promising step in the unification of control theory and learning.
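As a rough illustration of how pairwise preferences can stand in for a hand-built reward function, the sketch below fits latent utilities to a handful of "gait A was preferred over gait B" judgments. It assumes a Bradley-Terry style logistic preference model, which is a common simplification swapped in for illustration and not necessarily the model used in the paper. The function name `fit_utilities`, the four candidate gaits, and the preference data are all hypothetical.

```python
import math


def fit_utilities(num_actions, preferences, steps=2000, lr=0.1, l2=0.01):
    """Fit latent utilities u so that P(a preferred over b) = sigmoid(u[a] - u[b]).

    Uses plain gradient ascent on the L2-regularized log-likelihood.
    This is an illustrative stand-in, not the paper's learning algorithm.
    """
    u = [0.0] * num_actions
    for _ in range(steps):
        # Gradient of the regularizer pulls utilities toward zero.
        grad = [-l2 * ui for ui in u]
        for winner, loser in preferences:
            p = 1.0 / (1.0 + math.exp(-(u[winner] - u[loser])))
            # Gradient of log(p) with respect to each utility.
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        u = [ui + lr * gi for ui, gi in zip(u, grad)]
    return u


# Hypothetical data: four candidate gaits (indices 0..3).
# Each pair is (preferred gait, rejected gait) from one human comparison.
preferences = [(1, 0), (2, 1), (2, 3), (1, 3), (2, 0)]

utilities = fit_utilities(4, preferences)
best = max(range(4), key=lambda i: utilities[i])
print("estimated utilities:", [round(x, 3) for x in utilities])
print("most preferred gait index:", best)
```

In a loop like the one the abstract describes, the highest-utility candidate would inform which gaits to try next on hardware, with each new human comparison appended to `preferences`.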
Related papers
- Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while requiring no exemplar buffer and only 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- DTC: Deep Tracking Control [16.2850135844455]
We propose a hybrid control architecture that combines the advantages of both worlds to achieve greater robustness, foot-placement accuracy, and terrain generalization.
A deep neural network policy is trained in simulation, aiming to track the optimized footholds.
We demonstrate superior robustness in the presence of slippery or deformable ground when compared to model-based counterparts.
arXiv Detail & Related papers (2023-09-27T07:57:37Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Combining model-predictive control and predictive reinforcement learning
for stable quadrupedal robot locomotion [0.0]
We study how this can be achieved by a combination of model-predictive and predictive reinforcement learning controllers.
In this work, we combine both control methods to address the problem of stable gait generation for quadrupedal robots.
arXiv Detail & Related papers (2023-07-15T09:22:37Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Active Learning of Discrete-Time Dynamics for Uncertainty-Aware Model Predictive Control [46.81433026280051]
We present a self-supervised learning approach that actively models the dynamics of nonlinear robotic systems.
Our approach showcases high resilience and generalization capabilities by consistently adapting to unseen flight conditions.
arXiv Detail & Related papers (2022-10-23T00:45:05Z)
- Adaptive Model Predictive Control by Learning Classifiers [26.052368583196426]
We propose an adaptive MPC variant that automatically estimates control and model parameters.
We leverage recent results showing that BO can be formulated as a density ratio estimation.
This is then integrated into a model predictive path integral control framework yielding robust controllers for a variety of challenging robotics tasks.
arXiv Detail & Related papers (2022-03-13T23:22:12Z)
- Bayesian Optimization Meets Hybrid Zero Dynamics: Safe Parameter
Learning for Bipedal Locomotion Control [17.37169551675587]
We propose a multi-domain control parameter learning framework for locomotion control of bipedal robots.
We leverage BO to learn the control parameters used in the HZD-based controller.
Next, the learning process is applied on the physical robot to learn corrections to the control parameters learned in simulation.
arXiv Detail & Related papers (2022-03-04T20:48:17Z)
- Training Generative Adversarial Networks by Solving Ordinary
Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
- First Steps: Latent-Space Control with Semantic Constraints for
Quadruped Locomotion [73.37945453998134]
Traditional approaches to quadruped control employ simplified, hand-derived models.
This significantly reduces the capability of the robot since its effective kinematic range is curtailed.
In this work, these challenges are addressed by framing quadruped control as optimisation in a structured latent space.
A deep generative model captures a statistical representation of feasible joint configurations, whilst complex dynamic and terminal constraints are expressed via high-level, semantic indicators.
We validate the feasibility of locomotion trajectories optimised using our approach both in simulation and on a real-world ANYmal quadruped.
arXiv Detail & Related papers (2020-07-03T07:04:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.