Learning Coordinated Terrain-Adaptive Locomotion by Imitating a
Centroidal Dynamics Planner
- URL: http://arxiv.org/abs/2111.00262v1
- Date: Sat, 30 Oct 2021 14:24:39 GMT
- Title: Learning Coordinated Terrain-Adaptive Locomotion by Imitating a
Centroidal Dynamics Planner
- Authors: Philemon Brakel, Steven Bohez, Leonard Hasenclever, Nicolas Heess,
Konstantinos Bousmalis
- Abstract summary: Reinforcement Learning (RL) can learn dynamic reactive controllers but requires carefully tuned shaping rewards to produce good gaits.
Imitation learning circumvents this problem and has been used with motion capture data to extract quadruped gaits for flat terrains.
We instead train policies to imitate trajectories planned over procedural terrains by a non-linear solver, and show that the learned policies transfer to unseen terrains and can be fine-tuned to dynamically traverse challenging terrains.
- Score: 27.476911967228926
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic quadruped locomotion over challenging terrains with precise foot
placements is a hard problem for both optimal control methods and Reinforcement
Learning (RL). Non-linear solvers can produce coordinated, constraint-satisfying
motions, but often take too long to converge for online application. RL methods
can learn dynamic reactive controllers but require carefully tuned shaping
rewards to produce good gaits and can have trouble discovering precise
coordinated movements. Imitation learning circumvents this problem and has been
used with motion capture data to extract quadruped gaits for flat terrains.
However, it would be costly to acquire motion capture data for a very large
variety of terrains with height differences. In this work, we combine the
advantages of trajectory optimization and learning methods and show that
terrain adaptive controllers can be obtained by training policies to imitate
trajectories that have been planned over procedural terrains by a non-linear
solver. We show that the learned policies transfer to unseen terrains and can
be fine-tuned to dynamically traverse challenging terrains that require precise
foot placements and are very hard to solve with standard RL.
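To make the imitation step concrete, here is a minimal sketch of behaviour cloning on planner output, assuming a dataset of (state, action) pairs extracted from solver trajectories. The dataset variables and the linear policy are hypothetical simplifications; the paper trains neural-network policies on trajectories from a centroidal dynamics planner over procedural terrains.

```python
import numpy as np

# Hypothetical dataset: (state, action) pairs sampled from trajectories that a
# non-linear solver planned over procedurally generated terrains.
rng = np.random.default_rng(0)
plan_states = rng.normal(size=(1000, 12))   # e.g. base pose + terrain features
plan_actions = rng.normal(size=(1000, 8))   # e.g. target joint positions

# Behaviour cloning with a linear policy a = W^T [s; 1], fit by least squares.
# (A linear model keeps the sketch short; the paper uses neural networks.)
S = np.hstack([plan_states, np.ones((len(plan_states), 1))])  # bias column
W, *_ = np.linalg.lstsq(S, plan_actions, rcond=None)

def policy(state):
    """Imitation policy: map a state to a planner-like action."""
    return np.append(state, 1.0) @ W

# Imitation loss: mean squared error to the planned actions.
print("behaviour-cloning MSE:", np.mean((S @ W - plan_actions) ** 2))
```

In the paper's setting the cloned policy is then fine-tuned to traverse terrains requiring precise foot placements, as the abstract describes.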
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
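As a rough illustration of the action-quantization idea in the entry above, the sketch below builds a discrete action codebook with k-means and maps continuous actions to their nearest bin. The fixed k-means codebook is my simplification; the paper's adaptive scheme is learned from the offline data rather than fixed like this.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical continuous actions from an offline behavior dataset.
actions = rng.normal(size=(5000, 4))

def kmeans_codebook(x, k=16, iters=50):
    """Build k discrete action bins by running k-means over dataset actions."""
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return centers

codebook = kmeans_codebook(actions)

def quantize(a):
    """Index of the nearest codebook entry for a continuous action."""
    return int(np.argmin(((codebook - a) ** 2).sum(-1)))

# A discrete offline RL method (e.g. CQL over indices) would now learn over
# codebook indices and decode them back to continuous actions at execution.
i = quantize(actions[0])
print(i, codebook[i])
```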
- DATT: Deep Adaptive Trajectory Tracking for Quadrotor Control [62.24301794794304]
Deep Adaptive Trajectory Tracking (DATT) is a learning-based approach that can precisely track arbitrary, potentially infeasible trajectories in the presence of large disturbances in the real world.
DATT significantly outperforms competitive adaptive nonlinear and model predictive controllers for both feasible smooth and infeasible trajectories in unsteady wind fields.
It runs efficiently online, with an inference time under 3.2 ms, less than a quarter of that of the adaptive nonlinear model predictive control baseline.
arXiv Detail & Related papers (2023-10-13T12:22:31Z)
- DTC: Deep Tracking Control [16.2850135844455]
We propose a hybrid control architecture that combines the advantages of both worlds to achieve greater robustness, foot-placement accuracy, and terrain generalization.
A deep neural network policy is trained in simulation to track the footholds proposed by a model-based trajectory optimizer.
We demonstrate superior robustness in the presence of slippery or deformable ground when compared to model-based counterparts.
arXiv Detail & Related papers (2023-09-27T07:57:37Z)
- Learning and Adapting Agile Locomotion Skills by Transferring Experience [71.8926510772552]
We propose a framework for training complex robotic skills by transferring experience from existing controllers to jumpstart learning new tasks.
We show that our method enables learning complex agile jumping behaviors, navigating to goal locations while walking on hind legs, and adapting to new environments.
arXiv Detail & Related papers (2023-04-19T17:37:54Z)
- Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
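The differentiable-simulation entry above can be caricatured in a few lines: roll the policy out for a short horizon through differentiable dynamics and descend the analytic gradient of the accumulated cost. The toy point-mass dynamics below, and the omission of SHAC's critic (which supplies a terminal value so horizons can stay short), are simplifications of mine.

```python
import torch

def step(s, a, dt=0.05):
    """Toy differentiable simulator: a 1-D point mass driven by action a."""
    return s + dt * a

policy = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for _ in range(200):
    s = torch.randn(64, 1)              # batch of initial states
    loss = torch.tensor(0.0)
    for _ in range(16):                 # short rollout horizon
        s = step(s, policy(s))
        loss = loss + (s ** 2).mean()   # cost: squared distance to origin
    opt.zero_grad()
    loss.backward()                     # analytic gradient through the rollout
    opt.step()

print("final rollout cost:", float(loss))
```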
- Learning to Jump from Pixels [23.17535989519855]
We present Depth-based Impulse Control (DIC), a method for synthesizing highly agile visually-guided behaviors.
DIC affords the flexibility of model-free learning but regularizes behavior through explicit model-based optimization of ground reaction forces.
We evaluate the proposed method both in simulation and in the real world.
arXiv Detail & Related papers (2021-10-28T17:53:06Z)
- RLOC: Terrain-Aware Legged Locomotion using Reinforcement Learning and Optimal Control [6.669503016190925]
We present a unified model-based and data-driven approach for quadrupedal planning and control.
We map sensory information and desired base velocity commands into footstep plans using a reinforcement learning policy.
We train and evaluate our framework on a complex quadrupedal system, ANYmal B, and demonstrate transferability to a larger and heavier robot, ANYmal C, without requiring retraining.
arXiv Detail & Related papers (2020-12-05T18:30:23Z)
- Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach [3.752600874088677]
We use a linear policy for realizing end-foot trajectories in the quadruped robot Stoch 2.
In particular, the parameters of the end-foot trajectories are shaped via a linear feedback policy that takes the torso orientation and the terrain slope as inputs.
The resulting walking is robust to terrain slope variations and external pushes.
arXiv Detail & Related papers (2020-10-30T16:02:08Z)
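The linear policy in the entry above is simple enough to spell out: the trajectory parameters are an affine function of torso orientation and terrain slope. The specific parametrization below (step length and height from roll, pitch, and slope) and all numbers are hypothetical, not the paper's exact formulation.

```python
import numpy as np

# Observations: torso orientation (roll, pitch) and estimated terrain slope (rad).
obs = np.array([0.02, -0.05, 0.15])  # hypothetical values

# Linear feedback policy: end-foot trajectory parameters = M @ obs + c.
# The gain matrix M would be found by policy search, not hand-set as here.
M = np.array([[0.10, 0.30, -0.20],   # row 1: step length gains
              [0.00, 0.05,  0.15]])  # row 2: step height gains
c = np.array([0.12, 0.06])           # nominal step length/height in metres

step_length, step_height = M @ obs + c
print(f"step length: {step_length:.3f} m, step height: {step_height:.3f} m")
```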
- ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation [99.2543521972137]
ReLMoGen is a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals.
Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments.
ReLMoGen shows outstanding transferability between different motion generators at test time, indicating great potential to transfer to real robots.
arXiv Detail & Related papers (2020-08-18T08:05:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.