Teaching a Robot to Walk Using Reinforcement Learning
- URL: http://arxiv.org/abs/2112.07031v1
- Date: Mon, 13 Dec 2021 21:35:45 GMT
- Title: Teaching a Robot to Walk Using Reinforcement Learning
- Authors: Jack Dibachi and Jacob Azoulay
- Abstract summary: Reinforcement learning can train optimal walking policies with ease.
We teach a simulated two-dimensional bipedal robot how to walk using the OpenAI Gym BipedalWalker-v3 environment.
ARS produced a better-trained robot and an optimal policy that officially "solves" the BipedalWalker-v3 problem.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classical control techniques such as PID and LQR have been used effectively
to maintain a system state, but these techniques become more difficult to
implement when the model dynamics increase in complexity and sensitivity. For
adaptive robotic locomotion tasks with several degrees of freedom, classical
control becomes infeasible. Instead, reinforcement
learning can train optimal walking policies with ease. We apply deep Q-learning
and augmented random search (ARS) to teach a simulated two-dimensional bipedal
robot how to walk using the OpenAI Gym BipedalWalker-v3 environment. Deep
Q-learning did not yield a high-reward policy, often converging prematurely to
suboptimal local maxima, likely due to the coarsely discretized action space.
ARS, however, produced a better-trained robot and an optimal policy that
officially "solves" the BipedalWalker-v3 problem. Various naive
policies, including a random policy, a manually encoded inch-forward policy,
and a stay-still policy, were used as benchmarks to evaluate the proficiency of
the learning algorithm results.
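Since ARS is the method that ultimately solves the task, a compact sketch makes the abstract concrete. Below is a minimal Python rendition of an ARS update with a linear policy on BipedalWalker-v3, loosely following the ARS V1-t recipe of Mania et al. (2018) and assuming the classic 4-tuple Gym step API; the hyperparameters, helper names, and the omission of state normalization are illustrative simplifications, not the authors' reported settings.

```python
# Minimal ARS sketch for BipedalWalker-v3 (assumes gym < 0.26, where
# env.step returns a 4-tuple and env.reset returns only the observation).
import gym
import numpy as np

env = gym.make("BipedalWalker-v3")
obs_dim = env.observation_space.shape[0]  # 24-dimensional state
act_dim = env.action_space.shape[0]       # 4 joint torques, each in [-1, 1]

def rollout(policy, max_steps=1600):
    """Run one episode with a linear policy matrix; return total reward."""
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        action = np.clip(policy @ obs, -1.0, 1.0)
        obs, reward, done, _ = env.step(action)
        total += reward
        if done:
            break
    return total

def ars_step(M, n_dirs=16, top_b=8, noise=0.03, lr=0.02):
    """One ARS update: probe +/- random perturbations, step along the best."""
    deltas = [np.random.randn(*M.shape) for _ in range(n_dirs)]
    r_plus = np.array([rollout(M + noise * d) for d in deltas])
    r_minus = np.array([rollout(M - noise * d) for d in deltas])
    # Keep only the top_b directions ranked by best-case reward (the "-t" trick).
    best = np.argsort(np.maximum(r_plus, r_minus))[-top_b:]
    sigma = np.concatenate([r_plus[best], r_minus[best]]).std() + 1e-8
    step = sum((r_plus[i] - r_minus[i]) * deltas[i] for i in best)
    return M + lr / (top_b * sigma) * step

M = np.zeros((act_dim, obs_dim))  # the zero matrix is the "stay still" benchmark
for _ in range(1000):
    M = ars_step(M)
```

A single rollout(np.random.randn(act_dim, obs_dim)) gives a random-policy baseline like the one used as a benchmark above. For the deep Q-learning baseline, the continuous four-dimensional action space must first be discretized (e.g., a few torque levels per joint); that coarseness is what the abstract identifies as the likely cause of premature convergence.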
Related papers
- Robotic Control via Embodied Chain-of-Thought Reasoning [86.6680905262442]
A key limitation of learned robot control policies is their inability to generalize outside their training data.
Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models can substantially improve their robustness and generalization ability.
We introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features before predicting the robot action.
arXiv Detail & Related papers (2024-07-11T17:31:01Z)
- Quality-Diversity Optimisation on a Physical Robot Through Dynamics-Aware and Reset-Free Learning [4.260312058817663]
We build upon the Reset-Free QD (RF-QD) algorithm to learn controllers directly on a physical robot.
This method uses a dynamics model, learned from interactions between the robot and the environment, to predict the robot's behaviour.
RF-QD also includes a recovery policy that returns the robot to a safe zone when it has walked outside of it, allowing continuous learning.
arXiv Detail & Related papers (2023-04-24T13:24:00Z)
- Domain Randomization for Robust, Affordable and Effective Closed-loop Control of Soft Robots [10.977130974626668]
Soft robots are gaining popularity thanks to their intrinsic safety during contact and their adaptability.
We show how Domain Randomization (DR) can solve this problem by enhancing RL policies for soft robots.
We introduce a novel algorithmic extension to previous adaptive domain randomization methods for the automatic inference of dynamics parameters for deformable objects.
arXiv Detail & Related papers (2023-03-07T18:50:00Z)
- Verifying Learning-Based Robotic Navigation Systems [61.01217374879221]
We show how modern verification engines can be used for effective model selection.
Specifically, we use verification to detect and rule out policies that may demonstrate suboptimal behavior.
Our work is the first to demonstrate the use of verification backends for recognizing suboptimal DRL policies in real-world robots.
arXiv Detail & Related papers (2022-05-26T17:56:43Z)
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- Reinforcement Learning with Adaptive Curriculum Dynamics Randomization for Fault-Tolerant Robot Control [4.9631159466100305]
The ACDR algorithm can adaptively train a quadruped robot in random actuator failure conditions.
The ACDR algorithm can be used to build a robot system that does not require additional modules for detecting actuator failures.
arXiv Detail & Related papers (2021-11-19T01:55:57Z)
- XAI-N: Sensor-based Robot Navigation using Expert Policies and Decision Trees [55.9643422180256]
We present a novel sensor-based learning navigation algorithm to compute a collision-free trajectory for a robot in dense and dynamic environments.
Our approach uses a deep reinforcement learning-based expert policy trained using a sim2real paradigm.
We highlight the benefits of our algorithm in simulated environments and in navigating a Clearpath Jackal robot among moving pedestrians.
arXiv Detail & Related papers (2021-04-22T01:33:10Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.