Tracking Control for a Spherical Pendulum via Curriculum Reinforcement Learning
- URL: http://arxiv.org/abs/2309.14096v1
- Date: Mon, 25 Sep 2023 12:48:47 GMT
- Title: Tracking Control for a Spherical Pendulum via Curriculum Reinforcement Learning
- Authors: Pascal Klink, Florian Wolf, Kai Ploeger, Jan Peters and Joni Pajarinen
- Abstract summary: Reinforcement Learning (RL) allows learning non-trivial robot control laws purely from data.
In this paper, we pair a recent algorithm for automatically building curricula with RL on massively parallelized simulations.
We demonstrate the potential of curriculum RL to jointly learn state estimation and control for non-linear tracking tasks.
- Score: 27.73555826776087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) allows learning non-trivial robot control laws
purely from data. However, many successful applications of RL have relied on
ad-hoc regularizations, such as hand-crafted curricula, to regularize the
learning performance. In this paper, we pair a recent algorithm for
automatically building curricula with RL on massively parallelized simulations
to learn a tracking controller for a spherical pendulum on a robotic arm via
RL. Through an improved optimization scheme that better respects the
non-Euclidean task structure, we allow the method to reliably generate
curricula of trajectories to be tracked, resulting in faster and more robust
learning compared to an RL baseline that does not exploit this form of
structured learning. The learned policy matches the performance of an optimal
control baseline on the real system, demonstrating the potential of curriculum
RL to jointly learn state estimation and control for non-linear tracking tasks.
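As a rough, hypothetical illustration of the curriculum idea in the abstract (generating trajectories to be tracked, from easy to hard), the Python sketch below samples sinusoidal reference trajectories whose amplitude and frequency are interpolated from an easy setting toward the full task, gated by the agent's recent performance. All parameter values, thresholds, and function names are invented for illustration and are not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a performance-gated curriculum
# over reference trajectories for a tracking task.
import numpy as np

rng = np.random.default_rng(0)

# Task parameters: (amplitude [m], frequency [Hz]) of a sinusoidal reference.
EASY_TASK = np.array([0.02, 0.10])     # small, slow motions (assumed)
TARGET_TASK = np.array([0.20, 1.00])   # full tracking task (assumed)

def sample_reference(progress, horizon=500, dt=0.01):
    """Sample a reference trajectory whose difficulty grows with progress in [0, 1]."""
    task = (1.0 - progress) * EASY_TASK + progress * TARGET_TASK
    amp, freq = task * (0.9 + 0.2 * rng.random(2))  # small per-episode variation
    t = np.arange(horizon) * dt
    return amp * np.sin(2.0 * np.pi * freq * t)

def update_progress(progress, mean_return, threshold, step=0.05):
    """Advance the curriculum only when the agent performs well on the current tasks."""
    if mean_return >= threshold:
        progress = min(1.0, progress + step)
    return progress
```

In a training loop, one would call sample_reference(progress) once per episode and update_progress(...) after each evaluation phase.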
Related papers
- Online Control-Informed Learning [4.907545537403502]
This paper proposes an Online Control-Informed Learning framework to solve a broad class of learning and control tasks in real time.
By considering any robot as a tunable optimal control system, we propose an online parameter estimator based on the extended Kalman filter (EKF).
The proposed method also improves robustness in learning by effectively managing noise in the data.
arXiv Detail & Related papers (2024-10-04T21:03:16Z)
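The entry above mentions an online parameter estimator based on the extended Kalman filter. The sketch below is a generic EKF predict/update step in NumPy, not the paper's estimator; the process and measurement models, Jacobians, and noise covariances are assumed to be supplied by the caller.

```python
# Generic extended Kalman filter step (illustrative; not the paper's estimator).
import numpy as np

def ekf_step(x, P, u, z, f, h, F, H, Q, R):
    """One EKF predict/update step.
    x, P : current state estimate and covariance
    u, z : control input and measurement
    f, h : process and measurement models
    F, H : their Jacobians, evaluated at the current estimate
    Q, R : process and measurement noise covariances
    """
    # Predict
    x_pred = f(x, u)
    F_k = F(x, u)
    P_pred = F_k @ P @ F_k.T + Q
    # Update
    H_k = H(x_pred)
    S = H_k @ P_pred @ H_k.T + R
    K = P_pred @ H_k.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))           # innovation correction
    P_new = (np.eye(len(x)) - K @ H_k) @ P_pred
    return x_new, P_new
```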
- Implicit Offline Reinforcement Learning via Supervised Learning [83.8241505499762]
Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels.
We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets.
arXiv Detail & Related papers (2022-10-21T21:59:42Z)
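The paper above studies implicit (energy-based) models; as a much simpler illustration of the general "offline RL via supervised learning" recipe it builds on, the sketch below performs return-conditioned behavioral cloning on a fixed dataset with an explicit network. The class and loss are illustrative assumptions, not the paper's model.

```python
# Illustrative return-conditioned behavioral cloning: the "RL via supervised
# learning" recipe in its simplest, explicit form (the paper itself studies
# implicit, energy-based models instead).
import torch
import torch.nn as nn

class ReturnConditionedPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, target_return):
        # target_return has shape (..., 1): the desired return-to-go.
        return self.net(torch.cat([obs, target_return], dim=-1))

def bc_loss(policy, obs, act, ret):
    """Regress dataset actions, conditioned on the observed return-to-go."""
    pred = policy(obs, ret)
    return ((pred - act) ** 2).mean()
```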
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
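JSRL's two-policy idea can be sketched as a roll-in scheme: a guide policy controls the first h steps of each episode, after which the learning (exploration) policy takes over, and h is shrunk as learning progresses. The sketch below assumes Gymnasium-style environment and policy interfaces and omits the paper's details of how h is scheduled and how data is used.

```python
# Rough sketch of a jump-start rollout (Gymnasium-style env assumed):
# the guide policy acts for the first h steps, then the learner takes over.
def jump_start_rollout(env, guide_policy, explore_policy, h, max_steps=1000):
    obs, _ = env.reset()
    transitions = []
    for t in range(max_steps):
        actor = guide_policy if t < h else explore_policy
        action = actor(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs, terminated))
        obs = next_obs
        if terminated or truncated:
            break
    return transitions
```

Shrinking h over the course of training makes the learner start from progressively harder states instead of always facing the full task from scratch.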
- A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction.
We develop a practical workflow for using offline RL that is analogous to the relatively well-understood workflows for supervised learning problems.
We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z)
- Reinforcement Learning for Control of Valves [0.0]
This paper is a study of reinforcement learning (RL) as an optimal-control strategy for control of nonlinear valves.
It is evaluated against the PID (proportional-integral-derivative) strategy, using a unified framework.
arXiv Detail & Related papers (2020-12-29T09:01:47Z)
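The PID baseline referenced above is the textbook controller; a minimal discrete-time version is sketched below for reference. Gains, time step, and actuator limits are illustrative placeholders, not values from the paper.

```python
# Textbook discrete-time PID controller, shown only because the entry above
# compares RL against a PID baseline; gains and limits are illustrative.
class PID:
    def __init__(self, kp, ki, kd, dt, u_min=-1.0, u_max=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0
        self.prev_error = 0.0

    def __call__(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(max(u, self.u_min), self.u_max)  # saturate the actuator command
```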
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
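The core of AWAC-style methods is an actor update that weights the log-likelihood of dataset actions by the exponentiated advantage. The sketch below shows that loss in PyTorch under assumed interfaces (a policy returning a torch distribution and a Q-function critic); the clipping value and temperature are illustrative.

```python
# Minimal sketch of an advantage-weighted actor loss (AWAC-style): dataset
# actions are reweighted by exp(advantage / lam). `policy` (returning a torch
# distribution) and `critic` (a Q-function) are assumed interfaces.
import torch

def awac_actor_loss(policy, critic, obs, act, lam=1.0):
    with torch.no_grad():
        v = critic(obs, policy(obs).sample())      # value via the policy's own action
        q = critic(obs, act)                       # value of the dataset action
        weights = torch.exp((q - v) / lam).clamp(max=20.0)  # clip for stability
    # Sum log-probabilities over action dimensions for a factorized Gaussian policy.
    log_prob = policy(obs).log_prob(act).sum(-1)
    return -(weights * log_prob).mean()
```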
- Self-Paced Deep Reinforcement Learning [42.467323141301826]
Curriculum reinforcement learning (CRL) improves the learning speed and stability of an agent by exposing it to a tailored series of tasks throughout learning.
Despite empirical successes, an open question in CRL is how to automatically generate a curriculum for a given reinforcement learning (RL) agent, avoiding manual design.
We propose an answer by interpreting the curriculum generation as an inference problem, where distributions over tasks are progressively learned to approach the target task.
This approach yields automatic curriculum generation whose pace is controlled by the agent, has a solid theoretical motivation, and integrates easily with deep RL algorithms.
arXiv Detail & Related papers (2020-04-24T15:48:07Z)
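As a crude illustration of the "task distributions progressively approach the target task" idea above, the sketch below moves a diagonal-Gaussian distribution over task parameters toward the target distribution whenever the agent performs well enough, with the step size limited by a KL budget. The thresholds and the simple line-search update are assumptions, not the paper's algorithm.

```python
# Crude sketch of a self-paced curriculum update over a diagonal-Gaussian task
# distribution (illustrative only; not the paper's inference-based algorithm).
import numpy as np

def kl_diag_gauss(mu0, var0, mu1, var1):
    """KL(N(mu0, var0) || N(mu1, var1)) for diagonal Gaussians."""
    return 0.5 * np.sum(np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def self_paced_update(mu, var, mu_target, var_target, mean_return,
                      perf_threshold=0.0, kl_budget=0.25):
    if mean_return < perf_threshold:
        return mu, var  # agent not ready: keep the current task distribution
    # Take the largest interpolation step toward the target that respects the KL budget.
    for alpha in np.linspace(1.0, 0.0, 21):
        new_mu = (1 - alpha) * mu + alpha * mu_target
        new_var = (1 - alpha) * var + alpha * var_target
        if kl_diag_gauss(new_mu, new_var, mu, var) <= kl_budget:
            return new_mu, new_var
    return mu, var
```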
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
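The information-theoretic MPC referenced in the entry above is commonly realized as an MPPI-style controller: sample perturbed control sequences, roll them out through a model, and average them with weights proportional to the exponentiated negative cost. The sketch below shows that core update under assumed dynamics and cost callables; it is not the paper's Q-learning extension.

```python
# Compact sketch of an information-theoretic MPC (MPPI-style) control update.
# `dynamics(x, u)` and `cost(x, u)` are placeholder models supplied by the caller.
import numpy as np

def mppi_update(u_nominal, state, dynamics, cost,
                n_samples=256, sigma=0.3, lam=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    horizon, act_dim = u_nominal.shape
    noise = rng.normal(0.0, sigma, size=(n_samples, horizon, act_dim))
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        x = state
        for t in range(horizon):
            u = u_nominal[t] + noise[k, t]
            costs[k] += cost(x, u)
            x = dynamics(x, u)
    weights = np.exp(-(costs - costs.min()) / lam)   # softmin over trajectory costs
    weights /= weights.sum()
    return u_nominal + np.tensordot(weights, noise, axes=1)  # weighted perturbation
```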
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.