Reinforcement Learning Control of Robotic Knee with Human in the Loop by Flexible Policy Iteration
- URL: http://arxiv.org/abs/2006.09008v2
- Date: Sun, 17 Jan 2021 11:58:50 GMT
- Title: Reinforcement Learning Control of Robotic Knee with Human in the Loop by Flexible Policy Iteration
- Authors: Xiang Gao, Jennie Si, Yue Wen, Minhan Li and He (Helen) Huang
- Abstract summary: This study fills important voids by introducing innovative features to the policy iteration algorithm.
We show system-level performance guarantees, including convergence of the approximate value function, (sub)optimality of the solution, and stability of the system.
- Score: 17.365135977882215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We are motivated by the real challenges presented in a human-robot
system to develop new designs that are efficient at the data level and carry
system-level performance guarantees such as stability and optimality. Existing
approximate/adaptive dynamic programming (ADP) results that consider system
performance theoretically do not readily provide practically useful learning
control algorithms for this problem, while reinforcement learning (RL)
algorithms that address data efficiency usually lack performance guarantees
for the controlled system. This study fills these important voids by
introducing innovative features to the policy iteration algorithm. We
introduce flexible policy iteration (FPI), which can flexibly and organically
integrate experience replay and supplemental values from prior experience
into the RL controller. We establish system-level guarantees, including
convergence of the approximate value function, (sub)optimality of the
solution, and stability of the system. We demonstrate the effectiveness of
FPI via realistic simulations of the human-robot system. The problem we face
in this study would be difficult to address with design methods based on
classical control theory, as it is nearly impossible to obtain a customized
mathematical model of a human-robot system either online or offline. Our
results also indicate the great potential of RL control for solving realistic
and challenging problems with high-dimensional control inputs.
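As a concrete illustration of the replay-augmented policy iteration idea at the heart of FPI, here is a minimal sketch on a toy MDP. It is not the authors' FPI algorithm: the toy MDP, the eps-greedy exploration, the TD(0) evaluation, and all hyperparameters are assumptions made for illustration only.

```python
# Minimal sketch: approximate policy iteration with experience replay.
# NOT the paper's FPI; toy MDP and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 20, 4, 0.9

# Toy MDP: P[s, a] is a distribution over next states, R[s, a] a reward table.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

def collect(policy, n_samples, replay, eps=0.1):
    """Roll out an eps-greedy version of `policy`, appending (s, a, r, s')."""
    s = int(rng.integers(n_states))
    for _ in range(n_samples):
        a = int(policy[s]) if rng.random() > eps else int(rng.integers(n_actions))
        s_next = int(rng.choice(n_states, p=P[s, a]))
        replay.append((s, a, R[s, a], s_next))
        s = s_next

policy = rng.integers(n_actions, size=n_states)
Q = np.zeros((n_states, n_actions))
replay = []

for _ in range(10):
    collect(policy, 300, replay)
    # Policy evaluation with experience replay: repeatedly sweep the whole
    # buffer (old and new experience) with TD(0) updates toward Q^pi.
    for _ in range(5):
        for s, a, r, s_next in replay:
            target = r + gamma * Q[s_next, policy[s_next]]
            Q[s, a] += 0.05 * (target - Q[s, a])
    policy = Q.argmax(axis=1)  # greedy policy improvement

print("Greedy policy after training:", policy)
```

Because every evaluation step sweeps the entire buffer, transitions gathered under earlier policies keep contributing to later evaluations, which is the data-efficiency benefit that experience replay brings to policy iteration.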
Related papers
- Online Control-Informed Learning [4.907545537403502]
This paper proposes an Online Control-Informed Learning framework to solve a broad class of learning and control tasks in real time.
By considering any robot as a tunable optimal control system, we propose an online parameter estimator based on the extended Kalman filter (EKF).
The proposed method also improves robustness in learning by effectively managing noise in the data.
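For context, a generic EKF step for online parameter estimation can be sketched as below, treating the unknown parameters as a slowly drifting (random-walk) state. The measurement model `h`, its Jacobian `H_jac`, and the noise covariances are placeholders; the paper's actual estimator is not reproduced here.

```python
# Generic EKF parameter-update step; an illustrative sketch, not the
# paper's estimator. All arguments are placeholders.
import numpy as np

def ekf_param_update(theta, Sigma, y, h, H_jac, Rn, Qn):
    """One EKF step treating the parameters `theta` as a random-walk state.

    theta : (n,) current parameter estimate
    Sigma : (n, n) parameter covariance
    y     : (m,) new measurement
    h     : measurement model, h(theta) -> (m,)
    H_jac : Jacobian of h at theta -> (m, n)
    Rn/Qn : measurement / process noise covariances
    """
    Sigma = Sigma + Qn                    # predict (parameters drift slowly)
    H = H_jac(theta)
    S = H @ Sigma @ H.T + Rn              # innovation covariance
    K = np.linalg.solve(S, H @ Sigma).T   # Kalman gain  K = Sigma H^T S^-1
    theta = theta + K @ (y - h(theta))    # correct with the innovation
    Sigma = (np.eye(len(theta)) - K @ H) @ Sigma
    return theta, Sigma
```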
arXiv Detail & Related papers (2024-10-04T21:03:16Z)
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
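A hedged sketch of what a single-stage objective mixing demonstrations and preferences could look like is given below; the particular loss terms and the weighting `beta` are assumptions for illustration, not the AIHF objective from the paper.

```python
# Illustrative single-stage loss combining demonstration and preference
# data; a rough sketch under assumed loss terms, NOT AIHF itself.
import torch
import torch.nn.functional as F

def joint_demo_preference_loss(policy_logits, demo_actions,
                               reward_chosen, reward_rejected, beta=1.0):
    # Behavior-cloning term on demonstrations (policy_logits: (N, A)) ...
    bc = F.cross_entropy(policy_logits, demo_actions)
    # ... plus a Bradley-Terry preference term on reward-model scores.
    pref = -F.logsigmoid(reward_chosen - reward_rejected).mean()
    return bc + beta * pref  # beta is a hypothetical mixing weight
```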
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
- Learning to Boost the Performance of Stable Nonlinear Systems [0.0]
We tackle the performance-boosting problem with closed-loop stability guarantees.
Our methods enable learning performance-boosting controllers for stable nonlinear systems over arbitrarily deep neural network classes.
arXiv Detail & Related papers (2024-05-01T21:11:29Z)
- Active Learning for Control-Oriented Identification of Nonlinear Systems [26.231260751633307]
We present the first finite sample analysis of an active learning algorithm suitable for a general class of nonlinear dynamics.
In certain settings, the excess control cost of our algorithm achieves the optimal rate, up to logarithmic factors.
We validate our approach in simulation, showcasing the advantage of active, control-oriented exploration for controlling nonlinear systems.
arXiv Detail & Related papers (2024-04-13T15:40:39Z)
- SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning [85.21378553454672]
We develop a library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment.
We find that our implementation can achieve very efficient learning, acquiring policies for PCB assembly, cable routing, and object relocation.
These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z)
- Optimal Exploration for Model-Based RL in Nonlinear Systems [14.540210895533937]
Learning to control unknown nonlinear dynamical systems is a fundamental problem in reinforcement learning and control theory.
We develop an algorithm able to efficiently explore the system to reduce uncertainty in a task-dependent metric.
Our algorithm relies on a general reduction from policy optimization to optimal experiment design in arbitrary systems, and may be of independent interest.
arXiv Detail & Related papers (2023-06-15T15:47:50Z)
- On Robust Numerical Solver for ODE via Self-Attention Mechanism [82.95493796476767]
We explore training efficient and robust AI-enhanced numerical solvers with a small data size by mitigating intrinsic noise disturbances.
We first analyze the ability of the self-attention mechanism to regulate noise in supervised learning, and then propose a simple yet effective numerical solver, Attr, which introduces an additive self-attention mechanism into the numerical solution of differential equations.
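As a rough sketch of how an additive self-attention term can be attached to a classical integrator, consider the Runge-Kutta step below. Only the "additive self-attention on a numerical solver" idea comes from the summary above; the specific architecture is an assumption.

```python
# Sketch: additive self-attention correction on top of a classical RK4
# step. The architecture is an assumption, not the paper's solver.
import torch

class AttentiveStep(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, f, x, t, dt):
        # Classical 4th-order Runge-Kutta step for dx/dt = f(t, x).
        k1 = f(t, x)
        k2 = f(t + dt / 2, x + dt * k1 / 2)
        k3 = f(t + dt / 2, x + dt * k2 / 2)
        k4 = f(t + dt, x + dt * k3)
        rk4 = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        # Self-attention over the stage values as a learned additive correction.
        stages = torch.stack([k1, k2, k3, k4], dim=1)  # (batch, 4, dim)
        corr, _ = self.attn(stages, stages, stages)
        return rk4 + corr.mean(dim=1)                  # additive correction
```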
arXiv Detail & Related papers (2023-02-05T01:39:21Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
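For background on the adversarial IfO setting, a generic GAIfO-style discriminator that scores state-transition pairs (s, s') is sketched below; IfO provides no action labels, so the discriminator sees only observations. This illustrates the baseline setting, not DEALIO's model-based algorithm.

```python
# Generic adversarial-IfO discriminator over (s, s') transitions;
# a GAIfO-style sketch for background, NOT DEALIO's algorithm.
import torch

class TransitionDiscriminator(torch.nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2 * obs_dim, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, 1))

    def forward(self, s, s_next):
        # Score a state transition; no actions are available in IfO.
        return self.net(torch.cat([s, s_next], dim=-1))

def discriminator_loss(disc, expert_s, expert_s_next, agent_s, agent_s_next):
    """Classify expert transitions as 1 and agent transitions as 0."""
    bce = torch.nn.functional.binary_cross_entropy_with_logits
    loss_expert = bce(disc(expert_s, expert_s_next),
                      torch.ones(expert_s.shape[0], 1))
    loss_agent = bce(disc(agent_s, agent_s_next),
                     torch.zeros(agent_s.shape[0], 1))
    return loss_expert + loss_agent
```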
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Anticipating the Long-Term Effect of Online Learning in Control [75.6527644813815]
AntLer is a design algorithm for learning-based control laws that anticipates learning.
We show that AntLer approximates an optimal solution arbitrarily accurately with probability one.
arXiv Detail & Related papers (2020-07-24T07:00:14Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information-theoretic MPC and entropy-regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
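The entropy-regularized ("soft") Bellman backup underlying this MPC-RL connection can be sketched as one sweep of soft value iteration; the code below is a generic illustration, not the paper's algorithm.

```python
# Generic soft (entropy-regularized) Bellman backup; an illustrative
# sweep of soft value iteration, not the paper's Q-learning algorithm.
import numpy as np

def soft_backup(Q, R, P, gamma=0.99, alpha=0.1):
    """One sweep of entropy-regularized value iteration.

    Q : (S, A) soft Q-values      R : (S, A) rewards
    P : (S, A, S) transitions     alpha : entropy temperature
    """
    # Soft value: V(s) = alpha * log sum_a exp(Q(s, a) / alpha)
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))
    # Soft Bellman backup: Q(s, a) <- r(s, a) + gamma * E[V(s')]
    return R + gamma * np.einsum('sat,t->sa', P, V)
```

As alpha tends to zero the soft value reduces to the ordinary max over actions, recovering the standard Bellman backup.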
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.