Reinforcement Learning Control of Robotic Knee with Human in the Loop by Flexible Policy Iteration
- URL: http://arxiv.org/abs/2006.09008v2
- Date: Sun, 17 Jan 2021 11:58:50 GMT
- Title: Reinforcement Learning Control of Robotic Knee with Human in the Loop by Flexible Policy Iteration
- Authors: Xiang Gao, Jennie Si, Yue Wen, Minhan Li and He (Helen) Huang
- Abstract summary: This study fills important voids by introducing innovative features to the policy iteration algorithm.
We show system-level performance guarantees, including convergence of the approximate value function, (sub)optimality of the solution, and stability of the system.
- Score: 17.365135977882215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We are motivated by the real challenges presented in a human-robot
system to develop new designs that are efficient at the data level and carry
system-level performance guarantees such as stability and optimality. Existing
approximate/adaptive dynamic programming (ADP) results that consider system
performance theoretically do not readily provide practically useful learning
control algorithms for this problem, while reinforcement learning (RL)
algorithms that address data efficiency usually lack performance guarantees
for the controlled system. This study fills these important voids by
introducing innovative features to the policy iteration algorithm. We
introduce flexible policy iteration (FPI), which can flexibly and organically
integrate experience replay and supplemental values from prior experience
into the RL controller. We establish system-level guarantees, including
convergence of the approximate value function, (sub)optimality of the
solution, and stability of the system. We demonstrate the effectiveness of
FPI via realistic simulations of the human-robot system. The problem we face
in this study would be difficult to address with design methods based on
classical control theory, as it is nearly impossible to obtain a customized
mathematical model of a human-robot system either online or offline. Our
results also indicate the great potential of RL control for solving realistic
and challenging problems with high-dimensional control inputs.
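As a concrete illustration of the replay-augmented policy iteration idea at the heart of FPI, here is a minimal sketch on a toy MDP. It is not the authors' FPI algorithm: the toy MDP, the eps-greedy exploration, the TD(0) evaluation, and all hyperparameters are assumptions made for illustration only.

```python
# Minimal sketch: approximate policy iteration with experience replay.
# NOT the paper's FPI; toy MDP and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 20, 4, 0.9

# Toy MDP: P[s, a] is a distribution over next states, R[s, a] a reward table.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

def collect(policy, n_samples, replay, eps=0.1):
    """Roll out an eps-greedy version of `policy`, appending (s, a, r, s')."""
    s = int(rng.integers(n_states))
    for _ in range(n_samples):
        a = int(policy[s]) if rng.random() > eps else int(rng.integers(n_actions))
        s_next = int(rng.choice(n_states, p=P[s, a]))
        replay.append((s, a, R[s, a], s_next))
        s = s_next

policy = rng.integers(n_actions, size=n_states)
Q = np.zeros((n_states, n_actions))
replay = []

for _ in range(10):
    collect(policy, 300, replay)
    # Policy evaluation with experience replay: repeatedly sweep the whole
    # buffer (old and new experience) with TD(0) updates toward Q^pi.
    for _ in range(5):
        for s, a, r, s_next in replay:
            target = r + gamma * Q[s_next, policy[s_next]]
            Q[s, a] += 0.05 * (target - Q[s, a])
    policy = Q.argmax(axis=1)  # greedy policy improvement

print("Greedy policy after training:", policy)
```

Because every evaluation step sweeps the entire buffer, transitions gathered under earlier policies keep contributing to later evaluations, which is the data-efficiency benefit that experience replay brings to policy iteration.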
Related papers
- Online Control-Informed Learning [4.907545537403502]
This paper proposes an Online Control-Informed Learning framework to solve a broad class of learning and control tasks in real time.
By considering any robot as a tunable optimal control system, we propose an online parameter estimator based on the extended Kalman filter (EKF).
The proposed method also improves robustness in learning by effectively managing noise in the data.
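For context, a generic EKF step for online parameter estimation can be sketched as below, treating the unknown parameters as a slowly drifting (random-walk) state. The measurement model `h`, its Jacobian `H_jac`, and the noise covariances are placeholders; the paper's actual estimator is not reproduced here.

```python
# Generic EKF parameter-update step; an illustrative sketch, not the
# paper's estimator. All arguments are placeholders.
import numpy as np

def ekf_param_update(theta, Sigma, y, h, H_jac, Rn, Qn):
    """One EKF step treating the parameters `theta` as a random-walk state.

    theta : (n,) current parameter estimate
    Sigma : (n, n) parameter covariance
    y     : (m,) new measurement
    h     : measurement model, h(theta) -> (m,)
    H_jac : Jacobian of h at theta -> (m, n)
    Rn/Qn : measurement / process noise covariances
    """
    Sigma = Sigma + Qn                    # predict (parameters drift slowly)
    H = H_jac(theta)
    S = H @ Sigma @ H.T + Rn              # innovation covariance
    K = np.linalg.solve(S, H @ Sigma).T   # Kalman gain  K = Sigma H^T S^-1
    theta = theta + K @ (y - h(theta))    # correct with the innovation
    Sigma = (np.eye(len(theta)) - K @ H) @ Sigma
    return theta, Sigma
```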
arXiv Detail & Related papers (2024-10-04T21:03:16Z)
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
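A hedged sketch of what a single-stage objective mixing demonstrations and preferences could look like is given below; the particular loss terms and the weighting `beta` are assumptions for illustration, not the AIHF objective from the paper.

```python
# Illustrative single-stage loss combining demonstration and preference
# data; a rough sketch under assumed loss terms, NOT AIHF itself.
import torch
import torch.nn.functional as F

def joint_demo_preference_loss(policy_logits, demo_actions,
                               reward_chosen, reward_rejected, beta=1.0):
    # Behavior-cloning term on demonstrations (policy_logits: (N, A)) ...
    bc = F.cross_entropy(policy_logits, demo_actions)
    # ... plus a Bradley-Terry preference term on reward-model scores.
    pref = -F.logsigmoid(reward_chosen - reward_rejected).mean()
    return bc + beta * pref  # beta is a hypothetical mixing weight
```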
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
- Learning to Boost the Performance of Stable Nonlinear Systems [0.0]
We tackle the performance-boosting problem with closed-loop stability guarantees.
Our methods enable learning performance-boosting controllers for stable nonlinear systems over arbitrarily deep neural network classes.
arXiv Detail & Related papers (2024-05-01T21:11:29Z)
- Active Learning for Control-Oriented Identification of Nonlinear Systems [26.231260751633307]
We present the first finite sample analysis of an active learning algorithm suitable for a general class of nonlinear dynamics.
In certain settings, the excess control cost of our algorithm achieves the optimal rate, up to logarithmic factors.
We validate our approach in simulation, showcasing the advantage of active, control-oriented exploration for controlling nonlinear systems.
arXiv Detail & Related papers (2024-04-13T15:40:39Z)
- SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning [85.21378553454672]
We develop a library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment.
We find that our implementation can achieve very efficient learning, acquiring policies for PCB assembly, cable routing, and object relocation.
These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z)
- Optimal Exploration for Model-Based RL in Nonlinear Systems [14.540210895533937]
Learning to control unknown nonlinear dynamical systems is a fundamental problem in reinforcement learning and control theory.
We develop an algorithm able to efficiently explore the system to reduce uncertainty in a task-dependent metric.
Our algorithm relies on a general reduction from policy optimization to optimal experiment design in arbitrary systems, and may be of independent interest.
arXiv Detail & Related papers (2023-06-15T15:47:50Z)
- On Robust Numerical Solver for ODE via Self-Attention Mechanism [82.95493796476767]
We explore training efficient and robust AI-enhanced numerical solvers with a small data size by mitigating intrinsic noise disturbances.
We first analyze the ability of the self-attention mechanism to regulate noise in supervised learning, and then propose a simple yet effective numerical solver, Attr, which introduces an additive self-attention mechanism into the numerical solution of differential equations.
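As a rough sketch of how an additive self-attention term can be attached to a classical integrator, consider the Runge-Kutta step below. Only the "additive self-attention on a numerical solver" idea comes from the summary above; the specific architecture is an assumption.

```python
# Sketch: additive self-attention correction on top of a classical RK4
# step. The architecture is an assumption, not the paper's solver.
import torch

class AttentiveStep(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, f, x, t, dt):
        # Classical 4th-order Runge-Kutta step for dx/dt = f(t, x).
        k1 = f(t, x)
        k2 = f(t + dt / 2, x + dt * k1 / 2)
        k3 = f(t + dt / 2, x + dt * k2 / 2)
        k4 = f(t + dt, x + dt * k3)
        rk4 = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        # Self-attention over the stage values as a learned additive correction.
        stages = torch.stack([k1, k2, k3, k4], dim=1)  # (batch, 4, dim)
        corr, _ = self.attn(stages, stages, stages)
        return rk4 + corr.mean(dim=1)                  # additive correction
```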
arXiv Detail & Related papers (2023-02-05T01:39:21Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
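For background on the adversarial IfO setting, a generic GAIfO-style discriminator that scores state-transition pairs (s, s') is sketched below; IfO provides no action labels, so the discriminator sees only observations. This illustrates the baseline setting, not DEALIO's model-based algorithm.

```python
# Generic adversarial-IfO discriminator over (s, s') transitions;
# a GAIfO-style sketch for background, NOT DEALIO's algorithm.
import torch

class TransitionDiscriminator(torch.nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2 * obs_dim, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, 1))

    def forward(self, s, s_next):
        # Score a state transition; no actions are available in IfO.
        return self.net(torch.cat([s, s_next], dim=-1))

def discriminator_loss(disc, expert_s, expert_s_next, agent_s, agent_s_next):
    """Classify expert transitions as 1 and agent transitions as 0."""
    bce = torch.nn.functional.binary_cross_entropy_with_logits
    loss_expert = bce(disc(expert_s, expert_s_next),
                      torch.ones(expert_s.shape[0], 1))
    loss_agent = bce(disc(agent_s, agent_s_next),
                     torch.zeros(agent_s.shape[0], 1))
    return loss_expert + loss_agent
```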
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Anticipating the Long-Term Effect of Online Learning in Control [75.6527644813815]
AntLer is a design algorithm for learning-based control laws that anticipates learning.
We show that AntLer approximates an optimal solution arbitrarily accurately with probability one.
arXiv Detail & Related papers (2020-07-24T07:00:14Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information-theoretic MPC and entropy-regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
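The entropy-regularized ("soft") Bellman backup underlying this MPC-RL connection can be sketched as one sweep of soft value iteration; the code below is a generic illustration, not the paper's algorithm.

```python
# Generic soft (entropy-regularized) Bellman backup; an illustrative
# sweep of soft value iteration, not the paper's Q-learning algorithm.
import numpy as np

def soft_backup(Q, R, P, gamma=0.99, alpha=0.1):
    """One sweep of entropy-regularized value iteration.

    Q : (S, A) soft Q-values      R : (S, A) rewards
    P : (S, A, S) transitions     alpha : entropy temperature
    """
    # Soft value: V(s) = alpha * log sum_a exp(Q(s, a) / alpha)
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))
    # Soft Bellman backup: Q(s, a) <- r(s, a) + gamma * E[V(s')]
    return R + gamma * np.einsum('sat,t->sa', P, V)
```

As alpha tends to zero the soft value reduces to the ordinary max over actions, recovering the standard Bellman backup.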
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.