Real-Time Model-Free Deep Reinforcement Learning for Force Control of a
Series Elastic Actuator
- URL: http://arxiv.org/abs/2304.04911v1
- Date: Tue, 11 Apr 2023 00:51:47 GMT
- Title: Real-Time Model-Free Deep Reinforcement Learning for Force Control of a
Series Elastic Actuator
- Authors: Ruturaj Sambhus, Aydin Gokce, Stephen Welch, Connor W. Herron, and
Alexander Leonessa
- Abstract summary: State-of-the-art robotic applications utilize series elastic actuators (SEAs) with closed-loop force control to achieve complex tasks such as walking, lifting, and manipulation.
Model-free PID control methods are more prone to instability due to nonlinearities in the SEA.
Deep reinforcement learning has proved to be an effective model-free method for continuous control tasks.
- Score: 56.11574814802912
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many state-of-the-art robotic applications utilize series elastic actuators
(SEAs) with closed-loop force control to achieve complex tasks such as walking,
lifting, and manipulation. Model-free PID control methods are more prone to
instability due to nonlinearities in the SEA, whereas cascaded model-based robust
controllers can remove these effects to achieve stable force control. However,
these model-based methods require detailed investigations to characterize the
system accurately. Deep reinforcement learning (DRL) has proved to be an
effective model-free method for continuous control tasks, though few works deal
with learning directly on hardware. This paper describes the training process of a DRL
policy on the hardware of an SEA pendulum system for tracking desired force
trajectories from 0.05-0.35 Hz at 50 N amplitude using the Proximal Policy
Optimization (PPO) algorithm. Safety mechanisms are developed and utilized for
training the policy for 12 hours (overnight) without an operator present within
the full 21-hour training period. The tracking performance is evaluated,
showing an improvement of 25 N in mean absolute error when comparing the first
18 min of training to the full 21 hours for a 50 N amplitude, 0.1 Hz sinusoidal
desired force trajectory. Finally, the DRL policy exhibits better tracking and
stability margins when compared to a model-free PID controller for a 50 N chirp
force trajectory.
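
To make the training setup above concrete, the sketch below shows the general shape of such a PPO force-tracking loop. It is not the authors' code: the SEA pendulum hardware and its safety mechanisms are replaced by a crude linear-spring stand-in, and the environment name, spring constant, action scaling, and episode length are all illustrative assumptions. It only captures the structure implied by the abstract: an observation built from measured and desired force, a reward that penalizes force-tracking error, and PPO training via a standard library (stable-baselines3 with a gymnasium environment).

```python
# Illustrative sketch only: a toy spring replaces the SEA pendulum hardware.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class SEAForceTrackingEnv(gym.Env):
    """Toy stand-in for the SEA pendulum: track a 50 N, 0.1 Hz sinusoidal force."""

    def __init__(self, dt=0.01, freq_hz=0.1, amp_n=50.0, episode_s=20.0):
        super().__init__()
        self.dt, self.freq, self.amp = dt, freq_hz, amp_n
        self.max_steps = int(episode_s / dt)
        # Observation: [measured force, desired force, force error]
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32)
        # Action: normalized motor velocity command
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def _obs(self):
        f_meas = 1000.0 * self.deflection          # assumed spring law F = k * x, k = 1000 N/m
        f_des = self.amp * np.sin(2 * np.pi * self.freq * self.step_count * self.dt)
        return np.array([f_meas, f_des, f_des - f_meas], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.step_count, self.deflection = 0, 0.0
        return self._obs(), {}

    def step(self, action):
        # On hardware this would command the motor; here it just moves the toy spring.
        self.deflection += 0.05 * float(action[0]) * self.dt
        self.step_count += 1
        obs = self._obs()
        reward = -abs(float(obs[2]))               # penalize force-tracking error
        truncated = self.step_count >= self.max_steps
        return obs, reward, False, truncated, {}


if __name__ == "__main__":
    model = PPO("MlpPolicy", SEAForceTrackingEnv(), verbose=1)
    model.learn(total_timesteps=50_000)            # the paper instead trains ~21 h on hardware
```

On the real actuator, the toy dynamics in step() would be replaced by motor commands to and force readings from the SEA, with episodes bounded by the safety mechanisms described in the paper.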
Related papers
- DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning [61.10299147201369]
This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents.
We build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator.
We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild dataset, where our 1.3B VLM trained with RL achieves a 49.5% absolute improvement.
arXiv Detail & Related papers (2024-06-14T17:49:55Z)
- Integrating DeepRL with Robust Low-Level Control in Robotic Manipulators for Non-Repetitive Reaching Tasks [0.24578723416255746]
In robotics, many contemporary control strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability.
We propose integrating a collision-free trajectory planner based on deep reinforcement learning (DRL) with a novel auto-tuning low-level control strategy.
arXiv Detail & Related papers (2024-02-04T15:54:03Z)
- DATT: Deep Adaptive Trajectory Tracking for Quadrotor Control [62.24301794794304]
Deep Adaptive Trajectory Tracking (DATT) is a learning-based approach that can precisely track arbitrary, potentially infeasible trajectories in the presence of large disturbances in the real world.
DATT significantly outperforms competitive adaptive nonlinear and model predictive controllers for both feasible smooth and infeasible trajectories in unsteady wind fields.
It can efficiently run online with an inference time less than 3.2 ms, less than 1/4 of the adaptive nonlinear model predictive control baseline.
arXiv Detail & Related papers (2023-10-13T12:22:31Z)
- Self-Tuning PID Control via a Hybrid Actor-Critic-Based Neural Structure for Quadcopter Control [0.0]
The Proportional-Integral-Derivative (PID) controller is used in a wide range of industrial and experimental processes.
Due to the uncertainty of model parameters and external disturbances, real systems such as quadrotors need more robust and reliable PID controllers.
In this research, a self-tuning PID controller using a Reinforcement-Learning-based Neural Network has been investigated (a toy sketch of this idea appears after the related-papers list below).
arXiv Detail & Related papers (2023-07-03T19:35:52Z)
- Turbulence control in plane Couette flow using low-dimensional neural ODE-based models and deep reinforcement learning [0.0]
"DManD-RL" (data-driven manifold dynamics-RL) generates a data-driven low-dimensional model of our system.
We train an RL control agent, yielding a 440-fold speedup over training on a numerical simulation.
The agent learns a policy that laminarizes 84% of unseen DNS test trajectories within 900 time units.
arXiv Detail & Related papers (2023-01-28T05:47:10Z)
- Improving the Performance of Robust Control through Event-Triggered Learning [74.57758188038375]
We propose an event-triggered learning algorithm that decides when to learn in the face of uncertainty in the LQR problem.
We demonstrate improved performance over a robust controller baseline in a numerical example.
arXiv Detail & Related papers (2022-07-28T17:36:37Z)
- Data-Efficient Deep Reinforcement Learning for Attitude Control of Fixed-Wing UAVs: Field Experiments [0.37798600249187286]
We show that DRL can successfully learn to perform attitude control of a fixed-wing UAV operating directly on the original nonlinear dynamics.
We deploy the learned controller on the UAV in flight tests, demonstrating comparable performance to the state-of-the-art ArduPlane proportional-integral-derivative (PID) attitude controller.
arXiv Detail & Related papers (2021-11-07T19:07:46Z)
- Online Reinforcement Learning Control by Direct Heuristic Dynamic Programming: from Time-Driven to Event-Driven [80.94390916562179]
Time-driven learning refers to the machine learning method that updates parameters in a prediction model continuously as new data arrives.
It is desirable to prevent the time-driven dHDP from updating due to insignificant system events such as noise.
We show how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
arXiv Detail & Related papers (2020-06-16T05:51:25Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
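
Returning to the Self-Tuning PID entry above: the snippet below is a toy sketch of the general RL-tuned-PID idea, not that paper's hybrid actor-critic architecture. A small neural network maps the PID error state to positive gains, which a conventional PID loop then applies; the first-order plant, network shape, and random (untrained) weights are assumptions made only so the example runs, whereas in practice the network would be trained with an actor-critic method.

```python
# Toy sketch of an RL-gain-scheduled PID loop (illustrative only; see lead-in above).
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.normal(size=(8, 3)), np.zeros(8)   # untrained stand-in for the actor network
W2 = 0.1 * rng.normal(size=(3, 8))


def gains(error, integral, derivative):
    """Map the PID state to positive gains (Kp, Ki, Kd) with a tiny MLP."""
    h = np.tanh(W1 @ np.array([error, integral, derivative]) + b1)
    base = np.array([2.0, 0.5, 0.02])                  # assumed nominal gain scales
    return base * np.exp(W2 @ h)                       # exp keeps the scheduled gains positive


# Gain-scheduled PID on a toy first-order plant x' = -x + u, tracking a unit step.
dt, x, integral, prev_err = 0.01, 0.0, 0.0, 0.0
for _ in range(500):
    err = 1.0 - x
    integral += err * dt
    deriv = (err - prev_err) / dt
    kp, ki, kd = gains(err, integral, deriv)
    u = kp * err + ki * integral + kd * deriv
    x += (-x + u) * dt                                 # explicit Euler step of the toy plant
    prev_err = err
print(f"final tracking error: {1.0 - x:.3f}")
```

Exponentiating (or clamping) the network output is a common way to keep the scheduled gains positive and bounded.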