A Plug-and-Play Fully On-the-Job Real-Time Reinforcement Learning Algorithm for a Direct-Drive Tandem-Wing Experiment Platforms Under Multiple Random Operating Conditions
- URL: http://arxiv.org/abs/2410.15554v2
- Date: Fri, 20 Dec 2024 09:39:06 GMT
- Title: A Plug-and-Play Fully On-the-Job Real-Time Reinforcement Learning Algorithm for a Direct-Drive Tandem-Wing Experiment Platforms Under Multiple Random Operating Conditions
- Authors: Zhang Minghao, Song Bifeng, Yang Xiaojun, Wang Liang
- Abstract summary: The Concerto Reinforcement Learning Extension (CRL2E) algorithm has been developed. This plug-and-play, real-time reinforcement learning algorithm incorporates a novel Physics-Inspired Rule-Based Policy Composer Strategy. Hardware tests indicate that the optimized lightweight network structure excels in weight loading and average inference time, meeting real-time control requirements.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The nonlinear and unstable aerodynamic interference generated by the tandem wings of biomimetic systems such as the direct-drive tandem-wing experiment platform poses substantial challenges for motion control, especially under multiple random operating conditions. To address these challenges, the Concerto Reinforcement Learning Extension (CRL2E) algorithm has been developed. This plug-and-play, fully on-the-job, real-time reinforcement learning algorithm incorporates a novel Physics-Inspired Rule-Based Policy Composer Strategy with a Perturbation Module alongside a lightweight network optimized for real-time control. To validate the performance and the rationality of the module design, experiments were conducted under six challenging operating conditions, comparing seven different algorithms. The results demonstrate that the CRL2E algorithm achieves safe and stable training within the first 500 steps, improving tracking accuracy by 14 to 66 times compared to the Soft Actor-Critic, Proximal Policy Optimization, and Twin Delayed Deep Deterministic Policy Gradient algorithms. Additionally, CRL2E significantly enhances performance under various random operating conditions, with improvements in tracking accuracy ranging from 8.3% to 60.4% compared to the Concerto Reinforcement Learning (CRL) algorithm. With only the Composer Perturbation introduced, CRL2E converges 36.11% to 57.64% faster than CRL, and with both the Composer Perturbation and the Time-Interleaved Capability Perturbation introduced, it converges 43.52% to 65.85% faster, especially in conditions where standard CRL struggles to converge. Hardware tests indicate that the optimized lightweight network structure excels in weight loading and average inference time, meeting real-time control requirements.
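The abstract describes the core mechanism only at a high level: a physics-inspired rule-based policy is composed with a learned, lightweight policy, and a perturbation module shapes exploration so that training stays safe from the first steps. The snippet below is a minimal, hedged sketch of that general pattern, not the authors' CRL2E implementation; the PD-style rule, the residual blending schedule, the noise annealing, and all function names and gains are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def rule_based_action(obs, k_p=2.0, k_d=0.4):
    # Physics-inspired rule: a PD-like term on tracking error and error rate.
    # obs = [tracking_error, error_rate, ...]; gains are illustrative only.
    return np.clip(k_p * obs[0] + k_d * obs[1], -1.0, 1.0)

def learned_correction(obs, weights):
    # Stand-in for the lightweight policy network: a tiny linear layer here.
    return np.tanh(weights @ obs)

def composed_action(obs, weights, step, total_steps=500, noise_scale=0.1):
    base = rule_based_action(obs)
    residual = learned_correction(obs, weights)
    # Blend: rely on the rule early, hand control to the learned policy later.
    alpha = min(1.0, step / total_steps)
    action = (1.0 - alpha) * base + alpha * (base + residual)
    # Perturbation module: annealed exploration noise around the composed action.
    action += noise_scale * (1.0 - alpha) * rng.normal()
    return float(np.clip(action, -1.0, 1.0))

obs = np.array([0.05, -0.2, 0.0])          # error, error rate, padding
weights = rng.normal(scale=0.1, size=3)    # untrained network stand-in
print(composed_action(obs, weights, step=10))
```

The rule keeps early actions sensible while the learned residual gradually takes over, which is one plausible way a composer plus perturbation could deliver safe training within the first few hundred steps.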
Related papers
- Real Time Control of Tandem-Wing Experimental Platform Using Concerto Reinforcement Learning [0.0]
This paper introduces the CRL2RT algorithm, an advanced reinforcement learning method aimed at improving the real-time control performance of the Direct-Drive Tandem-Wing Experimental Platform (DDTWEP).
Results show that CRL2RT achieves a control frequency surpassing 2500 Hz on standard CPUs (a rough timing sketch follows this entry).
arXiv Detail & Related papers (2025-02-08T03:46:40Z)
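CRL2RT's headline number is a control frequency above 2500 Hz on standard CPUs, and the CRL2E abstract likewise cites weight loading and average inference time as the real-time criteria. The snippet below is a generic way to estimate those quantities for a small policy network; it is not code from either paper, and the network size, input dimension, and iteration count are assumptions.

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# A small two-layer MLP as a stand-in for a lightweight control policy.
# Layer sizes are assumptions, not the networks used in these papers.
W1, b1 = rng.normal(size=(64, 12)), np.zeros(64)
W2, b2 = rng.normal(size=(4, 64)), np.zeros(4)

def policy(obs):
    h = np.tanh(W1 @ obs + b1)
    return np.tanh(W2 @ h + b2)

obs = rng.normal(size=12)
policy(obs)  # warm-up call

n = 10_000
t0 = time.perf_counter()
for _ in range(n):
    policy(obs)
elapsed = time.perf_counter() - t0

print(f"avg inference time: {elapsed / n * 1e6:.1f} us")
print(f"achievable control frequency: {n / elapsed:.0f} Hz")
```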
- Safe Load Balancing in Software-Defined-Networking [1.2521494095948067]
A Control Barrier Function (CBF) is designed on top of Deep Reinforcement Learning (DRL) algorithms for load balancing.
We show that our DRL-CBF approach is capable of meeting safety requirements during training and testing (a toy safety-filter sketch follows this entry).
arXiv Detail & Related papers (2024-10-22T09:34:22Z)
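The summary names the common pattern of placing a Control Barrier Function (CBF) on top of a DRL policy so that proposed actions are filtered before execution. The sketch below shows that pattern on a toy one-dimensional system; it is not the paper's formulation, and the dynamics, barrier, and decay rate gamma are assumptions.

```python
import numpy as np

def cbf_filter(x, u_rl, x_max=0.9, gamma=0.5):
    """Project an RL action onto the set satisfying a discrete-time CBF condition.

    Toy 1-D dynamics: x_next = x + u (e.g. x = link utilization).
    Barrier: h(x) = x_max - x >= 0.
    CBF condition: h(x + u) >= (1 - gamma) * h(x), i.e. u <= gamma * (x_max - x).
    """
    u_max = gamma * (x_max - x)
    return min(u_rl, u_max)

x = 0.8                      # current utilization
u_rl = 0.3                   # aggressive action proposed by the DRL policy
u_safe = cbf_filter(x, u_rl)
print(u_safe, x + u_safe)    # 0.05, 0.85 -> stays below x_max
```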
- Multiobjective Vehicle Routing Optimization with Time Windows: A Hybrid Approach Using Deep Reinforcement Learning and NSGA-II [52.083337333478674]
This paper proposes a weight-aware deep reinforcement learning (WADRL) approach designed to address the multiobjective vehicle routing problem with time windows (MOVRPTW).
The Non-dominated Sorting Genetic Algorithm II (NSGA-II) method is then employed to optimize the outcomes produced by WADRL (a minimal weight-scalarization sketch follows this entry).
arXiv Detail & Related papers (2024-07-18T02:46:06Z)
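"Weight-aware" DRL for a multiobjective problem typically means sampling a preference vector over the objectives, scalarizing the reward with it, and conditioning the policy on the sampled weights. The sketch below illustrates only that scalarization step under assumed objectives (travel time and tardiness); it is not the WADRL implementation, and the NSGA-II post-processing is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_weights():
    # Draw a preference vector over the two objectives (travel time, tardiness).
    w = rng.uniform(size=2)
    return w / w.sum()

def scalarized_reward(travel_time, tardiness, w):
    # Weight-aware scalarization: both objectives are costs, so negate them.
    return -(w[0] * travel_time + w[1] * tardiness)

w = sample_weights()
# The policy would receive w as part of its observation so that one network
# covers the whole weight space; here we just score a single solution.
print(w, scalarized_reward(travel_time=120.0, tardiness=15.0, w=w))
```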
- Deep reinforcement learning applied to an assembly sequence planning problem with user preferences [1.0558951653323283]
We propose an approach to the implementation of DRL methods in assembly sequence planning problems.
The proposed approach introduces parametric actions into the RL environment to improve training time and sample efficiency.
The results support the potential of deep reinforcement learning for assembly sequence planning problems with human interaction.
arXiv Detail & Related papers (2023-04-13T14:25:15Z)
- MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
- Off-Policy Deep Reinforcement Learning Algorithms for Handling Various Robotic Manipulator Tasks [0.0]
In this study, three reinforcement learning algorithms, DDPG, TD3, and SAC, are used to train the Fetch robotic manipulator on four different tasks.
All of these algorithms are off-policy and able to reach their desired targets by optimizing both policy and value functions.
arXiv Detail & Related papers (2022-12-11T18:25:24Z)
- Robust optimal well control using an adaptive multi-grid reinforcement learning framework [0.0]
Reinforcement learning is a promising tool to solve robust optimal well control problems.
The proposed framework is demonstrated using a state-of-the-art, model-free policy-based RL algorithm.
Prominent gains in computational efficiency are observed using the proposed framework, which saves around 60-70% of the computational cost of its single fine-grid counterpart.
arXiv Detail & Related papers (2022-07-07T12:08:57Z)
- Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
- Recursive Least Squares Advantage Actor-Critic Algorithms [20.792917267835247]
We propose two novel RLS-based advantage actor-critic (A2C) algorithms.
The two algorithms, RLSSA2C and RLSNA2C, use the RLS method to train the critic network and the hidden layers of the actor network.
The experimental results show that both algorithms have better sample efficiency than vanilla A2C on most games or tasks (a bare-bones RLS update is sketched after this entry).
arXiv Detail & Related papers (2022-01-15T20:00:26Z)
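Recursive least squares (RLS) fits a linear model online with a matrix-gain update instead of a scalar learning rate, which is what these algorithms exploit when training the critic and the actor's hidden layers. The sketch below shows the bare RLS recursion for a linear value head on fixed features; it is not RLSSA2C or RLSNA2C, and the forgetting factor, initialization, and toy regression target are assumptions.

```python
import numpy as np

class RLSValueHead:
    """Recursive least-squares update for a linear value head v(s) = w @ phi(s)."""

    def __init__(self, dim, lam=0.99, p0=100.0):
        self.w = np.zeros(dim)          # weights of the linear head
        self.P = np.eye(dim) * p0       # inverse-covariance estimate
        self.lam = lam                  # forgetting factor

    def update(self, phi, target):
        # Standard RLS recursion toward the regression target (e.g. a TD target).
        Pphi = self.P @ phi
        k = Pphi / (self.lam + phi @ Pphi)        # gain vector
        self.w += k * (target - self.w @ phi)     # error-corrected weights
        self.P = (self.P - np.outer(k, Pphi)) / self.lam

head = RLSValueHead(dim=4)
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5, 0.0])
for _ in range(200):
    phi = rng.normal(size=4)
    head.update(phi, target=w_true @ phi)
print(np.round(head.w, 3))   # converges toward w_true
```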
- Distributed Multi-agent Meta Learning for Trajectory Design in Wireless Drone Networks [151.27147513363502]
This paper studies the problem of trajectory design for a group of energy-constrained drones operating in dynamic wireless network environments.
A value-based reinforcement learning (VDRL) solution and a meta-training mechanism are proposed.
arXiv Detail & Related papers (2020-12-06T01:30:12Z)
- Meta-Reinforcement Learning for Trajectory Design in Wireless UAV Networks [151.65541208130995]
A drone base station (DBS) is dispatched to provide uplink connectivity to ground users whose demand is dynamic and unpredictable.
In this case, the DBS's trajectory must be adaptively adjusted to satisfy the dynamic user access requests.
A meta-learning algorithm is proposed in order to adapt the DBS's trajectory when it encounters novel environments.
arXiv Detail & Related papers (2020-05-25T20:43:59Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)