Reinforcement Twinning for Hybrid Control of Flapping-Wing Drones
- URL: http://arxiv.org/abs/2505.18201v1
- Date: Wed, 21 May 2025 12:27:09 GMT
- Title: Reinforcement Twinning for Hybrid Control of Flapping-Wing Drones
- Authors: Romain Poletti, Lorenzo Schena, Lilla Koloszar, Joris Degroote, Miguel Alfonso Mendez
- Abstract summary: This article presents a novel hybrid model-free/model-based approach to flight control based on the proposed reinforcement twinning algorithm. The algorithm is evaluated for controlling the longitudinal dynamics of a flapping-wing drone.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Controlling the flight of flapping-wing drones requires versatile controllers that handle their time-varying, nonlinear, and underactuated dynamics from incomplete and noisy sensor data. Model-based methods struggle with accurate modeling, while model-free approaches falter in efficiently navigating very high-dimensional and nonlinear control objective landscapes. This article presents a novel hybrid model-free/model-based approach to flight control based on the recently proposed reinforcement twinning algorithm. The model-based (MB) approach relies on an adjoint formulation using an adaptive digital twin, continuously identified from live trajectories, while the model-free (MF) approach relies on reinforcement learning. The two agents collaborate through transfer learning, imitation learning, and experience sharing using the real environment, the digital twin, and a referee. The latter selects the best agent to interact with the real environment based on performance within the digital twin and a real-to-virtual environment consistency ratio. The algorithm is evaluated for controlling the longitudinal dynamics of a flapping-wing drone, with the environment simulated as a nonlinear, time-varying dynamical system under the influence of quasi-steady aerodynamic forces. The hybrid control learning approach is tested with three types of initialization of the adaptive model: (1) offline identification using previously available data, (2) random initialization with full online identification, and (3) offline pre-training with an estimation bias, followed by online adaptation. In all three scenarios, the proposed hybrid learning approach demonstrates superior performance compared to purely model-free and model-based methods.
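To make the referee step concrete, here is a minimal Python sketch of how such agent selection could look. All names (`digital_twin.evaluate`, the threshold `tau`, the model-free fallback rule) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def select_agent(agents, digital_twin, real_traj, twin_traj, tau=0.8):
    """Pick which agent (MB or MF) acts on the real system next (sketch).

    agents:    dict of policies, e.g. {"MB": adjoint_policy, "MF": rl_policy}
    real_traj: states recently observed on the real system
    twin_traj: the digital twin's prediction for the same control inputs
    tau:       minimum real-to-virtual consistency needed to trust the twin
    """
    # Real-to-virtual consistency: how faithfully the twin tracked reality.
    err = np.linalg.norm(real_traj - twin_traj) / (np.linalg.norm(real_traj) + 1e-9)
    consistency = 1.0 - err
    if consistency < tau:
        return "MF"  # twin unreliable: assumed fallback to the model-free agent

    # Otherwise rank agents by their performance inside the digital twin.
    scores = {name: digital_twin.evaluate(policy) for name, policy in agents.items()}
    return max(scores, key=scores.get)
```

The abstract's transfer learning, imitation learning, and experience-sharing steps are omitted here; the sketch covers only the selection decision.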
Related papers
- Action Flow Matching for Continual Robot Learning [57.698553219660376]
Continual learning in robotics seeks systems that can constantly adapt to changing environments and tasks. We introduce a generative framework leveraging flow matching for online robot dynamics model alignment. We find that by transforming the actions themselves rather than exploring with a misaligned model, the robot collects informative data more efficiently.
arXiv Detail & Related papers (2025-04-25T16:26:15Z)
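As a rough illustration of the flow-matching idea above, the sketch below trains a velocity network to transport misaligned actions toward corrected ones along straight-line paths. The network shape, action dimension, and pairing of `a0`/`a1` are assumptions for illustration, not the paper's design:

```python
import torch
import torch.nn as nn

# Velocity field v_theta(a_t, t); action dimension 2 is an arbitrary choice.
v_theta = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(v_theta.parameters(), lr=1e-3)

def flow_matching_step(a0, a1):
    """One flow-matching step: a0 = misaligned action, a1 = corrected action."""
    t = torch.rand(a0.shape[0], 1)        # random time in [0, 1] per sample
    a_t = (1 - t) * a0 + t * a1           # point on the straight path a0 -> a1
    target = a1 - a0                      # constant velocity of that path
    pred = v_theta(torch.cat([a_t, t], dim=-1))
    loss = ((pred - target) ** 2).mean()  # regress predicted onto true velocity
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```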
- Differentiable Information Enhanced Model-Based Reinforcement Learning [48.820039382764]
Differentiable environments have heralded new possibilities for learning control policies by offering rich differentiable information. Model-based reinforcement learning (MBRL) methods exhibit the potential to effectively harness the power of differentiable information for recovering the underlying physical dynamics. However, this presents two primary challenges: effectively utilizing differentiable information to 1) construct models with more accurate dynamic prediction and 2) enhance the stability of policy training.
arXiv Detail & Related papers (2025-03-03T04:51:40Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
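Model-based value expansion, named in the summary above, is easy to sketch: imagine a short rollout in the learned model and bootstrap with the value function at the horizon. The interfaces (`model.step`, `policy`, `value`) are assumed for illustration and are not MOTO's API:

```python
def mve_target(model, policy, value, s, gamma=0.99, horizon=5):
    """H-step model-based value expansion target (schematic sketch).

    Rolls the learned model forward under the current policy, summing
    discounted predicted rewards, then bootstraps with value(s_H).
    """
    ret, discount = 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)
        s, r = model.step(s, a)          # assumed: returns (next_state, reward)
        ret = ret + discount * r
        discount *= gamma
    return ret + discount * value(s)     # bootstrap at the imagined horizon
```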
- Physics-informed reinforcement learning via probabilistic co-adjustment functions [3.6787556334630334]
We introduce co-kriging adjustments (CKA) and ridge regression adjustment (RRA) as novel ways to combine the advantages of both approaches.
Our adjustment methods are based on an auto-regressive AR1 co-kriging model that we integrate with GP priors.
arXiv Detail & Related papers (2023-09-11T12:10:19Z)
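To give a feel for the AR1 idea above: residuals between the real system and a cheap model are themselves modeled as an autoregressive process and used to correct predictions. This is a deliberately simplified sketch; the paper's co-kriging and ridge-regression adjustments with GP priors are richer:

```python
import numpy as np

def fit_ar1(residuals):
    """Least-squares fit of e_t = rho * e_{t-1} on a residual sequence."""
    e_prev, e_next = residuals[:-1], residuals[1:]
    return float(e_prev @ e_next / (e_prev @ e_prev + 1e-12))

def adjusted_prediction(sim_pred, prev_residual, rho):
    """Correct the simulator's next-state prediction with the AR(1) residual."""
    return sim_pred + rho * prev_residual
```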
- End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear Model Predictive Control [45.84205238554709]
We present a method for reinforcement learning of Koopman surrogate models for optimal performance as part of (e)NMPC.
We show that the end-to-end trained models outperform those trained using system identification in (e)NMPC.
arXiv Detail & Related papers (2023-08-03T10:21:53Z)
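For context on the Koopman surrogates above: the nonlinear dynamics are approximated by a linear map acting on lifted observables. The sketch below fits such a map by plain least squares (classic EDMD), a system-identification-style baseline in the spirit of what the end-to-end training is shown to outperform; the observable dictionary is our arbitrary choice:

```python
import numpy as np

def lift(x):
    """Hand-picked observable dictionary (an illustrative choice)."""
    return np.concatenate([x, np.sin(x), x ** 2])

def fit_koopman(X, X_next):
    """Fit z_next ≈ K @ z over lifted snapshot pairs by least squares (EDMD)."""
    Z = np.stack([lift(x) for x in X])
    Z_next = np.stack([lift(x) for x in X_next])
    K, *_ = np.linalg.lstsq(Z, Z_next, rcond=None)   # solves Z @ K ≈ Z_next
    return K.T

# Prediction: z_next = K @ lift(x); since lift() keeps x as its leading block,
# the state estimate is recovered from the leading entries of z_next.
```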
- Active Learning of Discrete-Time Dynamics for Uncertainty-Aware Model Predictive Control [46.81433026280051]
We present a self-supervised learning approach that actively models the dynamics of nonlinear robotic systems.
Our approach showcases high resilience and generalization capabilities by consistently adapting to unseen flight conditions.
arXiv Detail & Related papers (2022-10-23T00:45:05Z)
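One generic way to make dynamics learning "active", sketched below, is to query where an ensemble of learned models disagrees most. This is a common uncertainty heuristic offered only as an assumed illustration of the flavor of approach, not this paper's specific criterion:

```python
import numpy as np

def most_informative_action(models, state, candidate_actions):
    """Pick the candidate action with the highest ensemble disagreement.

    models: list of learned dynamics models, each (state, action) -> next_state
    """
    def disagreement(a):
        preds = np.stack([m(state, a) for m in models])
        return float(preds.var(axis=0).sum())   # total predictive variance
    return max(candidate_actions, key=disagreement)
```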
- Physics-Inspired Temporal Learning of Quadrotor Dynamics for Accurate Model Predictive Trajectory Tracking [76.27433308688592]
Accurately modeling a quadrotor's system dynamics is critical for guaranteeing agile, safe, and stable navigation.
We present a novel Physics-Inspired Temporal Convolutional Network (PI-TCN) approach to learning a quadrotor's system dynamics purely from robot experience.
Our approach combines the expressive power of sparse temporal convolutions and dense feed-forward connections to make accurate system predictions.
arXiv Detail & Related papers (2022-06-07T13:51:35Z)
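The temporal-convolution ingredient above reduces to stacks of dilated causal 1-D convolutions. A minimal generic sketch in PyTorch (this is not the PI-TCN architecture; the layer sizes and input dimension are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1-D convolution that never sees the future (left padding only)."""
    def __init__(self, ch_in, ch_out, kernel=3, dilation=1):
        super().__init__()
        self.pad = (kernel - 1) * dilation
        self.conv = nn.Conv1d(ch_in, ch_out, kernel, dilation=dilation)

    def forward(self, x):                 # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))

# Growing dilations give an exponentially growing receptive field.
tcn = nn.Sequential(
    CausalConv1d(6, 32, dilation=1), nn.ReLU(),
    CausalConv1d(32, 32, dilation=2), nn.ReLU(),
    CausalConv1d(32, 32, dilation=4), nn.ReLU(),
)
head = nn.Linear(32, 6)  # dense head mapping features to a next-state estimate
```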
- Learning Adaptive Control for SE(3) Hamiltonian Dynamics [15.26733033527393]
This paper develops adaptive geometric control for rigid-body systems, such as ground, aerial, and underwater vehicles.
In the first stage, we learn a Hamiltonian model of the system dynamics using a neural ordinary differential equation network trained from state-control trajectory data.
In the second stage, we design a trajectory tracking controller with disturbance compensation from an energy-based perspective.
arXiv Detail & Related papers (2021-09-21T05:54:28Z)
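Once a Hamiltonian H(q, p) is learned, the dynamics follow from Hamilton's equations, q̇ = ∂H/∂p and ṗ = −∂H/∂q. A bare-bones autograd sketch in Euclidean coordinates (the paper works on SE(3) and uses a neural ODE solver, so treat this only as the core idea):

```python
import torch
import torch.nn as nn

# Placeholder Hamiltonian network; input is [q, p] with dim(q) = dim(p) = 2 here.
H = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))

def hamiltonian_vector_field(q, p):
    """Return (q_dot, p_dot) from the learned Hamiltonian via autograd."""
    q = q.detach().requires_grad_(True)
    p = p.detach().requires_grad_(True)
    energy = H(torch.cat([q, p], dim=-1)).sum()
    dHdq, dHdp = torch.autograd.grad(energy, (q, p))
    return dHdp, -dHdq                      # Hamilton's equations

def euler_step(q, p, dt=1e-2):
    """One explicit-Euler step; a neural ODE integrator would replace this."""
    q_dot, p_dot = hamiltonian_vector_field(q, p)
    return q + dt * q_dot, p + dt * p_dot
```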
- Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition [55.362258027878966]
We present momentum pseudo-labeling (MPL) as a simple yet effective strategy for semi-supervised speech recognition.
MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
The experimental results demonstrate that MPL effectively improves over the base model and is scalable to different semi-supervised scenarios.
arXiv Detail & Related papers (2021-06-16T16:24:55Z)
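The "momentum" in MPL is an exponential moving average of the online model's weights, in the spirit of the mean teacher method. A minimal sketch of that update (the momentum value is an assumed typical choice):

```python
import torch

@torch.no_grad()
def ema_update(offline_model, online_model, momentum=0.999):
    """Nudge the offline (teacher) weights toward the online (student) weights."""
    for p_off, p_on in zip(offline_model.parameters(), online_model.parameters()):
        p_off.mul_(momentum).add_(p_on, alpha=1.0 - momentum)
```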
- Preference-Based Learning for User-Guided HZD Gait Generation on Bipedal Walking Robots [31.994815173888806]
This paper presents a framework that leverages both control theory and machine learning to obtain stable and robust bipedal locomotion.
Results show that the framework achieves stable, robust, efficient, and natural walking in fewer than 50 iterations with no reliance on a simulation environment.
arXiv Detail & Related papers (2020-11-10T22:15:56Z)
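Preference-based learning of this kind fits a latent utility over gait parameters from pairwise "A felt better than B" feedback. Below is a generic Bradley-Terry-style update as one simple interpretation; the paper's framework is more sophisticated than this linear sketch:

```python
import numpy as np

def preference_update(w, x_pref, x_other, lr=0.1):
    """One ascent step on log P(pref) = log sigmoid(w @ (x_pref - x_other)).

    w:       weights of a linear utility u(x) = w @ x over gait parameters
    x_pref:  parameters of the gait the user preferred
    x_other: parameters of the rejected gait
    """
    diff = x_pref - x_other
    p = 1.0 / (1.0 + np.exp(-(w @ diff)))   # current model's preference probability
    return w + lr * (1.0 - p) * diff        # gradient of the log-likelihood
```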
- Model-Free Voltage Regulation of Unbalanced Distribution Network Based on Surrogate Model and Deep Reinforcement Learning [9.984416150031217]
This paper develops a model-free approach based on a surrogate model and deep reinforcement learning (DRL).
We have also extended it to deal with unbalanced three-phase scenarios.
arXiv Detail & Related papers (2020-06-24T18:49:41Z)
- Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems [91.43582419264763]
We study the problem of system identification and adaptive control in partially observable linear dynamical systems.
We present the first model estimation method with finite-time guarantees in both open and closed-loop system identification.
We show that AdaptOn is the first algorithm that achieves $\text{polylog}(T)$ regret in adaptive control of unknown partially observable linear dynamical systems.
arXiv Detail & Related papers (2020-03-25T06:00:33Z)
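For readers unfamiliar with the regret notion in the entry above: regret is the cumulative cost incurred beyond what the best controller would have paid, so a polylogarithmic bound means the average extra cost per step vanishes rapidly with the horizon T. A standard definition, given here as our paraphrase rather than a quotation from the paper:

```latex
% Cumulative regret of an adaptive controller over horizon T,
% measured against the optimal average cost J^*:
\mathrm{Regret}(T) = \sum_{t=1}^{T} c_t(x_t, u_t) - T\, J^{*}
% A polylog(T) bound means Regret(T) = O(\log^{k} T) for some constant k.
```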