Reinforcement Learning for Robust Missile Autopilot Design
- URL: http://arxiv.org/abs/2011.12956v2
- Date: Sat, 18 Sep 2021 11:07:58 GMT
- Title: Reinforcement Learning for Robust Missile Autopilot Design
- Authors: Bernardo Cortez
- Abstract summary: This work pioneers the use of Reinforcement Learning as a framework for flight control.
Under TRPO's methodology, the collected experience is augmented according to HER, stored in a replay buffer and sampled according to its significance.
Results show that it is possible both to achieve the optimal performance and to improve the agent's robustness to uncertainties.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing missile autopilot controllers has long been a complex task, given the extensive flight envelope and the nonlinear flight dynamics. A solution that can excel both in nominal performance and in robustness to uncertainties is still to be found. While Control Theory often leads to parameter-scheduling procedures, Reinforcement Learning has presented interesting results in increasingly complex tasks, ranging from videogames to robotic tasks with continuous action domains. However, it still lacks clear insights on how to find adequate reward functions and exploration strategies. To the best of our knowledge, this work pioneers the use of Reinforcement Learning as a framework for flight control. In fact, it aims at training a model-free agent that can control the longitudinal flight of a missile, achieving optimal performance and robustness to uncertainties. To that end, under TRPO's methodology, the collected experience is augmented according to HER, stored in a replay buffer and sampled according to its significance. Not only does this work enhance the concept of prioritized experience replay into BPER, but it also reformulates HER, activating both only when the training progress converges to suboptimal policies, in what is proposed as the SER methodology. In addition, the Reward Engineering process is carefully detailed. The results show that it is possible both to achieve optimal performance and to improve the agent's robustness to uncertainties (with little degradation of nominal performance) by further training it in non-nominal environments, thereby validating the proposed approach and encouraging future research in this field.
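For readers unfamiliar with the ingredients named in the abstract, the sketch below illustrates the two ideas combined under TRPO: HER-style relabeling of collected experience and significance-weighted sampling from a replay buffer. It is a minimal illustration, not the paper's BPER/SER implementation; the `Transition` fields, the `her_relabel` helper, and the priority exponent `alpha` are assumptions introduced here for clarity.

```python
# Minimal sketch (not the paper's exact BPER/SER code) of HER relabeling plus
# significance-weighted replay sampling. All names and constants are illustrative.
import random
from dataclasses import dataclass, replace
import numpy as np

@dataclass
class Transition:
    state: np.ndarray
    action: np.ndarray
    reward: float
    next_state: np.ndarray
    goal: np.ndarray       # e.g. the commanded longitudinal response
    achieved: np.ndarray   # what the agent actually achieved

def her_relabel(episode, reward_fn):
    """Hindsight Experience Replay: re-store each transition with the goal
    replaced by an outcome actually achieved later in the same episode."""
    relabeled = []
    for i, tr in enumerate(episode):
        future = random.choice(episode[i:])          # HER 'future' strategy
        new_goal = future.achieved
        relabeled.append(replace(tr, goal=new_goal,
                                 reward=reward_fn(tr.achieved, new_goal)))
    return relabeled

class PrioritizedBuffer:
    """Replay buffer that samples transitions in proportion to a
    'significance' score (for instance, the magnitude of the TD error)."""
    def __init__(self, capacity=100_000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def add(self, tr, significance=1.0):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(tr)
        self.priorities.append((abs(significance) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        p = np.asarray(self.priorities)
        idx = np.random.choice(len(self.data), batch_size, p=p / p.sum())
        return [self.data[i] for i in idx]
```

In this hedged view, episodes collected under the current policy would be relabeled and pushed into the buffer, with sampling then biased toward high-significance transitions; the paper additionally gates both mechanisms so they only activate once training stalls at suboptimal policies (the proposed SER methodology).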
Related papers
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z)
- Efficient Reinforcement Learning via Decoupling Exploration and Utilization
Reinforcement Learning (RL) has achieved remarkable success across multiple fields and applications, including gaming, robotics, and autonomous vehicles.
In this work, the aim is to train the agent efficiently by decoupling exploration and utilization, so that the agent can escape the conundrum of suboptimal solutions.
This idea is implemented in the proposed OPARL (Optimistic and Pessimistic Actor Reinforcement Learning) algorithm.
arXiv Detail & Related papers (2023-12-26T09:03:23Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Behavior Alignment via Reward Function Optimization [23.92721220310242]
We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
arXiv Detail & Related papers (2023-10-29T13:45:07Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observations of its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- Reinforcement Learning for Low-Thrust Trajectory Design of Interplanetary Missions [77.34726150561087]
This paper investigates the use of reinforcement learning for the robust design of interplanetary trajectories in the presence of severe disturbances.
An open-source implementation of the state-of-the-art algorithm Proximal Policy Optimization is adopted.
The resulting Guidance and Control Network provides both a robust nominal trajectory and the associated closed-loop guidance law.
arXiv Detail & Related papers (2020-08-19T15:22:15Z)
- Accelerating Reinforcement Learning for Reaching using Continuous Curriculum Learning [6.703429330486276]
We focus on accelerating reinforcement learning (RL) training and improving the performance of multi-goal reaching tasks.
Specifically, we propose a precision-based continuous curriculum learning (PCCL) method in which the requirements are gradually adjusted during the training process (a minimal sketch of this idea follows this list).
This approach is tested using a Universal Robots UR5e in both simulation and real-world multi-goal reach experiments.
arXiv Detail & Related papers (2020-02-07T10:08:18Z)
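As a rough illustration of the precision-based curriculum idea summarized in the last entry, the sketch below gradually tightens the tolerance that defines a successful reach as training progresses; the exponential schedule, the function names, and the constants are assumptions made here, not the paper's values.

```python
# Illustrative sketch of a precision-based curriculum: the success tolerance
# starts loose and is tightened over the course of training. Constants are assumed.
import numpy as np

def tolerance(step, total_steps, start=0.10, end=0.005):
    """Exponentially decay the success tolerance (in meters) over training."""
    frac = min(step / total_steps, 1.0)
    return start * (end / start) ** frac

def reach_reward(tip_position, goal_position, step, total_steps):
    """Sparse reward: +1 once the end-effector is within the current tolerance."""
    dist = np.linalg.norm(tip_position - goal_position)
    return 1.0 if dist <= tolerance(step, total_steps) else 0.0
```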