UAV Path Planning Employing MPC-Reinforcement Learning Method for Search and Rescue Mission
- URL: http://arxiv.org/abs/2302.10669v1
- Date: Tue, 21 Feb 2023 13:39:40 GMT
- Title: UAV Path Planning Employing MPC-Reinforcement Learning Method for Search and Rescue Mission
- Authors: Mahya Ramezani, Hamed Habibi, Jose Luis Sanchez Lopez, Holger Voos
- Abstract summary: We tackle the problem of Unmanned Aerial Vehicle (UAV) path planning in complex and uncertain environments.
We design a Model Predictive Control (MPC) based on a Long-Short-Term Memory (LSTM) network integrated into the Deep Deterministic Policy Gradient algorithm.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we tackle the problem of Unmanned Aerial Vehicle (UAV) path planning
in complex and uncertain environments by designing a Model Predictive Control
(MPC), based on a Long-Short-Term Memory (LSTM) network integrated into the
Deep Deterministic Policy Gradient algorithm. In the proposed solution,
LSTM-MPC operates as a deterministic policy within the DDPG network, and it
leverages a predicting pool to store predicted future states and actions for
improved robustness and efficiency. The use of the predicting pool also enables
the initialization of the critic network, leading to improved convergence speed
and reduced failure rate compared to traditional reinforcement learning and
deep reinforcement learning methods. The effectiveness of the proposed solution
is evaluated by numerical simulations.
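The abstract describes an LSTM-MPC acting as the deterministic policy inside DDPG, with a "predicting pool" of predicted states and actions used to initialize the critic. The sketch below is a hedged, numpy-only illustration of that idea, not the authors' implementation: a toy linear model stands in for the LSTM predictor, a grid-search rollout stands in for the MPC policy, and the pool of predicted cost-to-go values warm-starts a linear critic by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamics(s, a):
    # Toy 1-D linear model standing in for the learned LSTM predictor.
    return 0.9 * s + 0.1 * a

def mpc_policy(s, horizon=5):
    # Stand-in for the LSTM-MPC policy: pick the constant action (from a
    # coarse grid) whose predicted rollout stays closest to the origin.
    candidates = np.linspace(-1.0, 1.0, 21)
    def rollout_cost(a):
        x, c = s, 0.0
        for _ in range(horizon):
            x = dynamics(x, a)
            c += x ** 2
        return c
    return min(candidates, key=rollout_cost)

def build_predicting_pool(n_episodes=50, steps=20):
    # Store predicted (state, action, discounted cost-to-go) tuples,
    # mirroring the paper's pool of predicted future states and actions.
    pool = []
    for _ in range(n_episodes):
        s = rng.uniform(-2.0, 2.0)
        traj = []
        for _ in range(steps):
            a = mpc_policy(s)
            s_next = dynamics(s, a)
            traj.append((s, a, s_next ** 2))
            s = s_next
        g = 0.0
        for s0, a0, c in reversed(traj):  # backward pass: costs-to-go
            g = c + 0.99 * g
            pool.append((s0, a0, g))
    return pool

def init_critic(pool):
    # Warm-start a quadratic critic Q(s, a) ~ w . phi(s, a) by least squares
    # on the pool -- the "critic initialization" role the abstract credits
    # with faster convergence.
    S = np.array([[s, a, s * s, a * a, s * a, 1.0] for s, a, _ in pool])
    y = np.array([g for _, _, g in pool])
    w, *_ = np.linalg.lstsq(S, y, rcond=None)
    return w

pool = build_predicting_pool()
w = init_critic(pool)
print(len(pool), w.shape)
```

In a full DDPG loop, the warm-started critic would then be refined with temporal-difference updates while the LSTM-MPC policy supplies actions.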
Related papers
- SOMTP: Self-Supervised Learning-Based Optimizer for MPC-Based Safe Trajectory Planning Problems in Robotics [13.129654942805846]
Model Predictive Control (MPC)-based trajectory planning has been widely used in robotics, and Control Barrier Functions (CBF) can improve its constraint satisfaction.
In this paper, we propose a self-supervised learning algorithm for CBF-MPC trajectory planning.
arXiv Detail & Related papers (2024-05-15T09:38:52Z) - Distributed Multi-Objective Dynamic Offloading Scheduling for Air-Ground Cooperative MEC [13.71241401034042]
This paper proposes a distributed trajectory planning and offloading scheduling scheme, integrated with MORL and the kernel method.
Numerical results reveal that the n-step return can benefit the proposed kernel-based approach, achieving significant improvement in the long-term average backlog performance.
arXiv Detail & Related papers (2024-03-16T13:50:31Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
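The paper's bonus term bounds a total-variation distance between estimated and true PSR models; that construction is not reproduced here. The snippet below shows only the generic UCB principle such algorithms build on, acting greedily on mean reward plus an optimism bonus, in the classic multi-armed-bandit setting (an illustrative assumption, not the paper's setting).

```python
import math
import random

def ucb1(counts, sums, t, c=2.0):
    """Return the arm index maximizing mean + sqrt(c * ln t / n)."""
    best, best_val = 0, -float("inf")
    for i, (n, s) in enumerate(zip(counts, sums)):
        if n == 0:
            return i  # pull each arm once before using the bonus
        val = s / n + math.sqrt(c * math.log(t) / n)
        if val > best_val:
            best, best_val = i, val
    return best

random.seed(0)
means = [0.2, 0.5, 0.8]          # Bernoulli arms; arm 2 is best
counts, sums = [0] * 3, [0.0] * 3
for t in range(1, 2001):
    a = ucb1(counts, sums, t)
    r = 1.0 if random.random() < means[a] else 0.0
    counts[a] += 1
    sums[a] += r
print(counts.index(max(counts)))
```

The shrinking bonus trades exploration for exploitation, so pulls concentrate on the best arm over time.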
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Active RIS-aided EH-NOMA Networks: A Deep Reinforcement Learning
Approach [66.53364438507208]
An active reconfigurable intelligent surface (RIS)-aided multi-user downlink communication system is investigated.
Non-orthogonal multiple access (NOMA) is employed to improve spectral efficiency, and the active RIS is powered by energy harvesting (EH).
An advanced LSTM based algorithm is developed to predict users' dynamic communication state.
A DDPG based algorithm is proposed to jointly control the amplification matrix and phase shift matrix of the RIS.
arXiv Detail & Related papers (2023-04-11T13:16:28Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
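Since this entry builds on PPO, the standard PPO clipped surrogate objective (the part being extended, not the paper's TMDP-specific machinery) can be written in a few lines:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Per-sample clipped surrogate: -min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the surrogate, so the loss is its negation.
    return -np.minimum(unclipped, clipped)

# With a large positive advantage, clipping caps the incentive at 1 + eps:
loss = ppo_clip_loss(np.array([1.5]), np.array([2.0]))
print(loss)  # -> [-2.4]
```

The clip keeps the new policy's probability ratio near 1, which is what makes PPO a convenient base for constrained extensions.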
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Coverage and Capacity Optimization in STAR-RISs Assisted Networks: A
Machine Learning Approach [102.00221938474344]
A novel model is proposed for the coverage and capacity optimization of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) assisted networks.
The core is a loss-function-based update strategy that computes weights for the coverage and capacity loss functions via a min-norm solver at each update.
The numerical results demonstrate that the investigated update strategy outperforms the fixed weight-based MO algorithms.
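For two losses, the min-norm weighting this summary mentions has a known closed form (as in multiple-gradient descent methods): choose alpha in [0, 1] minimizing the norm of the convex combination of the two gradients. The snippet below is a hedged illustration of that general formula, not the paper's solver.

```python
import numpy as np

def min_norm_weights(g1, g2):
    """alpha in [0, 1] minimizing ||alpha*g1 + (1 - alpha)*g2||."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:
        return 0.5  # identical gradients: any weighting is optimal
    alpha = ((g2 - g1) @ g2) / denom
    return float(np.clip(alpha, 0.0, 1.0))

# Example: orthogonal "coverage" and "capacity" gradients of equal size
# are weighted evenly (names are illustrative, not from the paper).
g_cov = np.array([1.0, 0.0])
g_cap = np.array([0.0, 1.0])
alpha = min_norm_weights(g_cov, g_cap)
print(alpha)  # -> 0.5
```

The resulting direction decreases both losses whenever the gradients are not directly opposed, which is why min-norm solvers suit multi-objective updates.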
arXiv Detail & Related papers (2022-04-13T13:52:22Z) - On Finite-Sample Analysis of Offline Reinforcement Learning with Deep
ReLU Networks [46.067702683141356]
We study the statistical theory of offline reinforcement learning with deep ReLU networks.
We quantify how the distribution shift of the offline data, the dimension of the input space, and the regularity of the system control the OPE estimation error.
arXiv Detail & Related papers (2021-03-11T14:01:14Z) - Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement
Learning via Frank-Wolfe Policy Optimization [5.072893872296332]
Action-constrained reinforcement learning (RL) is a widely-used approach in various real-world applications.
We propose a learning algorithm that decouples the action constraints from the policy parameter update.
We show that the proposed algorithm significantly outperforms the benchmark methods on a variety of control tasks.
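The general Frank-Wolfe idea behind such decoupling is that, instead of projecting a raw action onto the constraint set, a linear subproblem over the set yields a feasible update direction. Below is a hedged sketch of a plain Frank-Wolfe iteration over a box constraint set (the set and objective are assumptions for illustration, not the paper's formulation):

```python
import numpy as np

def fw_step(x, grad, t):
    # Linear minimization oracle over the box [-1, 1]^d: minimize grad . s,
    # attained at the corner s_i = -sign(grad_i).
    s = -np.sign(grad)
    gamma = 2.0 / (t + 2.0)  # classic diminishing step size
    return x + gamma * (s - x)

# Minimize f(x) = ||x - c||^2 over the box, with the unconstrained
# optimum c placed outside the box.
c = np.array([2.0, -3.0])
x = np.zeros(2)
for t in range(200):
    x = fw_step(x, 2.0 * (x - c), t)
print(np.round(x, 3))  # -> [ 1. -1.], the nearest box corner
```

Because every iterate is a convex combination of feasible points, the action constraint is never violated, with no zero-gradient projection layer in the way.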
arXiv Detail & Related papers (2021-02-22T14:28:03Z) - Optimal Inspection and Maintenance Planning for Deteriorating Structural
Components through Dynamic Bayesian Networks and Markov Decision Processes [0.0]
Partially Observable Markov Decision Processes (POMDPs) provide a mathematical methodology for optimal control under uncertain action outcomes and observations.
We provide the formulation for developing both infinite and finite horizon POMDPs in a structural reliability context.
Results show that POMDPs achieve substantially lower costs compared to their counterparts, even for traditional problem settings.
arXiv Detail & Related papers (2020-09-09T20:03:42Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z) - Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.