On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration
- URL: http://arxiv.org/abs/2409.11058v1
- Date: Tue, 17 Sep 2024 10:36:46 GMT
- Title: On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration
- Authors: Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub
- Abstract summary: Unmanned aerial vehicles (UAVs) have become increasingly popular in various fields, including precision agriculture, search and rescue, and remote sensing.
This study aims to address this challenge by utilizing on-policy Reinforcement Learning (RL) with Proximal Policy Optimization (PPO) to explore a two-dimensional area of interest with multiple UAVs.
The proposed solution includes actor-critic networks using deep convolutional neural networks (CNN) and long short-term memory (LSTM) for identifying the UAVs and areas that have already been covered.
- Score: 0.7373617024876724
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Unmanned aerial vehicles (UAVs) have become increasingly popular in various fields, including precision agriculture, search and rescue, and remote sensing. However, exploring unknown environments remains a significant challenge. This study addresses this challenge by utilizing on-policy Reinforcement Learning (RL) with Proximal Policy Optimization (PPO) to explore a two-dimensional area of interest with multiple UAVs. The UAVs avoid collisions with obstacles and with each other, and perform the exploration in a distributed manner. The proposed solution includes actor-critic networks using deep convolutional neural networks (CNN) and long short-term memory (LSTM) for identifying the UAVs and the areas that have already been covered. Compared to other RL techniques, such as policy gradient (PG) and asynchronous advantage actor-critic (A3C), the simulation results demonstrate the superiority of the proposed PPO approach. The results also show that combining LSTM with CNN in the critic can improve exploration. Since the proposed exploration has to work in unknown environments, the results show that the proposed setup can complete coverage on new maps that differ from the training maps. Finally, we show how tuning hyperparameters may affect the overall performance.
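The actor-critic architecture described in the abstract can be sketched roughly as follows: a CNN encodes the coverage map, and an LSTM gives the network memory of areas covered in earlier steps. The layer sizes, the single-channel map input, and the four-action discrete head are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of a CNN+LSTM actor-critic for grid-map exploration.
# All layer sizes and the 4-action head are assumptions for illustration.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, n_actions: int = 4, map_size: int = 32):
        super().__init__()
        # CNN encodes the coverage map (1 channel: visited / unvisited cells).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 32 * (map_size // 4) ** 2
        # LSTM lets the critic remember previously covered areas.
        self.lstm = nn.LSTM(feat, 128, batch_first=True)
        self.actor = nn.Linear(128, n_actions)   # action logits
        self.critic = nn.Linear(128, 1)          # state-value estimate

    def forward(self, maps, hidden=None):
        # maps: (batch, time, 1, H, W) sequence of coverage maps
        b, t = maps.shape[:2]
        z = self.cnn(maps.flatten(0, 1)).view(b, t, -1)
        out, hidden = self.lstm(z, hidden)
        return self.actor(out), self.critic(out), hidden

model = ActorCritic()
logits, values, _ = model(torch.zeros(2, 5, 1, 32, 32))
```

In a PPO loop the logits would feed a categorical policy and the value head the advantage estimate; here the forward pass only demonstrates the shape flow.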
Related papers
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
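The idea of rewarding visitation discrepancy between episodes can be illustrated with a simplified particle estimate: each state of the current episode earns a bonus based on its distance to the nearest state of the previous episode. The function name, the 1-NN choice, and the exponent are illustrative assumptions, not the paper's exact Rényi estimator.

```python
# Hedged, simplified sketch of a REVD-style intrinsic reward: reward states
# of the current episode by their k-NN distance to the previous episode's
# states (a particle-style estimate of visitation discrepancy).
import numpy as np

def revd_style_bonus(curr_states, prev_states, alpha=0.5, k=1):
    # curr_states, prev_states: (n, d) arrays of states visited per episode
    d = np.linalg.norm(curr_states[:, None, :] - prev_states[None, :, :], axis=-1)
    knn = np.sort(d, axis=1)[:, k - 1]   # distance to the k-th nearest neighbour
    return knn ** (1.0 - alpha)          # larger bonus when episodes differ more

prev = np.zeros((4, 2))                  # previous episode stayed at the origin
curr = np.ones((4, 2))                   # current episode visited new states
bonus = revd_style_bonus(curr, prev)     # strictly positive: novelty rewarded
```

Identical episodes would receive zero bonus under this sketch, which matches the intent of penalizing repeated visitation patterns.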
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
- Aerial View Goal Localization with Reinforcement Learning [6.165163123577484]
We present a framework that emulates a search-and-rescue (SAR)-like setup without requiring access to actual UAVs.
In this framework, an agent operates on top of an aerial image (proxy for a search area) and is tasked with localizing a goal that is described in terms of visual cues.
We propose AiRLoc, a reinforcement learning (RL)-based model that decouples exploration (searching for distant goals) and exploitation (localizing nearby goals).
arXiv Detail & Related papers (2022-09-08T10:27:53Z)
- Rethinking Drone-Based Search and Rescue with Aerial Person Detection [79.76669658740902]
The visual inspection of aerial drone footage is an integral part of land search and rescue (SAR) operations today.
We propose a novel deep learning algorithm to automate this aerial person detection (APD) task.
We present the novel Aerial Inspection RetinaNet (AIR) algorithm as the combination of these contributions.
arXiv Detail & Related papers (2021-11-17T21:48:31Z)
- A Multi-UAV System for Exploration and Target Finding in Cluttered and GPS-Denied Environments [68.31522961125589]
We propose a framework for a team of UAVs to cooperatively explore and find a target in complex GPS-denied environments with obstacles.
The team of UAVs autonomously navigates, explores, detects, and finds the target in a cluttered environment with a known map.
Results indicate that the proposed multi-UAV system improves time-cost, the proportion of search area surveyed, and success rates for search and rescue missions.
arXiv Detail & Related papers (2021-07-19T12:54:04Z)
- Deep Reinforcement Learning for Adaptive Exploration of Unknown Environments [6.90777229452271]
We develop an adaptive exploration approach to trade off between exploration and exploitation in one single step for UAVs.
The proposed approach uses a map segmentation technique to decompose the environment map into smaller, tractable maps.
The results demonstrate that our proposed approach is capable of navigating through randomly generated environments and covering more AoI in fewer time steps compared to the baselines.
arXiv Detail & Related papers (2021-05-04T16:29:44Z)
- Decentralized Reinforcement Learning for Multi-Target Search and Detection by a Team of Drones [12.055303570215335]
Target search and detection encompasses a variety of decision problems, such as coverage, surveillance, search, observation, and pursuit-evasion.
We develop a multi-agent deep reinforcement learning (MADRL) method to coordinate a group of aerial vehicles (drones) for the purpose of locating a set of static targets in an unknown area.
arXiv Detail & Related papers (2021-03-17T09:04:47Z)
- A Vision Based Deep Reinforcement Learning Algorithm for UAV Obstacle Avoidance [1.2693545159861856]
We present two techniques for improving exploration for UAV obstacle avoidance.
The first is a convergence-based approach that uses convergence error to iterate through unexplored actions and a temporal threshold to balance exploration and exploitation.
The second is a guidance-based approach which uses a Gaussian mixture distribution to compare previously seen states to a predicted next state in order to select the next action.
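The guidance-based idea above can be illustrated with a toy novelty score: model previously seen states with an equal-weight Gaussian mixture (one component per past state) and pick the predicted next state the model finds least likely. The function names, the equal-weight kernel choice, and the bandwidth are illustrative assumptions, not the paper's exact method.

```python
# Hedged sketch of a guidance-based exploration step: score each candidate
# next state by its (negative) log-likelihood under a Gaussian mixture over
# previously seen states, and steer toward the least likely (most novel) one.
import numpy as np

def novelty_scores(candidates, seen, sigma=1.0):
    # candidates: (a, d) predicted next states, one per action; seen: (n, d)
    diff = candidates[:, None, :] - seen[None, :, :]
    logp = -0.5 * np.sum(diff ** 2, axis=-1) / sigma ** 2
    # negative log-likelihood under an equal-weight Gaussian mixture
    return -np.log(np.mean(np.exp(logp), axis=1))

seen = np.zeros((8, 2))                            # all past states at origin
cands = np.array([[0.0, 0.0], [3.0, 3.0]])         # stay put vs. move away
action = int(np.argmax(novelty_scores(cands, seen)))  # -> 1, the novel move
```

The normalization constant of the Gaussian is dropped since it is identical for every candidate and cancels in the argmax.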
arXiv Detail & Related papers (2021-03-11T01:15:26Z)
- MRDet: A Multi-Head Network for Accurate Oriented Object Detection in Aerial Images [51.227489316673484]
We propose an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals transformed from horizontal anchors.
To obtain accurate bounding boxes, we decouple the detection task into multiple subtasks and propose a multi-head network.
Each head is specially designed to learn the features optimal for the corresponding task, which allows our network to detect objects accurately.
arXiv Detail & Related papers (2020-12-24T06:36:48Z)
- Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
- Reinforcement Learning for UAV Autonomous Navigation, Mapping and Target Detection [36.79380276028116]
We study a joint detection, mapping and navigation problem for a single unmanned aerial vehicle (UAV) equipped with a low complexity radar and flying in an unknown environment.
The goal is to optimize its trajectory with the purpose of maximizing the mapping accuracy and to avoid areas where measurements might not be sufficiently informative from the perspective of a target detection.
arXiv Detail & Related papers (2020-05-05T20:39:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.