On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration
- URL: http://arxiv.org/abs/2409.11058v1
- Date: Tue, 17 Sep 2024 10:36:46 GMT
- Title: On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration
- Authors: Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub
- Abstract summary: Unmanned aerial vehicles (UAVs) have become increasingly popular in various fields, including precision agriculture, search and rescue, and remote sensing.
This study aims to address this challenge by utilizing on-policy Reinforcement Learning (RL) with Proximal Policy Optimization (PPO) to explore a two-dimensional area of interest with multiple UAVs.
The proposed solution includes actor-critic networks using deep convolutional neural networks (CNN) and long short-term memory (LSTM) for identifying the UAVs and areas that have already been covered.
- Score: 0.7373617024876724
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Unmanned aerial vehicles (UAVs) have become increasingly popular in various fields, including precision agriculture, search and rescue, and remote sensing. However, exploring unknown environments remains a significant challenge. This study addresses this challenge by utilizing on-policy Reinforcement Learning (RL) with Proximal Policy Optimization (PPO) to explore a two-dimensional area of interest with multiple UAVs. The UAVs avoid collisions with obstacles and with each other, and perform the exploration in a distributed manner. The proposed solution includes actor-critic networks using deep convolutional neural networks (CNN) and long short-term memory (LSTM) for identifying the UAVs and the areas that have already been covered. Compared to other RL techniques, such as policy gradient (PG) and asynchronous advantage actor-critic (A3C), the simulation results demonstrate the superiority of the proposed PPO approach. The results also show that combining LSTM with CNN in the critic can improve exploration. Since the proposed exploration has to work in unknown environments, the results show that the proposed setup can complete coverage on new maps that differ from the training maps. Finally, we show how tuning hyperparameters may affect the overall performance.
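The actor-critic architecture described in the abstract can be sketched roughly as follows: a CNN encodes the coverage map, and an LSTM gives the network memory of areas covered in earlier steps. The layer sizes, the single-channel map input, and the four-action discrete head are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of a CNN+LSTM actor-critic for grid-map exploration.
# All layer sizes and the 4-action head are assumptions for illustration.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, n_actions: int = 4, map_size: int = 32):
        super().__init__()
        # CNN encodes the coverage map (1 channel: visited / unvisited cells).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 32 * (map_size // 4) ** 2
        # LSTM lets the critic remember previously covered areas.
        self.lstm = nn.LSTM(feat, 128, batch_first=True)
        self.actor = nn.Linear(128, n_actions)   # action logits
        self.critic = nn.Linear(128, 1)          # state-value estimate

    def forward(self, maps, hidden=None):
        # maps: (batch, time, 1, H, W) sequence of coverage maps
        b, t = maps.shape[:2]
        z = self.cnn(maps.flatten(0, 1)).view(b, t, -1)
        out, hidden = self.lstm(z, hidden)
        return self.actor(out), self.critic(out), hidden

model = ActorCritic()
logits, values, _ = model(torch.zeros(2, 5, 1, 32, 32))
```

In a PPO loop the logits would feed a categorical policy and the value head the advantage estimate; here the forward pass only demonstrates the shape flow.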
Related papers
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
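The idea of rewarding visitation discrepancy between episodes can be illustrated with a simplified particle estimate: each state of the current episode earns a bonus based on its distance to the nearest state of the previous episode. The function name, the 1-NN choice, and the exponent are illustrative assumptions, not the paper's exact Rényi estimator.

```python
# Hedged, simplified sketch of a REVD-style intrinsic reward: reward states
# of the current episode by their k-NN distance to the previous episode's
# states (a particle-style estimate of visitation discrepancy).
import numpy as np

def revd_style_bonus(curr_states, prev_states, alpha=0.5, k=1):
    # curr_states, prev_states: (n, d) arrays of states visited per episode
    d = np.linalg.norm(curr_states[:, None, :] - prev_states[None, :, :], axis=-1)
    knn = np.sort(d, axis=1)[:, k - 1]   # distance to the k-th nearest neighbour
    return knn ** (1.0 - alpha)          # larger bonus when episodes differ more

prev = np.zeros((4, 2))                  # previous episode stayed at the origin
curr = np.ones((4, 2))                   # current episode visited new states
bonus = revd_style_bonus(curr, prev)     # strictly positive: novelty rewarded
```

Identical episodes would receive zero bonus under this sketch, which matches the intent of penalizing repeated visitation patterns.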
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
- Aerial View Goal Localization with Reinforcement Learning [6.165163123577484]
We present a framework that emulates a search-and-rescue (SAR)-like setup without requiring access to actual UAVs.
In this framework, an agent operates on top of an aerial image (proxy for a search area) and is tasked with localizing a goal that is described in terms of visual cues.
We propose AiRLoc, a reinforcement learning (RL)-based model that decouples exploration (searching for distant goals) and exploitation (localizing nearby goals).
arXiv Detail & Related papers (2022-09-08T10:27:53Z)
- Rethinking Drone-Based Search and Rescue with Aerial Person Detection [79.76669658740902]
The visual inspection of aerial drone footage is an integral part of land search and rescue (SAR) operations today.
We propose a novel deep learning algorithm to automate this aerial person detection (APD) task.
We present the novel Aerial Inspection RetinaNet (AIR) algorithm as the combination of these contributions.
arXiv Detail & Related papers (2021-11-17T21:48:31Z)
- A Multi-UAV System for Exploration and Target Finding in Cluttered and GPS-Denied Environments [68.31522961125589]
We propose a framework for a team of UAVs to cooperatively explore and find a target in complex GPS-denied environments with obstacles.
The team of UAVs autonomously navigates, explores, detects, and finds the target in a cluttered environment with a known map.
Results indicate that the proposed multi-UAV system improves time-cost, the proportion of search area surveyed, and success rates for search and rescue missions.
arXiv Detail & Related papers (2021-07-19T12:54:04Z)
- Deep Reinforcement Learning for Adaptive Exploration of Unknown Environments [6.90777229452271]
We develop an adaptive exploration approach to trade off between exploration and exploitation in one single step for UAVs.
The proposed approach uses a map segmentation technique to decompose the environment map into smaller, tractable maps.
The results demonstrate that our proposed approach is capable of navigating through randomly generated environments and covering more AoI in fewer time steps compared to the baselines.
arXiv Detail & Related papers (2021-05-04T16:29:44Z)
- Decentralized Reinforcement Learning for Multi-Target Search and Detection by a Team of Drones [12.055303570215335]
Target search and detection encompasses a variety of decision problems, such as coverage, surveillance, search, observation, and pursuit-evasion.
We develop a multi-agent deep reinforcement learning (MADRL) method to coordinate a group of aerial vehicles (drones) for the purpose of locating a set of static targets in an unknown area.
arXiv Detail & Related papers (2021-03-17T09:04:47Z)
- A Vision Based Deep Reinforcement Learning Algorithm for UAV Obstacle Avoidance [1.2693545159861856]
We present two techniques for improving exploration for UAV obstacle avoidance.
The first is a convergence-based approach that uses convergence error to iterate through unexplored actions and a temporal threshold to balance exploration and exploitation.
The second is a guidance-based approach which uses a Gaussian mixture distribution to compare previously seen states to a predicted next state in order to select the next action.
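The guidance-based idea above can be illustrated with a toy novelty score: model previously seen states with an equal-weight Gaussian mixture (one component per past state) and pick the predicted next state the model finds least likely. The function names, the equal-weight kernel choice, and the bandwidth are illustrative assumptions, not the paper's exact method.

```python
# Hedged sketch of a guidance-based exploration step: score each candidate
# next state by its (negative) log-likelihood under a Gaussian mixture over
# previously seen states, and steer toward the least likely (most novel) one.
import numpy as np

def novelty_scores(candidates, seen, sigma=1.0):
    # candidates: (a, d) predicted next states, one per action; seen: (n, d)
    diff = candidates[:, None, :] - seen[None, :, :]
    logp = -0.5 * np.sum(diff ** 2, axis=-1) / sigma ** 2
    # negative log-likelihood under an equal-weight Gaussian mixture
    return -np.log(np.mean(np.exp(logp), axis=1))

seen = np.zeros((8, 2))                            # all past states at origin
cands = np.array([[0.0, 0.0], [3.0, 3.0]])         # stay put vs. move away
action = int(np.argmax(novelty_scores(cands, seen)))  # -> 1, the novel move
```

The normalization constant of the Gaussian is dropped since it is identical for every candidate and cancels in the argmax.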
arXiv Detail & Related papers (2021-03-11T01:15:26Z)
- MRDet: A Multi-Head Network for Accurate Oriented Object Detection in Aerial Images [51.227489316673484]
We propose an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals transformed from horizontal anchors.
To obtain accurate bounding boxes, we decouple the detection task into multiple subtasks and propose a multi-head network.
Each head is specially designed to learn the features optimal for the corresponding task, which allows our network to detect objects accurately.
arXiv Detail & Related papers (2020-12-24T06:36:48Z)
- Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
- Reinforcement Learning for UAV Autonomous Navigation, Mapping and Target Detection [36.79380276028116]
We study a joint detection, mapping and navigation problem for a single unmanned aerial vehicle (UAV) equipped with a low complexity radar and flying in an unknown environment.
The goal is to optimize its trajectory with the purpose of maximizing the mapping accuracy and to avoid areas where measurements might not be sufficiently informative from the perspective of a target detection.
arXiv Detail & Related papers (2020-05-05T20:39:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.