Dual-Mandate Patrols: Multi-Armed Bandits for Green Security
- URL: http://arxiv.org/abs/2009.06560v3
- Date: Fri, 26 Apr 2024 13:51:17 GMT
- Title: Dual-Mandate Patrols: Multi-Armed Bandits for Green Security
- Authors: Lily Xu, Elizabeth Bondi, Fei Fang, Andrew Perrault, Kai Wang, Milind Tambe
- Abstract summary: Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders.
We formulate the problem as a multi-armed bandit, where each action represents a patrol strategy.
We show that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia.
- Score: 67.29846393678808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers). Defenders must choose how much time to spend in each region of the protected area, balancing exploration of infrequently visited regions and exploitation of known hotspots. We formulate the problem as a stochastic multi-armed bandit, where each action represents a patrol strategy, enabling us to guarantee the rate of convergence of the patrolling policy. However, a naive bandit approach would compromise short-term performance for long-term optimality, resulting in animals poached and forests destroyed. To speed up performance, we leverage smoothness in the reward function and decomposability of actions. We show a synergy between Lipschitz-continuity and decomposition as each aids the convergence of the other. In doing so, we bridge the gap between combinatorial and Lipschitz bandits, presenting a no-regret approach that tightens existing guarantees while optimizing for short-term performance. We demonstrate that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia.
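To make the abstract's two ingredients concrete, the sketch below implements a decomposed Lipschitz-UCB patrol bandit in Python. It is an illustration of the idea, not the authors' LIZARD implementation: the number of regions, effort grid, budget, Lipschitz constant, and the greedy allocation step are all simplifying assumptions.

```python
"""A minimal sketch of a decomposed Lipschitz-UCB patrol bandit.
NOT the authors' LIZARD code; all constants below are illustrative."""
import math
import random

N_REGIONS = 5                           # patrol regions (sub-arms)
EFFORTS = [0.0, 0.25, 0.5, 0.75, 1.0]   # discretized effort levels
BUDGET = 2.0                            # total patrol effort per round
LIPSCHITZ_L = 1.0                       # assumed smoothness constant

counts = [[0] * len(EFFORTS) for _ in range(N_REGIONS)]
means = [[0.0] * len(EFFORTS) for _ in range(N_REGIONS)]

def ucb(i, j, t):
    """Plain UCB index for region i at effort level j."""
    if counts[i][j] == 0:
        return float("inf")
    return means[i][j] + math.sqrt(2 * math.log(t + 1) / counts[i][j])

def lipschitz_ucb(i, j, t):
    """Tighten the UCB via smoothness: level j's value is bounded by
    any level k's UCB plus L times the effort gap between them."""
    return min(ucb(i, k, t) + LIPSCHITZ_L * abs(EFFORTS[j] - EFFORTS[k])
               for k in range(len(EFFORTS)))

def choose_allocation(t):
    """Greedy budget allocation over per-region indices (a stand-in for
    LIZARD's exact selection rule; greedy keeps the sketch short)."""
    alloc, spent = [0] * N_REGIONS, 0.0
    by_value = sorted(range(N_REGIONS), key=lambda i: -max(
        lipschitz_ucb(i, j, t) for j in range(len(EFFORTS))))
    for i in by_value:
        j = max(range(len(EFFORTS)), key=lambda j: lipschitz_ucb(i, j, t))
        if spent + EFFORTS[j] <= BUDGET:
            alloc[i], spent = j, spent + EFFORTS[j]
    return alloc

def update(i, j, reward):
    """Incremental mean update for the pulled sub-arm."""
    counts[i][j] += 1
    means[i][j] += (reward - means[i][j]) / counts[i][j]

for t in range(100):                    # placeholder rewards, not real data
    alloc = choose_allocation(t)
    for i, j in enumerate(alloc):
        update(i, j, random.random() * EFFORTS[j])
```

Each (region, effort) pair is a sub-arm, so feedback decomposes across regions, while the Lipschitz bound lets observations at one effort level constrain estimates at neighboring levels; that interplay is the synergy the abstract refers to.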
Related papers
- Patrol Security Game: Defending Against Adversary with Freedom in Attack Timing, Location, and Duration [4.765278970103286]
Patrol Security Game (PSG) is a robotic patrolling problem modeled as an extensive-form deterministic Stackelberg problem.
Our objective is to devise a synthetic schedule that minimizes the attacker's time horizon.
arXiv Detail & Related papers (2024-10-21T02:53:18Z)
- Multi-Agent Reinforcement Learning for Joint Police Patrol and Dispatch [13.336551874123796]
We propose a novel method for jointly optimizing multi-agent patrol and dispatch to learn policies yielding rapid response times.
Our method treats each patroller as an independent Q-learner (agent) with a shared deep Q-network that represents the state-action values.
We demonstrate that this heterogeneous multi-agent reinforcement learning approach learns joint policies that outperform those optimized for patrol or dispatch alone.
arXiv Detail & Related papers (2024-09-03T19:19:57Z)
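A minimal sketch of the shared-network design the entry above describes, assuming PyTorch; the observation encoding, network sizes, and hyperparameters are illustrative guesses, not the paper's configuration.

```python
"""Independent Q-learners sharing one deep Q-network (illustrative sketch)."""
import random
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS, N_AGENTS = 8, 4, 3

shared_q = nn.Sequential(               # one network shared by all patrollers
    nn.Linear(OBS_DIM, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
optimizer = torch.optim.Adam(shared_q.parameters(), lr=1e-3)

def act(obs: torch.Tensor, epsilon: float = 0.1) -> int:
    """Epsilon-greedy action for one agent through the shared network."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(shared_q(obs).argmax())

def td_update(obs, action, reward, next_obs, gamma=0.99):
    """One-step TD(0) update; experience from any agent improves all,
    because every agent reads and writes the same parameters."""
    q = shared_q(obs)[action]
    with torch.no_grad():
        target = reward + gamma * shared_q(next_obs).max()
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# All N_AGENTS patrollers act through, and train, the one network.
for _ in range(N_AGENTS):
    obs, next_obs = torch.randn(OBS_DIM), torch.randn(OBS_DIM)
    a = act(obs)
    td_update(obs, a, reward=1.0, next_obs=next_obs)
```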
- Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications [69.13807233595455]
Large language models (LLMs) show inherent brittleness in their safety mechanisms.
The study probes this brittleness by leveraging pruning and low-rank modifications.
We show that LLMs remain vulnerable to low-cost fine-tuning attacks even when modifications to the safety-critical regions are restricted.
arXiv Detail & Related papers (2024-02-07T18:34:38Z)
- Autonomous Vehicle Patrolling Through Deep Reinforcement Learning: Learning to Communicate and Cooperate [3.79830302036482]
Finding an optimal patrolling strategy can be challenging due to unknown environmental factors, such as wind or landscape.
Agents are trained to develop their own communication protocol to cooperate during patrolling where faults can and do occur.
The solution is validated through simulation experiments and is compared with several state-of-the-art patrolling solutions from different perspectives.
arXiv Detail & Related papers (2024-01-28T14:29:30Z)
- Robust Lipschitz Bandits to Adversarial Corruptions [61.85150061213987]
The Lipschitz bandit is a variant of the bandit problem that deals with a continuous arm set defined on a metric space.
In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions.
Our work presents the first line of robust Lipschitz bandit algorithms that can achieve sub-linear regret under both types of adversary.
arXiv Detail & Related papers (2023-05-29T18:16:59Z)
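For background, the defining assumption of the Lipschitz bandit setting referenced above (and leveraged by LIZARD) is that the mean reward varies smoothly over the arm metric space. This is the standard textbook condition, not anything specific to the paper's corruption model:

```latex
% Standard Lipschitz condition on the mean reward \mu over (\mathcal{X}, d):
\lvert \mu(x) - \mu(y) \rvert \le L \, d(x, y)
\qquad \text{for all } x, y \in \mathcal{X}
```

Adversarial corruptions perturb the observed rewards, so a robust algorithm must keep regret sub-linear even when some feedback has been tampered with.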
- Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
arXiv Detail & Related papers (2023-04-21T16:19:54Z)
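Shielding, in general, filters an agent's proposed action through a safety check before execution. The sketch below shows that generic pattern, not the paper's Atari-specific approximate shield; `is_safe` is a hypothetical placeholder for whatever exact or learned safety check is available.

```python
from typing import Callable, Sequence

def shielded_action(state: object,
                    proposed: int,
                    actions: Sequence[int],
                    is_safe: Callable[[object, int], bool],
                    fallback: int) -> int:
    """Execute the agent's choice only if the safety check accepts it;
    otherwise substitute any action predicted safe, or a designated
    fallback if none is found. `is_safe` is a hypothetical placeholder."""
    if is_safe(state, proposed):
        return proposed
    for a in actions:
        if is_safe(state, a):
            return a
    return fallback
```

In the approximate setting, the safety check is itself learned and can err, which is presumably why the entry describes the violation-reduction results as preliminary.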
- Ranked Prioritization of Groups in Combinatorial Bandit Allocation [62.24280332575472]
We propose a novel bandit objective that trades off between maximizing reward and enforcing a ranked prioritization over species.
We show this objective can be expressed as a weighted linear sum of Lipschitz-continuous reward functions.
arXiv Detail & Related papers (2022-05-11T17:40:29Z)
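A toy illustration of that weighted-linear-sum structure; the groups, weights, and reward curves below are invented for the example.

```python
"""Toy example: a ranked-prioritization objective written as a weighted
linear sum of per-group Lipschitz-continuous rewards. All numbers are
invented for illustration."""
import math

def r_elephant(x: float) -> float:
    return 1 - math.exp(-3 * x)     # Lipschitz with constant 3

def r_bird(x: float) -> float:
    return 0.5 * x                  # Lipschitz with constant 0.5

WEIGHTS = {"elephant": 0.7, "bird": 0.3}   # rank-derived weights (assumed)
REWARDS = {"elephant": r_elephant, "bird": r_bird}

def objective(effort: dict) -> float:
    """A weighted sum of Lipschitz functions is itself Lipschitz, which
    is what lets bandit guarantees carry over to the combined objective."""
    return sum(WEIGHTS[g] * REWARDS[g](effort[g]) for g in WEIGHTS)

print(objective({"elephant": 0.6, "bird": 0.4}))   # ~0.644
```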
- Robust Reinforcement Learning Under Minimax Regret for Green Security [50.03819244940543]
Green security domains feature defenders who plan patrols in the face of uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers.
We focus on robust sequential patrol planning for green security following the minimax regret criterion, which has not been considered in the literature.
We formulate the problem as a game between the defender and nature, where nature controls the parameter values of the adversarial behavior, and we design an algorithm, MIRROR, to find a robust policy.
arXiv Detail & Related papers (2021-06-15T20:11:12Z)
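As a final illustration, the minimax regret criterion itself (not the MIRROR algorithm, which handles the sequential planning problem) reduces to the following computation on a discrete, invented payoff table:

```python
"""Toy minimax-regret computation. Rows are defender policies; columns
are nature's choices of adversary-behavior parameters. All payoffs are
invented for illustration."""

UTILITY = [
    [5.0, 1.0, 3.0],   # policy 0
    [4.0, 4.0, 2.0],   # policy 1
    [2.0, 3.0, 4.0],   # policy 2
]

# Hindsight-best defender utility for each parameter value nature may pick.
best_per_theta = [max(row[j] for row in UTILITY)
                  for j in range(len(UTILITY[0]))]

def max_regret(row):
    """Worst-case regret: shortfall versus the hindsight-best policy,
    maximized over nature's parameter choices."""
    return max(best - u for u, best in zip(row, best_per_theta))

regrets = [max_regret(row) for row in UTILITY]
best_policy = min(range(len(UTILITY)), key=lambda i: regrets[i])
print(best_policy, regrets)   # policy 1 has the smallest worst-case regret
```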
This list is automatically generated from the titles and abstracts of the papers on this site.