Ranked Prioritization of Groups in Combinatorial Bandit Allocation
- URL: http://arxiv.org/abs/2205.05659v1
- Date: Wed, 11 May 2022 17:40:29 GMT
- Title: Ranked Prioritization of Groups in Combinatorial Bandit Allocation
- Authors: Lily Xu, Arpita Biswas, Fei Fang, Milind Tambe
- Abstract summary: We propose a novel bandit objective that trades off between reward maximization and ranked prioritization over species.
We show this objective can be expressed as a weighted linear sum of Lipschitz-continuous reward functions.
- Score: 62.24280332575472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Preventing poaching through ranger patrols protects endangered wildlife,
directly contributing to the UN Sustainable Development Goal 15 of life on
land. Combinatorial bandits have been used to allocate limited patrol
resources, but existing approaches overlook the fact that each location is home
to multiple species in varying proportions, so a patrol benefits each species
to differing degrees. When some species are more vulnerable, we ought to offer
more protection to these animals; unfortunately, existing combinatorial bandit
approaches do not offer a way to prioritize important species. To bridge this
gap, (1) we propose a novel combinatorial bandit objective that trades off
between reward maximization and prioritization over species,
which we call ranked prioritization. We show this objective can be expressed as
a weighted linear sum of Lipschitz-continuous reward functions. (2) We provide
RankedCUCB, an algorithm to select combinatorial actions that optimize our
prioritization-based objective, and prove that it achieves asymptotic
no-regret. (3) We demonstrate empirically that RankedCUCB leads to up to 38%
improvement in outcomes for endangered species using real-world wildlife
conservation data. Along with adapting to other challenges such as preventing
illegal logging and overfishing, our no-regret algorithm addresses the general
combinatorial bandit problem with a weighted linear objective.
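The abstract describes selecting combinatorial actions that optimize a weighted linear sum of per-arm reward estimates, where weights encode a ranked prioritization over species. A minimal CUCB-style sketch of that idea follows; the function names, the confidence-bonus form, and the group-share/weight matrices are illustrative assumptions, not the paper's actual RankedCUCB implementation.

```python
import numpy as np

def ucb_scores(means, counts, t, c=1.0):
    """Upper confidence bounds on per-arm reward estimates."""
    bonus = c * np.sqrt(np.log(max(t, 2)) / np.maximum(counts, 1))
    return means + bonus

def select_action(means, counts, group_share, group_weights, budget, t):
    """Pick `budget` arms maximizing a weighted linear sum of UCBs.

    group_share[i, g]: fraction of arm i's reward accruing to group g.
    group_weights[g]: ranked-prioritization weight of group g.
    """
    ucb = ucb_scores(means, counts, t)
    # Scale each arm's UCB by how much it benefits prioritized groups.
    weighted = ucb * (group_share @ group_weights)
    # Greedy top-k is optimal here because the objective is additive
    # across the arms chosen in the combinatorial action.
    return np.argsort(weighted)[-budget:]

rng = np.random.default_rng(0)
means = rng.uniform(size=5)
counts = np.ones(5)
share = rng.dirichlet(np.ones(3), size=5)   # 5 arms, 3 species groups
weights = np.array([0.6, 0.3, 0.1])         # descending priority weights
action = select_action(means, counts, share, weights, budget=2, t=10)
print(sorted(action.tolist()))
```

Under a budget constraint and an additive objective, the greedy top-k selection shown here maximizes the weighted sum exactly; the paper's regret analysis additionally relies on the Lipschitz continuity of the underlying reward functions.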
Related papers
- Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals [28.94461817548213]
We prove upper and matching lower bounds on the possible trade-offs in the performance of learning in conditionally benign and arbitrary environments.
We are the first to obtain instance-dependent bounds for causal bandits, by reducing the problem to the linear bandit setting.
arXiv Detail & Related papers (2024-07-01T04:12:15Z)
- Robust Lipschitz Bandits to Adversarial Corruptions [61.85150061213987]
Lipschitz bandit is a variant of bandits that deals with a continuous arm set defined on a metric space.
In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions.
Our work presents the first line of robust Lipschitz bandit algorithms that can achieve sub-linear regret under both types of adversary.
arXiv Detail & Related papers (2023-05-29T18:16:59Z)
- Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [28.79598714109439]
We study the problem of $K$-armed dueling bandits for both stochastic and adversarial environments.
We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits.
Our algorithm is also the first to achieve an optimal $O\big(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\big)$ regret bound against the Condorcet-winner benchmark.
arXiv Detail & Related papers (2022-02-14T13:37:23Z)
- Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
- Expanding boundaries of Gap Safe screening [0.0]
A powerful strategy to boost the performance of algorithms is known as safe screening.
We extend the existing Gap Safe screening framework by relaxing the global strong-concavity assumption on the dual cost function.
The proposed general framework is exemplified by some notable particular cases: the logistic function, and the β = 1.5 and Kullback-Leibler divergences.
arXiv Detail & Related papers (2021-02-22T09:23:31Z)
- Dual-Mandate Patrols: Multi-Armed Bandits for Green Security [67.29846393678808]
Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders.
We formulate the problem as a multi-armed bandit, where each action represents a patrol strategy.
We show that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia.
arXiv Detail & Related papers (2020-09-14T16:40:44Z)
- Exploration in two-stage recommender systems [79.50534282841618]
Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability.
A key challenge of this setup is that optimal performance of each stage in isolation does not imply optimal global performance.
We propose a method of synchronising the exploration strategies between the ranker and the nominators.
arXiv Detail & Related papers (2020-09-01T16:52:51Z)
- Adaptive Discretization for Adversarial Lipschitz Bandits [85.39106976861702]
Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces.
A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions.
We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
arXiv Detail & Related papers (2020-06-22T16:06:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.