A Multi-Arm Bandit Approach To Subset Selection Under Constraints
- URL: http://arxiv.org/abs/2102.04824v1
- Date: Tue, 9 Feb 2021 13:48:57 GMT
- Title: A Multi-Arm Bandit Approach To Subset Selection Under Constraints
- Authors: Ayush Deva, Kumar Abhishek, Sujit Gujar
- Abstract summary: We explore the class of problems where a central planner needs to select a subset of agents, each with different quality and cost.
When the agents' quality is known, we formulate our problem as an integer linear program (ILP).
We propose a deterministic algorithm, namely dpss, that provides an exact solution to our ILP.
- Score: 14.543433698096667
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore the class of problems where a central planner needs to select a
subset of agents, each with different quality and cost. The planner wants to
maximize its utility while ensuring that the average quality of the selected
agents is above a certain threshold. When the agents' quality is known, we
formulate our problem as an integer linear program (ILP) and propose a
deterministic algorithm, namely \dpss, that provides an exact solution to our ILP.
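One detail worth making explicit is that the average-quality requirement stays linear after a per-agent rewrite: $\frac{\sum_i x_i q_i}{\sum_i x_i} \ge \alpha$ is equivalent to $\sum_i x_i (q_i - \alpha) \ge 0$ whenever at least one agent is selected. The block below is only a plausible sketch of such an ILP, with placeholder value $r_i$, cost $c_i$, quality $q_i$, and threshold $\alpha$; the paper's exact utility function is not stated in this summary.

```latex
% Hedged sketch of the known-quality ILP; r_i (value), c_i (cost), q_i (quality)
% and the threshold \alpha are illustrative placeholders, not the paper's notation.
\begin{align*}
  \max_{x \in \{0,1\}^n} \quad & \sum_{i=1}^{n} x_i \,(r_i - c_i)
      && \text{(planner's utility; assumed form)} \\
  \text{s.t.} \quad & \sum_{i=1}^{n} x_i \,(q_i - \alpha) \;\ge\; 0
      && \text{(average-quality constraint, linearized)}
\end{align*}
```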
We then consider the setting when the qualities of the agents are unknown. We
model this as a Multi-Arm Bandit (MAB) problem and propose \newalgo\ to learn
the qualities over multiple rounds. We show that after a certain number of
rounds, $\tau$, \newalgo\ outputs a subset of agents that satisfies the average
quality constraint with high probability. Next, we provide bounds on $\tau$
and prove that after $\tau$ rounds, the algorithm incurs a regret of $O(\ln
T)$, where $T$ is the total number of rounds. We further illustrate the
efficacy of \newalgo\ through simulations.
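A minimal sketch of such a learning loop is given below, assuming qualities in $[0,1]$, one quality sample per selected agent per round, and a standard UCB confidence radius; the subset solver, feedback model, and constants are illustrative, not the paper's \newalgo.

```python
import math

def ucb_subset_selection(n_agents, alpha, select_subset, sample_quality, T):
    """Hedged sketch of a UCB-style loop for learning agent qualities.

    `select_subset(estimates, alpha)` is any subset solver (e.g. the ILP or a
    greedy routine) run on current quality estimates; `sample_quality(i)`
    returns one noisy quality observation in [0, 1] for agent i. These names
    and the confidence radius are assumptions, not the paper's algorithm."""
    counts = [0] * n_agents      # number of times each agent has been selected
    means = [0.0] * n_agents     # empirical mean quality of each agent
    for t in range(1, T + 1):
        # Optimistic estimate: empirical mean plus a confidence radius;
        # agents never observed get the maximum possible quality (1.0).
        ucb = [
            means[i] + math.sqrt(2.0 * math.log(t) / counts[i]) if counts[i] > 0 else 1.0
            for i in range(n_agents)
        ]
        for i in select_subset(ucb, alpha):   # act on optimistic estimates
            q = sample_quality(i)             # observe one quality sample
            counts[i] += 1
            means[i] += (q - means[i]) / counts[i]
    # After learning, select a subset using the empirical means.
    return select_subset(means, alpha)
```

In the paper the exploration phase is instead characterized by the bound on $\tau$, after which the output subset meets the constraint with high probability; the sketch above only conveys the general UCB mechanics.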
To overcome the computational limitations of \dpss, we propose a
polynomial-time greedy algorithm, namely \greedy, that provides an approximate
solution to our ILP. We also compare the performance of \dpss\ and \greedy\
through experiments.
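As a rough illustration of what a polynomial-time alternative can look like, the sketch below greedily adds agents by net utility while keeping the average-quality constraint satisfied; it is one plausible greedy rule, not necessarily the paper's \greedy.

```python
def greedy_subset(values, costs, qualities, alpha):
    """Hedged sketch of one plausible greedy rule (not necessarily the paper's
    proposed algorithm): add agents in decreasing order of net utility,
    accepting an agent only if the running average quality stays >= alpha."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] - costs[i], reverse=True)
    subset, quality_sum = [], 0.0
    for i in order:
        if values[i] - costs[i] <= 0:
            break  # remaining agents can only lower the total utility
        if (quality_sum + qualities[i]) / (len(subset) + 1) >= alpha:
            subset.append(i)            # constraint still satisfied; accept
            quality_sum += qualities[i]
    return subset
```

This runs in $O(n \log n)$ time but can miss feasible, higher-utility subsets that require including lower-utility, high-quality agents, which is the usual price of a greedy approximation.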
Related papers
- Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space [0.0]
We study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes.
We propose an algorithm based on Thompson sampling with dynamically-sized episodes.
We show that our algorithm can be applied to develop approximately optimal control algorithms.
arXiv Detail & Related papers (2023-06-05T03:57:16Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K} \sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits [81.60136088841948]
We propose an algorithm that minimizes the regret over the time horizon $T$.
The proposed algorithm is applicable to domains such as recommendation systems and transportation.
arXiv Detail & Related papers (2023-01-31T03:49:00Z) - Distributed Contextual Linear Bandits with Minimax Optimal Communication
Cost [48.288452411283444]
We study distributed contextual linear bandits with contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features.
We propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server.
We prove that over $T$ rounds ($NT$ actions in total) the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is at most $\tilde{\mathcal{O}}(\ldots)$
arXiv Detail & Related papers (2022-05-26T05:56:23Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Fully Gap-Dependent Bounds for Multinomial Logit Bandit [5.132017939561661]
We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items.
We present (i) an algorithm that identifies the optimal assortment $S^*$ within $\widetilde{O}(\sum_{i=1}^{N} \Delta_i^{-2})$ time steps with high probability, and (ii) an algorithm that incurs $O(\sum_{i \notin S^*} K\Delta_i\ldots)$
arXiv Detail & Related papers (2020-11-19T17:52:12Z) - Thompson sampling for linear quadratic mean-field teams [3.957353452014781]
We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents.
We propose a new Thompson sampling based learning algorithm which exploits the structure of the system model and show that the expected Bayesian regret of our proposed algorithm for a system with agents of different types at time horizon $T$ is $\tilde{\mathcal{O}}\big(|M|^{1.5}\sqrt{T}\big)$ irrespective of the total number of agents.
arXiv Detail & Related papers (2020-11-09T19:07:32Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z)