Sum-max Submodular Bandits
- URL: http://arxiv.org/abs/2311.05975v1
- Date: Fri, 10 Nov 2023 10:18:50 GMT
- Title: Sum-max Submodular Bandits
- Authors: Stephen Pasteris, Alberto Rumi, Fabio Vitale, Nicolò Cesa-Bianchi
- Abstract summary: We show that all functions in this class satisfy a key property that we call pseudo-concavity.
This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{O}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback.
- Score: 7.337919355153117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many online decision-making problems correspond to maximizing a sequence of
submodular functions. In this work, we introduce sum-max functions, a subclass
of monotone submodular functions capturing several interesting problems,
including best-of-$K$-bandits, combinatorial bandits, and the bandit versions
of facility location, $M$-medians, and hitting sets. We show that all functions
in this class satisfy a key property that we call pseudo-concavity. This allows
us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds for bandit feedback in
the nonstochastic setting of the order of $\sqrt{MKT}$ (ignoring log factors),
where $T$ is the time horizon and $M$ is a cardinality constraint. This bound,
attained by a simple and efficient algorithm, significantly improves on the
$\widetilde{O}\big(T^{2/3}\big)$ regret bound for online monotone submodular
maximization with bandit feedback.
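To make the sum-max class concrete, the following minimal Python sketch (illustrative only, not the paper's bandit algorithm) implements a facility-location-style sum-max objective $f(S) = \sum_i \max_{j \in S} w_{ij}$ together with the standard offline greedy baseline under a cardinality constraint $M$; the weight matrix, the variable names, and the greedy baseline are assumptions made here for illustration.

```python
import random

# A minimal sketch (illustrative only, not the paper's bandit algorithm) of a
# sum-max objective: with nonnegative weights w[i][j] over "targets" i and
# "items" j, a set S of items has value f(S) = sum_i max_{j in S} w[i][j].
# Weighted facility location and hitting sets are of this form, and f is
# monotone and submodular.

def sum_max_value(weights, S):
    """f(S): sum over targets of the best weight achieved by any chosen item."""
    if not S:
        return 0.0
    return sum(max(row[j] for j in S) for row in weights)

def greedy_sum_max(weights, M):
    """Offline greedy maximization under the cardinality constraint |S| <= M.
    For monotone submodular f this gives the classic (1 - 1/e) approximation;
    the paper studies the harder bandit setting, where only the value of the
    played set is observed."""
    n_items = len(weights[0])
    S = []
    for _ in range(M):
        best_j = max((j for j in range(n_items) if j not in S),
                     key=lambda j: sum_max_value(weights, S + [j]))
        S.append(best_j)
    return S

if __name__ == "__main__":
    random.seed(0)
    n_targets, K, M = 10, 6, 3   # K items to choose from, pick at most M
    w = [[random.random() for _ in range(K)] for _ in range(n_targets)]
    S = greedy_sum_max(w, M)
    print("greedy set:", sorted(S), "value:", round(sum_max_value(w, S), 3))
```

The greedy routine is shown only as the full-information point of comparison; in the paper's setting the learner never sees the weights and only receives bandit feedback on the chosen set.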
Related papers
- Adversarial Combinatorial Bandits with Switching Costs [55.2480439325792]
We study the problem of adversarial combinatorial bandits with a switching cost $\lambda$ incurred whenever a selected arm changes between rounds.
We derive lower bounds for the minimax regret and design algorithms to approach them.
arXiv Detail & Related papers (2024-04-02T12:15:37Z) - Minimax Optimal Submodular Optimization with Bandit Feedback [13.805872311596739]
We consider maximizing a monotonic, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ under bandit feedback.
Specifically, $f$ is unknown to the learner, but at each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$ with $|S_t| \leq k$ and receives reward $f(S_t) + \eta_t$, where $\eta_t$ is mean-zero sub-Gaussian noise (a toy sketch of this feedback loop appears after this list).
arXiv Detail & Related papers (2023-10-27T20:19:03Z) - Bandit Multi-linear DR-Submodular Maximization and Its Applications on
Adversarial Submodular Bandits [21.54858035450694]
We give a sublinear regret algorithm for the submodular bandit problem with a partition matroid constraint.
For bandit sequential submodular maximization, existing work proves an $O(T^{2/3})$ regret bound with a suboptimal $1/2$ approximation ratio.
arXiv Detail & Related papers (2023-05-21T08:51:55Z) - Randomized Greedy Learning for Non-monotone Stochastic Submodular
Maximization Under Full-bandit Feedback [98.29086113546045]
We investigate the problem of unconstrained multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization.
We show that RGL empirically outperforms other full-bandit variants in submodular and non-submodular settings.
arXiv Detail & Related papers (2023-02-02T18:52:14Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z) - Improved Regret Bounds for Online Submodular Maximization [10.089520556398575]
We consider an online optimization problem where at each step $t \in [T]$, the algorithm chooses an action $x_t$ from the fixed convex and compact domain set $\mathcal{K}$.
A utility function $f_t(\cdot)$ is then revealed and the algorithm receives the payoff $f_t(x_t)$.
arXiv Detail & Related papers (2021-06-15T02:05:35Z) - Submodular + Concave [53.208470310734825]
It has been well established that first order optimization methods can converge to the maximal objective value of concave functions.
In this work, we initiate the study of maximizing functions of the form $F(x) = G(x) + C(x)$ over a convex body, where $G$ is a smooth DR-submodular function and $C$ is a smooth concave function.
This class of functions is an extension of both concave and continuous DR-submodular functions for which no guarantee is known.
arXiv Detail & Related papers (2021-06-09T01:59:55Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)
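As a companion to the "Minimax Optimal Submodular Optimization with Bandit Feedback" entry above, here is a toy simulation of the full-bandit feedback protocol in which the learner observes only a noisy value $f(S_t) + \eta_t$ of the played set. The coverage function, noise level, and uniform placeholder policy below are assumptions made for illustration and are not taken from any of the cited papers.

```python
import random

# A toy simulation (not the cited paper's algorithm; the coverage function,
# noise level, and uniform policy below are assumptions for illustration) of
# the full-bandit feedback protocol: an unknown monotone submodular f over
# subsets of [n] is queried with sets S_t of size at most k, and the learner
# observes only f(S_t) + eta_t with mean-zero noise eta_t.

def coverage_f(S, covers):
    """A simple monotone submodular function: fraction of elements covered by S."""
    universe = set().union(*covers)
    covered = set().union(*(covers[j] for j in S)) if S else set()
    return len(covered) / len(universe)

def play_round(S, covers, sigma=0.05, rng=random):
    """One round of full-bandit feedback: only a noisy scalar value of f(S) is seen."""
    return coverage_f(S, covers) + rng.gauss(0.0, sigma)   # eta_t ~ N(0, sigma^2)

if __name__ == "__main__":
    random.seed(0)
    n, k, T = 8, 3, 5
    covers = [set(random.sample(range(20), 5)) for _ in range(n)]  # hidden from the learner
    for t in range(1, T + 1):
        S_t = random.sample(range(n), k)    # placeholder policy: uniform random sets
        reward = play_round(S_t, covers)
        print(f"t={t}  S_t={sorted(S_t)}  noisy reward={reward:.3f}")
```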