Bandit Multi-linear DR-Submodular Maximization and Its Applications on
Adversarial Submodular Bandits
- URL: http://arxiv.org/abs/2305.12402v1
- Date: Sun, 21 May 2023 08:51:55 GMT
- Title: Bandit Multi-linear DR-Submodular Maximization and Its Applications on
Adversarial Submodular Bandits
- Authors: Zongqi Wan, Jialin Zhang, Wei Chen, Xiaoming Sun, Zhijie Zhang
- Abstract summary: We give a sublinear regret algorithm for the submodular bandit with partition matroid constraint.
For bandit sequential submodular maximization, the existing work proves an $O(T^{2/3})$ regret with a suboptimal $1/2$ approximation ratio.
- Score: 21.54858035450694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the online bandit learning of the monotone multi-linear
DR-submodular functions, designing the algorithm $\mathtt{BanditMLSM}$ that
attains $O(T^{2/3}\log T)$ of $(1-1/e)$-regret. Then we reduce submodular
bandit with partition matroid constraint and bandit sequential monotone
maximization to the online bandit learning of the monotone multi-linear
DR-submodular functions, attaining $O(T^{2/3}\log T)$ of $(1-1/e)$-regret in
both problems, which improves on the existing results. To the best of our
knowledge, we are the first to give a sublinear regret algorithm for the
submodular bandit with partition matroid constraint. A special case of this
problem is studied by Streeter et al. (2009). They prove an $O(T^{4/5})$
$(1-1/e)$-regret upper bound. For the bandit sequential submodular
maximization, the existing work proves an $O(T^{2/3})$ regret with a suboptimal
$1/2$ approximation ratio (Niazadeh et al. 2021).
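For context on the object being optimized: the multilinear extension of a monotone submodular set function is the textbook example of a monotone multi-linear DR-submodular function. The sketch below is only a toy illustration of that extension; it is not the paper's $\mathtt{BanditMLSM}$ algorithm, and the coverage function and tiny ground set are hypothetical choices made for the example.

```python
import itertools
import numpy as np

# Minimal sketch (not the paper's BanditMLSM algorithm): the multilinear
# extension F(x) = E_{S ~ x}[f(S)] of a monotone submodular set function f
# is the textbook example of a monotone multi-linear DR-submodular function.
# The coverage function below is a hypothetical stand-in used for illustration.

def coverage_f(S, subsets):
    """Monotone submodular coverage: number of elements covered by the chosen subsets."""
    covered = set()
    for i in S:
        covered |= subsets[i]
    return len(covered)

def multilinear_extension(x, f):
    """F(x) = sum over S of f(S) * prod_{i in S} x_i * prod_{i not in S} (1 - x_i)."""
    n = len(x)
    value = 0.0
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            prob = np.prod([x[i] if i in S else 1.0 - x[i] for i in range(n)])
            value += prob * f(S)
    return value

subsets = [{0, 1}, {1, 2}, {2, 3}]       # 3 arms, each covering some ground elements
f = lambda S: coverage_f(S, subsets)
x = np.array([0.5, 0.5, 0.5])            # a fractional point in [0, 1]^3
print(multilinear_extension(x, f))       # F is monotone and DR-submodular in x
```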
Related papers
- Sum-max Submodular Bandits [7.337919355153117]
We show that all functions in this class satisfy a key property that we call pseudo-concavity.
This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{O}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback.
arXiv Detail & Related papers (2023-11-10T10:18:50Z)
- Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback [98.29086113546045]
We investigate the problem of unconstrained multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization.
We show that RGL empirically outperforms other full-bandit variants in submodular and non-submodular settings.
arXiv Detail & Related papers (2023-02-02T18:52:14Z)
- Online Learning for Non-monotone Submodular Maximization: From Full Information to Bandit Feedback [12.914842850902456]
This paper revisits the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set.
We present the Meta-MFW algorithm achieving a $1/e$-regret bound of $O(\sqrt{T})$.
Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm which attains a $1/e$-regret bound of $O(T^{8/9})$.
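The MFW-style algorithms above build on the classical Frank-Wolfe (continuous greedy) template for continuous DR-submodular maximization. The sketch below shows only that generic template over a box constraint; it is not Meta-MFW, Mono-MFW, or Bandit-MFW, and the probabilistic-coverage objective and step count are hypothetical.

```python
import numpy as np

# Generic Frank-Wolfe / continuous-greedy template for monotone DR-submodular
# maximization over a down-closed set (here the box [0,1]^n). This is only the
# classical template that MFW-style methods build on, not Meta-MFW / Bandit-MFW.

def prob_coverage(x, groups, w):
    """Monotone DR-submodular toy objective: weighted probabilistic coverage."""
    return sum(wj * (1.0 - np.prod(1.0 - x[list(g)])) for g, wj in zip(groups, w))

def grad_prob_coverage(x, groups, w):
    g_out = np.zeros_like(x)
    for g, wj in zip(groups, w):
        g = list(g)
        for i in g:
            others = [j for j in g if j != i]
            g_out[i] += wj * np.prod(1.0 - x[others])
    return g_out

def frank_wolfe(groups, w, n, K=50):
    x = np.zeros(n)
    for _ in range(K):
        grad = grad_prob_coverage(x, groups, w)
        v = (grad > 0).astype(float)       # linear maximization over the box [0,1]^n
        x = x + v / K                      # move 1/K along the ascent direction
    return np.clip(x, 0.0, 1.0)

groups, w, n = [(0, 1), (1, 2), (2, 3)], [1.0, 2.0, 1.0], 4
x_hat = frank_wolfe(groups, w, n)
print(x_hat, prob_coverage(x_hat, groups, w))   # roughly (1 - 1/e)-optimal for this toy objective
```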
arXiv Detail & Related papers (2022-08-16T09:32:37Z)
- Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long as certain conditions are satisfied.
arXiv Detail & Related papers (2022-02-12T21:55:44Z)
- On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
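Inverse Gap Weighting (IGW) itself has a simple closed form on a finite action set: each non-greedy action gets probability inversely proportional to a scaled version of its estimated reward gap to the greedy action. The sketch below illustrates only that generic rule on hypothetical reward estimates; it is not the paper's contextual submodular algorithm.

```python
import numpy as np

# Generic Inverse Gap Weighting (IGW) rule on a finite action set: the probability
# of a suboptimal action shrinks with its estimated gap to the greedy action.
# Hypothetical reward estimates; not the paper's contextual submodular algorithm.

def inverse_gap_weighting(f_hat, gamma):
    """Return p with p[a] = 1 / (A + gamma * gap_a) for a != a_star, remainder on a_star."""
    f_hat = np.asarray(f_hat, dtype=float)
    A = len(f_hat)
    a_star = int(np.argmax(f_hat))
    p = 1.0 / (A + gamma * (f_hat[a_star] - f_hat))
    p[a_star] = 0.0
    p[a_star] = 1.0 - p.sum()               # remaining mass goes to the greedy action
    return p

f_hat = [0.9, 0.7, 0.4, 0.1]                 # estimated mean rewards of 4 actions
p = inverse_gap_weighting(f_hat, gamma=10.0)
a_t = np.random.choice(len(f_hat), p=p)      # sample the action to play this round
print(p, p.sum(), a_t)
```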
arXiv Detail & Related papers (2021-12-03T21:42:33Z)
- Faster First-Order Algorithms for Monotone Strongly DR-Submodular Maximization [11.919431962501479]
Continuous DR-submodular functions satisfy the Diminishing Returns (DR) property, which implies that they are concave along non-negative directions.
We propose a new algorithm that matches the provably optimal $1-\frac{c}{e}$ approximation ratio after only $\lceil\frac{L}{\mu}\rceil$ iterations.
arXiv Detail & Related papers (2021-11-15T18:53:14Z)
- Improved Regret Bounds for Online Submodular Maximization [10.089520556398575]
We consider an online optimization problem where at each step $t\in[T]$, the algorithm chooses an action $x_t$ from the fixed convex and compact domain set $\mathcal{K}$.
A utility function $f_t(\cdot)$ is then revealed and the algorithm receives the payoff $f_t(x_t)$.
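To make the protocol concrete, the toy loop below picks a point from a box domain and then receives the full utility function (under bandit feedback, only the scalar payoff $f_t(x_t)$ would be observed). The linear utilities and the fixed, non-adaptive action are hypothetical; no learning algorithm from the paper is implemented.

```python
import numpy as np

# Toy version of the online protocol described above: at each step t the learner
# picks x_t from a convex compact domain K (here the box [0,1]^2), then f_t is
# revealed and the payoff f_t(x_t) is collected. Under bandit feedback only the
# scalar f_t(x_t) would be observed. Hypothetical linear utilities; no learner here.

rng = np.random.default_rng(0)
T, d = 10, 2
total_payoff = 0.0
x_t = np.array([0.5, 0.5])                   # a fixed, non-adaptive action for illustration
for t in range(T):
    theta_t = rng.uniform(0.0, 1.0, size=d)  # defines f_t(x) = <theta_t, x>
    f_t = lambda x: float(theta_t @ x)
    total_payoff += f_t(x_t)                 # full information: f_t itself is revealed
print(total_payoff)
```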
arXiv Detail & Related papers (2021-06-15T02:05:35Z)
- The Power of Subsampling in Submodular Maximization [51.629656762796564]
We show that this approach leads to optimal/state-of-the-art results despite being much simpler than existing methods.
We empirically demonstrate the effectiveness of our algorithms on video summarization, location summarization, and movie recommendation tasks.
arXiv Detail & Related papers (2021-04-06T20:25:57Z)
- Revisiting Modified Greedy Algorithm for Monotone Submodular Maximization with a Knapsack Constraint [75.85952446237599]
We show that a modified greedy algorithm can achieve an approximation factor of $0.305$.
We derive a data-dependent upper bound on the optimum.
It can also be used to significantly improve the efficiency of such algorithms as branch and bound.
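For reference, the standard cost-effectiveness ("modified") greedy template for monotone submodular maximization under a knapsack constraint repeatedly adds the feasible element with the best marginal-gain-to-cost ratio and then returns the better of that set and the best feasible singleton. The sketch below is a generic illustration on a hypothetical coverage instance, not the exact variant or the data-dependent bound analyzed in the paper.

```python
# Standard cost-effectiveness ("modified") greedy for monotone submodular
# maximization under a knapsack constraint: grow a set by best marginal gain
# per unit cost, then compare against the best feasible singleton. Generic
# illustration on a hypothetical coverage instance, not the paper's analysis.

def coverage(S, subsets):
    covered = set()
    for i in S:
        covered |= subsets[i]
    return len(covered)

def modified_greedy(f, costs, budget, n):
    S, spent = set(), 0.0
    while True:
        best, best_ratio = None, 0.0
        for i in range(n):
            if i in S or spent + costs[i] > budget:
                continue
            gain = f(S | {i}) - f(S)
            if gain / costs[i] > best_ratio:
                best, best_ratio = i, gain / costs[i]
        if best is None:
            break
        S.add(best)
        spent += costs[best]
    # compare with the best single feasible element
    singles = [{i} for i in range(n) if costs[i] <= budget]
    best_single = max(singles, key=f, default=set())
    return S if f(S) >= f(best_single) else best_single

subsets = [{0, 1}, {1, 2, 3}, {3, 4}, {0, 4, 5}]
costs = [1.0, 2.0, 1.5, 2.5]
f = lambda S: coverage(S, subsets)
print(modified_greedy(f, costs, budget=3.0, n=len(subsets)))
```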
arXiv Detail & Related papers (2020-08-12T15:40:21Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.