Minimax Optimal Submodular Optimization with Bandit Feedback
- URL: http://arxiv.org/abs/2310.18465v1
- Date: Fri, 27 Oct 2023 20:19:03 GMT
- Title: Minimax Optimal Submodular Optimization with Bandit Feedback
- Authors: Artin Tajdini, Lalit Jain, Kevin Jamieson
- Abstract summary: We consider maximizing a monotonic, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ under bandit feedback.
Specifically, $f$ is unknown to the learner but at each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$ with $|S_t| \leq k$ and receives reward $f(S_t) + \eta_t$ where $\eta_t$ is mean-zero sub-Gaussian noise.
- Score: 13.805872311596739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider maximizing a monotonic, submodular set function $f: 2^{[n]}
\rightarrow [0,1]$ under stochastic bandit feedback. Specifically, $f$ is
unknown to the learner but at each time $t=1,\dots,T$ the learner chooses a set
$S_t \subset [n]$ with $|S_t| \leq k$ and receives reward $f(S_t) + \eta_t$
where $\eta_t$ is mean-zero sub-Gaussian noise. The objective is to minimize
the learner's regret over $T$ times with respect to ($1-e^{-1}$)-approximation
of maximum $f(S_*)$ with $|S_*| = k$, obtained through greedy maximization of
$f$. To date, the best regret bound in the literature scales as $k n^{1/3}
T^{2/3}$. And by trivially treating every set as a unique arm one deduces that
$\sqrt{ {n \choose k} T }$ is also achievable. In this work, we establish the
first minimax lower bound for this setting that scales like
$\mathcal{O}(\min_{i \le k}(in^{1/3}T^{2/3} + \sqrt{n^{k-i}T}))$. Moreover, we
propose an algorithm that is capable of matching the lower bound regret.
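
To get a feel for how the three rates quoted in the abstract compare, the following minimal sketch evaluates them numerically. It is an illustration only, not the paper's algorithm: constants and logarithmic factors are dropped, the function names are hypothetical, and the index $i$ is assumed to range over $0,\dots,k$ (the abstract only states $i \le k$).

```python
import math


def prior_rate(n: int, k: int, T: int) -> float:
    """k * n^(1/3) * T^(2/3): the best earlier bound quoted in the abstract."""
    return k * n ** (1 / 3) * T ** (2 / 3)


def naive_rate(n: int, k: int, T: int) -> float:
    """sqrt(C(n, k) * T): every size-k set is treated as its own arm."""
    return math.sqrt(math.comb(n, k) * T)


def minimax_rate(n: int, k: int, T: int) -> tuple[float, int]:
    """min over i of i * n^(1/3) * T^(2/3) + sqrt(n^(k - i) * T).

    Returns (value, minimizing i). Assumption: i ranges over 0..k.
    """
    best_value, best_i = math.inf, 0
    for i in range(k + 1):
        value = i * n ** (1 / 3) * T ** (2 / 3) + math.sqrt(n ** (k - i) * T)
        if value < best_value:
            best_value, best_i = value, i
    return best_value, best_i


if __name__ == "__main__":
    n, k = 100, 3
    for T in (10**3, 10**6, 10**12):
        value, i = minimax_rate(n, k, T)
        print(f"T={T:.0e}  prior={prior_rate(n, k, T):.3g}  "
              f"naive={naive_rate(n, k, T):.3g}  minimax={value:.3g} (i={i})")
```

In this sketch the minimizing $i$ shrinks as $T$ grows: for short horizons the $T^{2/3}$ term is cheap and a larger $i$ is preferred, while for very long horizons the $\sqrt{T}$ term dominates and $i = 0$ wins.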
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist into the finite-sum minimization, dubbed as continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ FOs that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^\alpha\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Faster Stochastic Algorithms for Minimax Optimization under Polyak-Łojasiewicz Conditions [12.459354707528819]
We propose SPIDER-GDA for solving the finite-sum problem of the form $\min_x \max_y f(x,y) \triangleq \frac{1}{n} \sum_{i=1}^n f_i(x,y)$.
We prove SPIDER-GDA could find an $\epsilon$-optimal solution within $\mathcal{O}\left((n + \sqrt{n}\,\kappa_x\kappa_y^2)\log (1/\epsilon)\right)$
arXiv Detail & Related papers (2023-07-29T02:26:31Z) - Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
arXiv Detail & Related papers (2022-02-10T16:10:41Z) - Optimal SQ Lower Bounds for Learning Halfspaces with Massart Noise [9.378684220920562]
We give tight statistical query (SQ) lower bounds for learning halfspaces in the presence of Massart noise.
We show that for arbitrary $\eta \in [0,1/2]$ every SQ algorithm achieving misclassification error better than $\eta$ requires queries of superpolynomial accuracy.
arXiv Detail & Related papers (2022-01-24T17:33:19Z) - Improved Regret Bounds for Online Submodular Maximization [10.089520556398575]
We consider an online optimization problem where at each step $t\in[T]$, the algorithm chooses an action $x_t$ from the fixed convex and compact domain set $\mathcal{K}$.
A utility function $f_t(\cdot)$ is then revealed and the algorithm receives the payoff $f_t(x_t)$.
arXiv Detail & Related papers (2021-06-15T02:05:35Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm sbcalg that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning enjoys a $\mathcal{O}\left(\frac{SA\cdot \mathrm{poly}\left(H\right)}{\Delta_{\min}}\log\left(SAT\right)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z) - Adaptive Online Learning with Varying Norms [45.11667443216861]
We provide an online convex optimization algorithm that outputs points $w_t$ in some domain $W$.
We apply this result to obtain new "full-matrix"-style regret bounds.
arXiv Detail & Related papers (2020-02-10T17:22:08Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$-upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both compatible with oracles, and achieves complexity bounds of $\tilde{O}(n^2+n\sqrt{L/\mu})\log (1/\epsilon)$ and $O(n\sqrt{L/\epsilon})$, for $\mu>0$ and $\mu=0$
arXiv Detail & Related papers (2020-02-09T03:39:46Z)