No-Regret M${}^{\natural}$-Concave Function Maximization: Stochastic Bandit Algorithms and Hardness of Adversarial Full-Information Setting
- URL: http://arxiv.org/abs/2405.12439v3
- Date: Tue, 26 Aug 2025 02:05:31 GMT
- Title: No-Regret M${}^{\natural}$-Concave Function Maximization: Stochastic Bandit Algorithms and Hardness of Adversarial Full-Information Setting
- Authors: Taihei Oki, Shinsaku Sakaue
- Abstract summary: We study online M${}^{\natural}$-concave function maximization problems, which are interactive versions of the problem studied by Murota and Shioura (1999). For the stochastic bandit setting, we present $O(T^{-1/2})$-simple regret and $O(T^{2/3})$-regret algorithms under $T$ times access to noisy value oracles of M${}^{\natural}$-concave functions. We prove that, even with full-information feedback, no algorithms that run in polynomial time per round can achieve $O(T^{1-c})$ regret for any constant $c > 0$.
- Score: 27.973926244529267
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: M${}^{\natural}$-concave functions, a.k.a. gross substitute valuation functions, play a fundamental role in many fields, including discrete mathematics and economics. In practice, perfect knowledge of M${}^{\natural}$-concave functions is often unavailable a priori, and we can optimize them only interactively based on some feedback. Motivated by such situations, we study online M${}^{\natural}$-concave function maximization problems, which are interactive versions of the problem studied by Murota and Shioura (1999). For the stochastic bandit setting, we present $O(T^{-1/2})$-simple regret and $O(T^{2/3})$-regret algorithms under $T$ times access to unbiased noisy value oracles of M${}^{\natural}$-concave functions. A key to proving these results is the robustness of the greedy algorithm to local errors in M${}^{\natural}$-concave function maximization, which is one of our main technical results. While we obtain those positive results for the stochastic setting, another main result of our work is an impossibility in the adversarial setting. We prove that, even with full-information feedback, no algorithms that run in polynomial time per round can achieve $O(T^{1-c})$ regret for any constant $c > 0$. Our proof is based on a reduction from the matroid intersection problem for three matroids, which would be a novel approach to establishing the hardness in online learning.
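The abstract's key technical lever is the robustness of the greedy algorithm for M${}^{\natural}$-concave maximization to local errors. The paper itself is not reproduced here, but the classical exact greedy it builds on can be sketched as follows; this is a minimal illustrative sketch (function and variable names are my own, not the authors'), tested only on an additive function, which is a simple special case of an M${}^{\natural}$-concave function.

```python
def greedy_maximize(f, n):
    """Greedy maximization of a set function f over subsets of {0, ..., n-1}.

    For M-natural-concave (gross substitutes) functions, repeatedly adding
    the element with the largest positive marginal gain is known to find a
    maximizer; the paper's bandit algorithms rely on this greedy's
    robustness when the marginal gains are only estimated from noisy oracles.
    """
    S = set()
    while True:
        best_gain, best_i = 0.0, None
        for i in range(n):
            if i not in S:
                gain = f(S | {i}) - f(S)  # marginal gain of adding i
                if gain > best_gain:
                    best_gain, best_i = gain, i
        if best_i is None:  # no strictly improving element remains
            return S
        S.add(best_i)

# Example: an additive (hence M-natural-concave) function.
w = [3.0, -1.0, 2.0, 0.0]
f = lambda S: sum(w[i] for i in S)
print(sorted(greedy_maximize(f, len(w))))  # -> [0, 2]
```

In the bandit setting of the paper, each exact evaluation `f(S)` would be replaced by an average of noisy oracle queries, which is where the local-error robustness result comes in.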
Related papers
- Adaptivity and Universality: Problem-dependent Universal Regret for Online Convex Optimization [64.88607416000376]
We introduce UniGrad, a novel approach that achieves both universality and adaptivity, with two distinct realizations: UniGrad.Correct and UniGrad.Bregman. Both methods achieve universal regret guarantees that adapt to gradient variation, simultaneously attaining $\mathcal{O}(\log V_T)$ regret for strongly convex functions and $\mathcal{O}(d \log V_T)$ regret for exp-concave functions.
arXiv Detail & Related papers (2025-11-25T05:23:10Z) - Dueling Optimization with a Monotone Adversary [35.850072415395395]
We study the problem of dueling optimization with a monotone adversary, which is a generalization of (noiseless) dueling convex optimization.
The goal is to design an online algorithm to find a minimizer $\mathbf{x}^*$ of a function $f\colon \mathcal{X} \to \mathbb{R}$, where $\mathcal{X} \subseteq \mathbb{R}^d$.
arXiv Detail & Related papers (2023-11-18T23:55:59Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Agnostic proper learning of monotone functions: beyond the black-box correction barrier [6.47243430672461]
Given $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$ uniformly random examples of an unknown function $f\colon \{\pm 1\}^n \rightarrow \{\pm 1\}$, our algorithm outputs a hypothesis $g\colon \{\pm 1\}^n \rightarrow \{\pm 1\}$ that is monotone.
We also give an algorithm for estimating, up to additive error $\varepsilon$, the distance of an unknown function $f$ to monotone using a run-time of $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$.
arXiv Detail & Related papers (2023-04-05T18:52:10Z) - On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
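Inverse Gap Weighting, mentioned above, is a standard way to turn reward estimates into an exploration distribution: actions are sampled with probability inversely proportional to their estimated gap from the best action. A minimal sketch of the standard formula follows (the function name and the learning-rate parameter `gamma` are my own illustration, not taken from the paper):

```python
def inverse_gap_weighting(fhat, gamma):
    """Turn a list of reward estimates fhat into a sampling distribution.

    Each non-greedy action a gets probability 1 / (A + gamma * gap(a)),
    where gap(a) is its estimated shortfall from the best action and A is
    the number of actions; the greedy action receives the remaining mass.
    """
    A = len(fhat)
    best = max(range(A), key=lambda a: fhat[a])
    p = [0.0] * A
    for a in range(A):
        if a != best:
            p[a] = 1.0 / (A + gamma * (fhat[best] - fhat[a]))
    p[best] = 1.0 - sum(p)  # remaining mass goes to the greedy action
    return p
```

Larger `gamma` concentrates mass on the greedy action, while `gamma = 0` gives the uniform distribution; this trade-off is what yields the regret guarantees in IGW-based contextual bandit algorithms.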
arXiv Detail & Related papers (2021-12-03T21:42:33Z) - Improved Regret Bounds for Online Submodular Maximization [10.089520556398575]
We consider an online optimization problem where at each step $t \in [T]$, the algorithm chooses an action $x_t$ from the fixed convex and compact domain set $\mathcal{K}$.
A utility function $f_t(\cdot)$ is then revealed and the algorithm receives the payoff $f_t(x_t)$.
arXiv Detail & Related papers (2021-06-15T02:05:35Z) - Efficient Online-Bandit Strategies for Minimax Learning Problems [21.300877551771197]
Several learning problems involve solving min-max problems, e.g., empirical distributional robust learning or minimization with non-standard aggregated losses.
More specifically, these problems are convex-linear problems where the learning is carried out over the model parameters $w \in \mathcal{W}$ and over the empirical distribution $p \in \mathcal{K}$ of the training set.
To design efficient methods, we let an online learning algorithm play against a (combinatorial) bandit algorithm.
arXiv Detail & Related papers (2021-05-28T16:01:42Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C${}^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C${}^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent
Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z) - Empirical Risk Minimization in the Non-interactive Local Model of
Differential Privacy [26.69391745812235]
We study the Empirical Risk Minimization (ERM) problem in the noninteractive Local Differential Privacy (LDP) model.
Previous research indicates that the sample complexity, to achieve error $\alpha$, needs to depend on the dimensionality $p$ for general loss functions.
arXiv Detail & Related papers (2020-11-11T17:48:00Z) - Maximizing Determinants under Matroid Constraints [69.25768526213689]
We study the problem of finding a basis $S$ of $M$ such that $\det\left(\sum_{i \in S} v_i v_i^\top\right)$ is maximized.
This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
arXiv Detail & Related papers (2020-04-16T19:16:38Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{1/3}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z) - On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems [86.92205445270427]
We consider nonconvex-concave minimax problems $\min_{\mathbf{x}} \max_{\mathbf{y}} f(\mathbf{x}, \mathbf{y})$ and aim to solve them efficiently.
arXiv Detail & Related papers (2019-06-02T03:03:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.