Pseudonorm Approachability and Applications to Regret Minimization
- URL: http://arxiv.org/abs/2302.01517v1
- Date: Fri, 3 Feb 2023 03:19:14 GMT
- Title: Pseudonorm Approachability and Applications to Regret Minimization
- Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Jon Schneider,
Balasubramanian Sivan
- Abstract summary: We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
- Score: 73.54127663296906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Blackwell's celebrated approachability theory provides a general framework
for a variety of learning problems, including regret minimization. However,
Blackwell's proof and implicit algorithm measure approachability using the
$\ell_2$ (Euclidean) distance. We argue that in many applications such as
regret minimization, it is more useful to study approachability under other
distance metrics, most commonly the $\ell_\infty$-metric. But, the time and
space complexity of the algorithms designed for $\ell_\infty$-approachability
depend on the dimension of the space of the vectorial payoffs, which is often
prohibitively large. Thus, we present a framework for converting
high-dimensional $\ell_\infty$-approachability problems to low-dimensional
pseudonorm approachability problems, thereby resolving such issues. We first
show that the $\ell_\infty$-distance between the average payoff and the
approachability set can be equivalently defined as a pseudodistance between a
lower-dimensional average vector payoff and a new convex set we define. Next,
we develop an algorithmic theory of pseudonorm approachability, analogous to
previous work on approachability for $\ell_2$ and other norms, showing that it
can be achieved via online linear optimization (OLO) over a convex set given by
the Fenchel dual of the unit pseudonorm ball. We then use that to show, modulo
mild normalization assumptions, that there exists an
$\ell_\infty$-approachability algorithm whose convergence is independent of the
dimension of the original vectorial payoff. We further show that this algorithm
admits a polynomial-time implementation, assuming that the original
$\ell_\infty$-distance can be computed efficiently. We also give an
$\ell_\infty$-approachability algorithm whose convergence is logarithmic in
that dimension using an FTRL algorithm with a maximum-entropy regularizer.
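To make the OLO connection concrete, the reduction rests on a standard Fenchel-duality identity, stated below for an ordinary norm (the paper's contribution is to extend it to pseudonorms and a lower-dimensional payoff, which this generic form does not capture): the distance from the average payoff to the target set is a maximum of linear functions over the dual unit ball, so an online linear optimizer over that dual set controls the distance.

```latex
% Generic duality behind the OLO reduction (a standard identity for a
% closed convex set S and a norm; the paper works with pseudonorms).
% h_S denotes the support function of S.
\[
  d_{\|\cdot\|}(\bar{u}, S)
    \;=\; \max_{\|\theta\|_* \le 1}
      \big( \langle \theta, \bar{u} \rangle - h_S(\theta) \big),
  \qquad
  h_S(\theta) := \sup_{s \in S} \langle \theta, s \rangle .
\]
% For the l_infty distance the dual ball is the l_1 ball, and for
% S = R^d_- (the nonpositive orthant) the support function is finite only
% on theta >= 0, recovering external regret as a special case:
\[
  d_{\infty}\big(\bar{u}, \mathbb{R}^d_-\big)
    \;=\; \max_{\theta \ge 0,\; \|\theta\|_1 \le 1}
      \langle \theta, \bar{u} \rangle
    \;=\; \max\big(0,\, \max_i \bar{u}_i\big).
\]
```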
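Below is a minimal, runnable sketch of the resulting approachability loop in the log-dimension regime from the last sentence: FTRL with a maximum-entropy regularizer over the simplex (i.e., Hedge) serves as the OLO learner, applied to the classical special case of external-regret minimization viewed as $\ell_\infty$-approachability of the nonpositive orthant. This illustrates the general recipe, not the paper's algorithm; the game, the random adversary, and the step size are placeholder assumptions.

```python
import numpy as np

# Sketch of approachability via OLO for external regret
# (l_infty-approachability of the nonpositive orthant). The OLO learner is
# FTRL with a maximum-entropy regularizer over the simplex, i.e. Hedge,
# whose O(sqrt(log d / T)) regret becomes the approachability rate.

rng = np.random.default_rng(0)
d, T = 5, 20000                   # d comparator actions, T rounds
eta = np.sqrt(np.log(d) / T)      # illustrative step size (constants not tuned)

loss = rng.random((d, d))         # loss(a, b) in [0, 1]: our action a, adversary b
cum_u = np.zeros(d)               # cumulative vector payoff (the regret vector)

for t in range(T):
    # FTRL step: theta_t = argmax_theta eta*<theta, cum_u> + H(theta);
    # over the simplex the closed form is the softmax below.
    logits = eta * cum_u
    theta = np.exp(logits - logits.max())
    theta /= theta.sum()

    # Approachability best response to the scalarized game <theta_t, u(x, b)>:
    # playing the mixture x = theta_t gives <theta_t, u(theta_t, b)> = 0
    # for every adversary move b, which is what forces approach.
    b = rng.integers(d)                    # adversary's move (random, for the demo)
    u = theta @ loss[:, b] - loss[:, b]    # u_i = our expected loss - comparator i's loss
    cum_u += u                             # feed the payoff back to the OLO learner

avg_u = cum_u / T
dist_inf = max(0.0, avg_u.max())  # l_infty distance to the nonpositive orthant
print(f"dist_inf = {dist_inf:.4f}, "
      f"bound ~ sqrt(log d / T) = {np.sqrt(np.log(d) / T):.4f}")
```

Hedge's regret bound translates directly into the printed $\ell_\infty$ distance, mirroring the logarithmic dimension dependence claimed in the abstract.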
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Projection-free Online Exp-concave Optimization [21.30065439295409]
We present an LOO-based ONS-style algorithm which, using $O(T)$ calls to an LOO overall, guarantees a worst-case regret bounded by $\widetilde{O}(n^{2/3}T^{2/3})$.
Our algorithm is most interesting in an important and plausible low-dimensional data scenario.
arXiv Detail & Related papers (2023-02-09T18:58:05Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Scalable Differentially Private Clustering via Hierarchically Separated
Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot \mathrm{OPT} + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual
Linear Bandit [10.604939762790517]
We prove a minimax lower bound, $\mathcal{O}\big((\log d)^{\frac{\alpha+1}{2}}T^{\frac{1-\alpha}{2}}+\log T\big)$, for the cumulative regret.
Second, we propose a simple and computationally efficient algorithm inspired by the general Upper Confidence Bound (UCB) strategy.
arXiv Detail & Related papers (2021-09-23T19:35:38Z) - Empirical Risk Minimization in the Non-interactive Local Model of
Differential Privacy [26.69391745812235]
We study the Empirical Risk Minimization (ERM) problem in the non-interactive Local Differential Privacy (LDP) model.
Previous research indicates that the sample complexity, to achieve error $\alpha$, needs to depend exponentially on the dimensionality $p$ for general loss functions.
arXiv Detail & Related papers (2020-11-11T17:48:00Z) - Differentiable Linear Bandit Algorithm [6.849358422233866]
Upper Confidence Bound (UCB) is arguably the most commonly used method for linear multi-armed bandit problems.
We introduce a gradient estimator, which allows the confidence bound to be learned via gradient ascent.
We show that the proposed algorithm achieves a $\tilde{\mathcal{O}}(\hat{\beta}\sqrt{dT})$ upper bound on the $T$-round regret, where $d$ is the dimension of arm features and $\hat{\beta}$ is the learned size of the confidence bound.
arXiv Detail & Related papers (2020-06-04T16:43:55Z)