Related papers: Logarithmic Regret in Multisecretary and Online Linear Programs with Continuous Valuations

URL: http://arxiv.org/abs/1912.08917v6
Date: Mon, 28 Aug 2023 16:46:50 GMT
Title: Logarithmic Regret in Multisecretary and Online Linear Programs with Continuous Valuations
Authors: Robert L. Bray
Abstract summary: I study how the shadow prices of a linear program that allocates an endowment of $n\beta \in \mathbb{R}^{m}$ resources to $n$ customers behave as $n \rightarrow \infty$. I use these results to prove that the expected regret in \cites{Li2019b} online linear program is $\Theta(\log n)$, both when the customer variable distribution is known upfront and must be learned on the fly.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: I study how the shadow prices of a linear program that allocates an endowment of $n\beta \in \mathbb{R}^{m}$ resources to $n$ customers behave as $n \rightarrow \infty$. I show the shadow prices (i) adhere to a concentration of measure, (ii) converge to a multivariate normal under central-limit-theorem scaling, and (iii) have a variance that decreases like $\Theta(1/n)$. I use these results to prove that the expected regret in \cites{Li2019b} online linear program is $\Theta(\log n)$, both when the customer variable distribution is known upfront and must be learned on the fly. I thus tighten \citeauthors{Li2019b} upper bound from $O(\log n \log \log n)$ to $O(\log n)$, and extend \cites{Lueker1995} $\Omega(\log n)$ lower bound to the multi-dimensional setting. I illustrate my new techniques with a simple analysis of \cites{Arlotto2019} multisecretary problem.
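As a rough illustration of the multisecretary setting mentioned in the abstract, the sketch below simulates $n$ candidates and a hiring budget of $\beta n$, then compares an online threshold policy with the offline (hindsight) optimum. It is not the paper's algorithm or analysis. The Uniform(0,1) valuations, the re-solved quantile threshold `1 - remaining / left`, and the function name `simulate_multisecretary` are all assumptions made for this example.

```python
import random


def simulate_multisecretary(n, beta, seed=0):
    """Simulate one run of a toy multisecretary problem.

    Assumptions (not from the paper): values are i.i.d. Uniform(0, 1), and
    the online policy hires a candidate when the value is at least the
    quantile that would exhaust the remaining budget in expectation.
    """
    rng = random.Random(seed)
    values = [rng.random() for _ in range(n)]
    budget = round(beta * n)

    # Offline benchmark: with hindsight, hire the `budget` largest values.
    offline = sum(sorted(values, reverse=True)[:budget])

    remaining = budget  # hires still available
    online = 0.0        # total value collected by the online policy
    hired = 0
    for t, v in enumerate(values):
        if remaining == 0:
            break
        left = n - t  # candidates not yet seen, including the current one
        # Re-solved threshold: for Uniform(0, 1) values, the fraction of
        # future candidates above it matches remaining / left.
        threshold = 1.0 - remaining / left
        if v >= threshold:
            online += v
            remaining -= 1
            hired += 1
    return offline, online, hired


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        offline, online, hired = simulate_multisecretary(n, beta=0.3, seed=1)
        print(n, hired, offline - online)
```

The printed difference `offline - online` is the regret of a single sample path. Averaging it over many seeds would give an empirical estimate of expected regret for this toy policy only.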

Related papers

High-Dimensional Calibration from Swap Regret [40.9736612423411]
We study the online calibration of multi-dimensional forecasts over an arbitrary convex set $\mathcal{P} \subset \mathbb{R}^d$. We show that if it is possible to guarantee $O(\sqrt{\rho T})$ worst-case regret after $T$ rounds, it is possible to obtain $\epsilon$-calibrated forecasts after $T = \exp(\log d/\epsilon^2)$.
arXiv Detail & Related papers (2025-05-27T17:31:47Z)
Near-Optimal Time-Sparsity Trade-Offs for Solving Noisy Linear Equations [17.957489763446496]
We present a-time reduction from solving noisy linear equations over $\mathbb{Z}/q\mathbb{Z}$ in dimension $\Theta$. We deduce the hardness of sparse problems from their dense counterparts.
arXiv Detail & Related papers (2024-11-19T13:53:43Z)
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$. As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$. We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. In this work, we initiate the study of provably learning a multi-head attention layer from random examples. We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
On the Minimax Regret for Online Learning with Feedback Graphs [5.721380617450645]
We improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. Our improved upper bound $\mathcal{O}\bigl(\sqrt{\alpha T(\ln K)/(\ln\alpha)}\bigr)$ holds for any $\alpha$ and matches the lower bounds for bandits and experts.
arXiv Detail & Related papers (2023-05-24T17:40:57Z)
A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee [16.409210914237086]
Given a matrix $A\in \mathbb{R}^{n\times d}$ and a tensor $b\in \mathbb{R}^{n}$, we consider the regression problem with $\ell_\infty$ guarantees. We show that in order to obtain such $\ell_\infty$ guarantee for $\ell$ regression, one has to use sketching matrices that are dense. We also develop a novel analytical framework for $\ell_\infty$ guarantee regression that utilizes the Oblivious Coordinate-wise Embedding (OCE) property
arXiv Detail & Related papers (2023-02-01T05:22:40Z)
Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning [54.806166861456035]
We study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with constraint on the number of batches. We design a computationally efficient algorithm to achieve near-optimal regret of $\tilde{O}(\sqrt{SAH^3K\ln(1/\delta)})$ in $K$ episodes, where $\tilde{O}(\cdot)$ hides logarithmic terms of $(S,A,H,K)$. Our technical contributions are two-fold: 1) a near-optimal design scheme to explore
arXiv Detail & Related papers (2022-10-15T09:22:22Z)
Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model. We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z)
Nearly Horizon-Free Offline Reinforcement Learning [97.36751930393245]
We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision Processes with $S$ states, $A$ actions and planning horizon $H$. We obtain the first set of nearly $H$-free sample complexity bounds for evaluation and planning using the empirical MDPs.
arXiv Detail & Related papers (2021-03-25T18:52:17Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure. We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds. We propose a new algorithm sbcalg that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z)
$Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning enjoys a $\mathcal{O}\left(\frac{SA\cdot \mathrm{poly}\left(H\right)}{\Delta_{\min}}\log\left(SAT\right)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z)
Logistic Regression Regret: What's the Catch? [3.7311680121118345]
We derive lower bounds with logarithmic regret under $L_\infty$ constraints on the parameters. For $L$ constraints, it is shown that for large enough $d$, the regret remains linear in $d$ but no longer logarithmic in $T$.
arXiv Detail & Related papers (2020-02-07T18:36:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.