Contextual Recommendations and Low-Regret Cutting-Plane Algorithms
- URL: http://arxiv.org/abs/2106.04819v1
- Date: Wed, 9 Jun 2021 05:39:05 GMT
- Title: Contextual Recommendations and Low-Regret Cutting-Plane Algorithms
- Authors: Sreenivas Gollapudi, Guru Guruganesh, Kostas Kollias, Pasin
Manurangsi, Renato Paes Leme, Jon Schneider
- Abstract summary: We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems.
We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w^*$ and the hyperplanes the separation oracle returns.
- Score: 49.91214213074933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the following variant of contextual linear bandits motivated by
routing applications in navigational engines and recommendation systems. We
wish to learn a hidden $d$-dimensional value $w^*$. Every round, we are
presented with a subset $\mathcal{X}_t \subseteq \mathbb{R}^d$ of possible
actions. If we choose (i.e. recommend to the user) action $x_t$, we obtain
utility $\langle x_t, w^* \rangle$ but only learn the identity of the best
action $\arg\max_{x \in \mathcal{X}_t} \langle x, w^* \rangle$. We design
algorithms for this problem which achieve regret $O(d\log T)$ and $\exp(O(d
\log d))$. To accomplish this, we design novel cutting-plane algorithms with
low "regret" -- the total distance between the true point $w^*$ and the
hyperplanes the separation oracle returns. We also consider the variant where
we are allowed to provide a list of several recommendations. In this variant,
we give an algorithm with $O(d^2 \log d)$ regret and list size
$\mathrm{poly}(d)$. Finally, we construct nearly tight algorithms for a weaker
variant of this problem where the learner only learns the identity of an action
that is better than the recommendation. Our results rely on new algorithmic
techniques in convex geometry (including a variant of Steiner's formula for the
centroid of a convex set) which may be of independent interest.
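To make the feedback model concrete, here is a minimal simulation sketch (a stand-in, not the paper's centroid-based cutting-plane method): writing $x_t^{\mathrm{best}} = \arg\max_{x \in \mathcal{X}_t} \langle x, w^* \rangle$ for the revealed best action, each round yields the halfspace cut $\langle x_t^{\mathrm{best}} - x_t, w \rangle \geq 0$ containing $w^*$, and the sketch exploits it with a simple perceptron-style projection. All names, dimensions, and the update rule are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the interaction protocol: hidden w*, per-round action
# sets, argmax-only feedback. The learner below is a simple stand-in, not
# the paper's centroid-based cutting-plane algorithm.
rng = np.random.default_rng(0)
d, T, K = 5, 2000, 10                        # dimension, rounds, actions per round
w_star = rng.normal(size=d)                  # hidden vector w* to be learned
w_star /= np.linalg.norm(w_star)

w_hat = np.zeros(d)                          # learner's running estimate of w*
total_regret = 0.0

for t in range(T):
    X = rng.normal(size=(K, d))              # action set X_t presented this round
    x_t = X[np.argmax(X @ w_hat)]            # recommend the best action under w_hat
    x_best = X[np.argmax(X @ w_star)]        # feedback: identity of the true best action
    total_regret += (x_best - x_t) @ w_star  # instantaneous regret (never observed)

    # The feedback implies the cut <x_best - x_t, w> >= 0, a halfspace
    # containing w*. If w_hat violates it, project w_hat onto the hyperplane.
    g = x_best - x_t
    if g @ w_hat < 0:
        w_hat -= (g @ w_hat) / (g @ g) * g

print(f"total regret over {T} rounds: {total_regret:.2f}")
```

Note how the cutting-plane "regret" of the abstract shows up here: whenever the recommendation differs from the best action, the distance from $w^*$ to the hyperplane $\{w : \langle x_t^{\mathrm{best}} - x_t, w \rangle = 0\}$ equals the round's instantaneous regret divided by $\|x_t^{\mathrm{best}} - x_t\|$, which is how low-regret separation oracles translate into low-regret recommendations.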
Related papers
- A simple and improved algorithm for noisy, convex, zeroth-order optimisation [59.51990161522328]
We construct an algorithm that returns a point $\hat{x} \in \bar{\mathcal{X}}$ such that $f(\hat{x})$ is as small as possible.
We prove that the suboptimality gap $f(\hat{x}) - \min_{x \in \bar{\mathcal{X}}} f(x)$ is of smaller order than $d^2/\sqrt{n}$ up to poly-logarithmic terms.
arXiv Detail & Related papers (2024-06-26T18:19:10Z) - Dueling Optimization with a Monotone Adversary [35.850072415395395]
We study the problem of dueling optimization with a monotone adversary, which is a generalization of (noiseless) dueling convex optimization.
The goal is to design an online algorithm to find a minimizer $\mathbf{x}^*$ for a function $f \colon \mathcal{X} \to \mathbb{R}$, where $\mathcal{X} \subseteq \mathbb{R}^d$.
arXiv Detail & Related papers (2023-11-18T23:55:59Z) - Memory-Constrained Algorithms for Convex Optimization via Recursive
Cutting-Planes [23.94542304111204]
We give the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon \leq 1/\sqrt{d}$.
In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p = d$ achieves the information-theoretically optimal memory usage and improves the oracle complexity of gradient descent.
arXiv Detail & Related papers (2023-06-16T17:00:51Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $(1+\varepsilon)$-approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{dST})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm, SBCALG, that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Optimal Contextual Pricing and Extensions [32.152826900874075]
We give a poly-time algorithm for contextual pricing with $O(d \log \log T)$ regret, which matches the $\Omega(d \log \log T)$ lower bound up to a $d \log d$ additive factor.
These algorithms are based on a novel technique of bounding the value of the Steiner intrinsic volumes of a convex region at various scales.
arXiv Detail & Related papers (2020-03-03T18:46:29Z)