Optimal Contextual Pricing and Extensions
- URL: http://arxiv.org/abs/2003.01703v3
- Date: Tue, 23 Feb 2021 23:33:43 GMT
- Title: Optimal Contextual Pricing and Extensions
- Authors: Allen Liu, Renato Paes Leme, Jon Schneider
- Abstract summary: We give a poly-time algorithm for contextual pricing with $O(d \log \log T)$ regret which matches the $\Omega(d \log \log T)$ lower bound up to the $d \log d$ additive factor.
These algorithms are based on a novel technique of bounding the value of the Steiner polynomial of a convex region at various scales.
- Score: 32.152826900874075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the contextual pricing problem a seller repeatedly obtains products
described by an adversarially chosen feature vector in $\mathbb{R}^d$ and only
observes the purchasing decisions of a buyer with a fixed but unknown linear
valuation over the products. The regret measures the difference between the
revenue the seller could have obtained knowing the buyer valuation and what can
be obtained by the learning algorithm.
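Concretely (one standard formalization from the contextual pricing literature, not spelled out in the abstract itself): if the seller posts a price $g_t$ for the product $x_t$ and the buyer with valuation $v$ purchases exactly when $g_t \le \langle v, x_t \rangle$, the pricing regret is

$$ \mathrm{Regret}(T) = \sum_{t=1}^{T} \Big( \langle v, x_t \rangle - g_t \cdot \mathbf{1}\{ g_t \le \langle v, x_t \rangle \} \Big), $$

so a round costs $\langle v, x_t \rangle - g_t$ when the price is set too low (a sale at a discount) and the full value $\langle v, x_t \rangle$ when the price is set too high and no sale occurs.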
We give a poly-time algorithm for contextual pricing with $O(d \log \log T +
d \log d)$ regret which matches the $\Omega(d \log \log T)$ lower bound up to
the $d \log d$ additive factor. If we replace pricing loss by the symmetric
loss, we obtain an algorithm with nearly optimal regret of $O(d \log d)$
matching the $\Omega(d)$ lower bound up to $\log d$. These algorithms are based
on a novel technique of bounding the value of the Steiner polynomial of a
convex region at various scales. The Steiner polynomial is a degree $d$
polynomial with intrinsic volumes as the coefficients.
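For reference, the classical Steiner formula (a standard fact from convex geometry, added here for context) expresses the volume of the $t$-neighborhood of a convex body $K \subseteq \mathbb{R}^d$ through exactly this polynomial:

$$ \mathrm{vol}_d\big(K + t B^d\big) = \sum_{j=0}^{d} \omega_{d-j}\, V_j(K)\, t^{d-j}, $$

where $B^d$ is the unit Euclidean ball, $V_j(K)$ are the intrinsic volumes of $K$, and $\omega_k$ is the volume of the unit ball in $\mathbb{R}^k$.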
We also study a generalized version of contextual search where the hidden
linear function over the Euclidean space is replaced by a hidden function $f :
\mathcal{X} \rightarrow \mathcal{Y}$ in a certain hypothesis class
$\mathcal{H}$. We provide a generic algorithm with $O(d^2)$ regret where $d$ is
the covering dimension of this class. This leads in particular to a
$\tilde{O}(s^2)$ regret algorithm for linear contextual search if the linear
function is guaranteed to be $s$-sparse. Finally we also extend our results to
the noisy feedback model, where each round our feedback is flipped with a fixed
probability $p < 1/2$.
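To make the interaction model concrete, the sketch below simulates the contextual search protocol with noisy binary feedback (each response flipped with probability $p$) under the symmetric loss. The learner is a deliberately naive perceptron-style baseline, and every parameter value (d, T, p, the step size eta, random rather than adversarial contexts) is an assumption made only for this demo; this is not the paper's Steiner-polynomial-based algorithm.

    # Minimal simulation of the contextual search protocol with noisy binary
    # feedback.  The learner is an illustrative perceptron-style baseline, NOT
    # the algorithm from the paper.
    import numpy as np

    rng = np.random.default_rng(0)

    d = 5          # feature dimension (assumed for the demo)
    T = 10_000     # number of rounds
    p = 0.1        # probability that each binary feedback bit is flipped
    eta = 0.05     # step size of the baseline learner

    v = rng.uniform(0.0, 1.0, size=d)   # hidden linear valuation, unknown to the learner
    w_hat = np.full(d, 0.5)             # learner's running estimate of v

    total_symmetric_loss = 0.0
    for t in range(T):
        x = rng.uniform(0.0, 1.0, size=d)   # context (adversarial in the paper; random here)
        x /= np.linalg.norm(x)

        guess = float(w_hat @ x)            # learner's query g_t for the value <v, x_t>
        true_value = float(v @ x)

        # Binary feedback: "is the true value at least the guess?", flipped w.p. p.
        feedback = true_value >= guess
        if rng.random() < p:
            feedback = not feedback

        # Symmetric loss of this round.
        total_symmetric_loss += abs(true_value - guess)

        # Perceptron-style update from the (possibly noisy) bit.
        w_hat += eta * x if feedback else -eta * x
        np.clip(w_hat, 0.0, 1.0, out=w_hat)

    print(f"average symmetric loss per round: {total_symmetric_loss / T:.4f}")
    print(f"estimation error ||v - w_hat||:   {np.linalg.norm(v - w_hat):.4f}")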
Related papers
- Batched Stochastic Bandit for Nondegenerate Functions [8.015503209312786]
This paper studies batched bandit learning problems for nondegenerate functions.
We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally.
arXiv Detail & Related papers (2024-05-09T12:50:16Z)
- Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space [10.292118864147097]
We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods.
Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^d$ by a product measure $\pi^\star$.
arXiv Detail & Related papers (2023-12-05T16:02:04Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{opt} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z)
- Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems.
We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w^*$ and the hyperplanes the separation oracle returns.
arXiv Detail & Related papers (2021-06-09T05:39:05Z)
- Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where the cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm sbcalg that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z)
- Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
arXiv Detail & Related papers (2020-10-22T19:14:37Z)
- $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning algorithm enjoys a $\mathcal{O}\left(\frac{SA \cdot \mathrm{poly}(H)}{\Delta_{\min}} \log(SAT)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z)