Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank
Preference Bandits
- URL: http://arxiv.org/abs/2202.11795v1
- Date: Wed, 23 Feb 2022 21:34:20 GMT
- Title: Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank
Preference Bandits
- Authors: Suprovat Ghoshal and Aadirupa Saha
- Abstract summary: We investigate whether models with a simple correlation structure, e.g. low rank, can result in faster learning rates.
In particular, we introduce a new class of \emph{Block-Rank} based RUM model, where the best item is shown to be $(\epsilon,\delta)$-PAC learnable with only $O(r \epsilon^{-2} \log(n/\delta))$ samples.
- Score: 21.70169149901781
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We introduce the \emph{Correlated Preference Bandits} problem with random
utility-based choice models (RUMs), where the goal is to identify the best item
from a given pool of $n$ items through online subsetwise preference feedback.
We investigate whether models with a simple correlation structure, e.g. low
rank, can result in faster learning rates. While we show that the problem can
be impossible to solve for the general `low rank' choice models, faster
learning rates can be attained assuming more structured item correlations. In
particular, we introduce a new class of \emph{Block-Rank} based RUM model,
where the best item is shown to be $(\epsilon,\delta)$-PAC learnable with only
$O(r \epsilon^{-2} \log(n/\delta))$ samples. This improves on the standard
sample complexity bound of $\tilde{O}(n\epsilon^{-2} \log(1/\delta))$ known for
the usual learning algorithms which might not exploit the item-correlations ($r
\ll n$). We complement the above sample complexity with a matching lower bound
(up to logarithmic factors), justifying the tightness of our analysis.
Surprisingly, we also show a lower bound of
$\Omega(n\epsilon^{-2}\log(1/\delta))$ when the learner is forced to play just
duels instead of larger subsetwise queries. Further, we extend the results to a
more general `\emph{noisy Block-Rank}' model, which ensures robustness of our
techniques. Overall, our results justify the advantage of playing subsetwise
queries over pairwise preferences $(k=2)$: we show that the latter provably
fails to exploit correlation.
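To ground the feedback model, the following minimal Python sketch simulates subsetwise preference feedback from a multinomial-logit (MNL) RUM whose item utilities are shared across $r$ blocks; this is our illustration of the correlated setting under assumed names (`block_of`, `block_utility`, `sample_winner`, the noise scale), not the paper's construction.

```python
import numpy as np

# Sketch: subsetwise (k-wise) preference feedback from a multinomial-logit
# RUM with block-structured item utilities: n items share r latent block
# utilities, so preferences are strongly correlated across items.
rng = np.random.default_rng(0)

n, r, k = 12, 3, 4                    # items, blocks, query size (k = 2 would be duels)
block_of = rng.integers(r, size=n)    # each item belongs to one of r blocks
block_utility = rng.normal(size=r)    # one latent utility per block
theta = block_utility[block_of]       # items inherit their block's utility...
theta += 0.01 * rng.normal(size=n)    # ...plus a small perturbation (a 'noisy Block-Rank' flavour)

def sample_winner(subset):
    """Sample the winner of a k-subset query under the MNL choice model:
    P(i wins | S) = exp(theta_i) / sum_{j in S} exp(theta_j)."""
    logits = theta[subset]
    probs = np.exp(logits - logits.max())   # stable softmax over the subset
    probs /= probs.sum()
    return subset[rng.choice(len(subset), p=probs)]

query = rng.choice(n, size=k, replace=False)   # one subsetwise query
print("queried:", query, "winner:", sample_winner(query))
```

In this toy model a single winner observation carries information about every item sharing the winner's block, which is the intuition for why $O(r \epsilon^{-2} \log(n/\delta))$ samples can suffice where uncorrelated algorithms pay $\tilde{O}(n\epsilon^{-2} \log(1/\delta))$.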
Related papers
- Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - UniRank: Unimodal Bandit Algorithm for Online Ranking [0.0]
We create an algorithm with an expected regret matching $O(\frac{L\log(L)}{\Delta}\log(T))$ with $2L$ players, horizon $T$, and a minimum reward gap $\Delta$.
Some experimental results finally show these theoretical results are corroborated in practice.
arXiv Detail & Related papers (2022-08-02T15:01:33Z) - Unimodal Mono-Partite Matching in a Bandit Setting [0.0]
We create an algorithm with an expected regret matching $O(\frac{L\log(L)}{\Delta}\log(T))$ with $2L$ players, horizon $T$, and a minimum reward gap $\Delta$.
Some experimental results finally show these theoretical results are corroborated in practice.
arXiv Detail & Related papers (2022-08-02T14:55:50Z) - Tight Bounds on the Hardness of Learning Simple Nonparametric Mixtures [9.053430799456587]
We study the problem of learning nonparametric distributions in a finite mixture.
We establish tight bounds on the sample complexity for learning the component distributions in such models.
arXiv Detail & Related papers (2022-03-28T23:53:48Z) - Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games.
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$.
arXiv Detail & Related papers (2022-02-03T18:18:28Z) - Under-bagging Nearest Neighbors for Imbalanced Classification [63.026765294759876]
We propose an ensemble learning algorithm called \textit{under-bagging $k$-nearest neighbors} (\textit{under-bagging $k$-NN}) for imbalanced classification problems.
arXiv Detail & Related papers (2021-09-01T14:10:38Z) - Structured Logconcave Sampling with a Restricted Gaussian Oracle [23.781520510778716]
We give algorithms for sampling several structured logconcave families to high accuracy.
We further develop a reduction framework, inspired by proximal point methods in convex optimization.
arXiv Detail & Related papers (2020-10-07T01:43:07Z) - An Optimal Elimination Algorithm for Learning a Best Arm [37.18327505953403]
We consider the classic problem of $(\epsilon,\delta)$-PAC learning a best arm.
We propose a new approach for $(\epsilon,\delta)$-PAC learning a best arm; a generic elimination sketch in this spirit appears after this list.
arXiv Detail & Related papers (2020-06-20T20:21:33Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, a discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
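As referenced in the elimination entry above, here is a minimal sketch of textbook successive elimination for $(\epsilon,\delta)$-PAC best-arm identification. It is a generic scheme under standard Hoeffding-style assumptions, not the specific algorithm of any paper listed; `pull`, the confidence radius, and the stopping rule are illustrative choices.

```python
import math
import numpy as np

def successive_elimination(pull, n, epsilon, delta):
    """Return an arm whose mean is within epsilon of the best arm's mean,
    with probability at least 1 - delta, given pull(i) -> reward in [0, 1]."""
    alive = list(range(n))
    means = np.zeros(n)
    t = 0
    while len(alive) > 1:
        t += 1
        for i in alive:                      # sample every surviving arm once
            means[i] += (pull(i) - means[i]) / t
        # Hoeffding-style radius with a union bound over arms and rounds
        radius = math.sqrt(math.log(4 * n * t * t / delta) / (2 * t))
        if 2 * radius <= epsilon:            # accurate enough: stop early
            break
        leader = max(alive, key=lambda i: means[i])
        alive = [i for i in alive            # drop arms that are provably worse
                 if means[i] + radius >= means[leader] - radius]
    return max(alive, key=lambda i: means[i])

# Toy usage with Bernoulli arms (the means below are made up for illustration).
rng = np.random.default_rng(1)
mus = [0.50, 0.60, 0.80, 0.75]
best = successive_elimination(lambda i: float(rng.random() < mus[i]),
                              len(mus), epsilon=0.05, delta=0.1)
print("selected arm:", best)
```

A per-arm scheme like this treats items independently, which is precisely the $\tilde{O}(n\epsilon^{-2}\log(1/\delta))$-type baseline that the Block-Rank analysis above improves on when $r \ll n$.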