A Multiclass Classification Approach to Label Ranking
- URL: http://arxiv.org/abs/2002.09420v1
- Date: Fri, 21 Feb 2020 17:12:43 GMT
- Title: A Multiclass Classification Approach to Label Ranking
- Authors: Stephan Cl\'emen\c{c}on, Robin Vogel
- Abstract summary: In multiclass classification, the goal is to learn how to predict a random label $Y$, valued in $\mathcal{Y}=\{1,\; \ldots,\; K\}$ with $K\geq 3$.
This article is devoted to the analysis of this statistical learning problem, halfway between multiclass classification and posterior probability estimation.
- Score: 2.6905021039717987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multiclass classification, the goal is to learn how to predict a random
label $Y$, valued in $\mathcal{Y}=\{1,\; \ldots,\; K \}$ with $K\geq 3$, based
upon observing a r.v. $X$, taking its values in $\mathbb{R}^q$ with $q\geq 1$
say, by means of a classification rule $g:\mathbb{R}^q\to \mathcal{Y}$ with
minimum probability of error $\mathbb{P}\{Y\neq g(X) \}$. However, in a wide
variety of situations, the task targeted may be more ambitious, consisting in
sorting all the possible label values $y$ that may be assigned to $X$ by
decreasing order of the posterior probability $\eta_y(X)=\mathbb{P}\{Y=y \mid X
\}$. This article is devoted to the analysis of this statistical learning
problem, halfway between multiclass classification and posterior probability
estimation (regression) and referred to as label ranking here. We highlight the
fact that it can be viewed as a specific variant of ranking median regression
(RMR), where, rather than observing a random permutation $\Sigma$ assigned to
the input vector $X$ and drawn from a Bradley-Terry-Luce-Plackett model with
conditional preference vector $(\eta_1(X),\; \ldots,\; \eta_K(X))$, the sole
information available for training a label ranking rule is the label $Y$ ranked
on top, namely $\Sigma^{-1}(1)$. Inspired by recent results in RMR, we prove
that under appropriate noise conditions, the One-Versus-One (OVO) approach to
multiclassification yields, as a by-product, an optimal ranking of the labels
with overwhelming probability. Beyond theoretical guarantees, the relevance of
the approach to label ranking promoted in this article is supported by
experimental results.
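Note that under the Bradley-Terry-Luce-Plackett model with preference vector $(\eta_1(X),\ldots,\eta_K(X))$, the top-ranked label satisfies $\mathbb{P}\{\Sigma^{-1}(1)=y \mid X\} = \eta_y(X)$, i.e. $\Sigma^{-1}(1)$ has the same conditional law as $Y$; this is what lets ordinary multiclass data stand in for permutation data. Below is a minimal sketch of the OVO-to-ranking idea, assuming scikit-learn-style estimators and integer labels $0,\ldots,K-1$; the function names `fit_ovo` and `rank_labels` are illustrative, not from the paper.
```python
# Minimal sketch (not the paper's implementation) of One-Versus-One (OVO)
# label ranking: fit a binary classifier for each label pair, then rank
# labels by their number of pairwise wins (a Copeland-style score).
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_ovo(X, y, n_classes):
    """Fit one binary classifier per label pair (i, j), using only the
    training examples labelled i or j."""
    models = {}
    for i, j in combinations(range(n_classes), 2):
        mask = np.isin(y, [i, j])
        models[(i, j)] = LogisticRegression().fit(X[mask], y[mask])
    return models


def rank_labels(models, x, n_classes):
    """Rank the labels for a single input x by decreasing number of
    pairwise wins, mimicking a ranking by decreasing posterior eta_y(x)."""
    wins = np.zeros(n_classes)
    for clf in models.values():
        winner = clf.predict(x.reshape(1, -1))[0]
        wins[int(winner)] += 1
    return np.argsort(-wins)  # labels sorted from most to fewest wins
```
Ranking by pairwise wins is one natural way to turn OVO outputs into a full ordering; the paper's guarantee is that, under appropriate noise conditions, this ordering agrees with the ordering of the posterior probabilities with overwhelming probability.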
Related papers
- Optimal level set estimation for non-parametric tournament and crowdsourcing problems [49.75262185577198]
Motivated by crowdsourcing, we consider a problem where we partially observe the correctness of the answers of $n$ experts on $d$ questions.
In this paper, we assume that the matrix $M$ containing the probability that expert $i$ answers question $j$ correctly is bi-isotonic up to a permutation of its rows and columns.
We construct a computationally efficient algorithm that turns out to be minimax optimal for this classification problem.
arXiv Detail & Related papers (2024-08-27T18:28:31Z) - The Real Price of Bandit Information in Multiclass Classification [73.17969992976501]
We revisit the classical problem of multiclass classification with bandit feedback.
We present a new bandit classification algorithm that guarantees regret $\widetilde{O}(|H|+\sqrt{T})$.
arXiv Detail & Related papers (2024-05-16T12:11:09Z) - One-Bit Quantization and Sparsification for Multiclass Linear Classification with Strong Regularization [18.427215139020625]
We show that the best classification performance is achieved when $f(\cdot) = \|\cdot\|^2$ and $\lambda \to \infty$.
It is often possible to find sparse and one-bit solutions that perform almost as well as the one corresponding to $f(\cdot) = \|\cdot\|_\infty$ in the large $\lambda$ regime.
arXiv Detail & Related papers (2024-02-16T06:39:40Z) - Identification of Mixtures of Discrete Product Distributions in
Near-Optimal Sample and Time Complexity [6.812247730094931]
We show, for any $n\geq 2k-1$, how to achieve sample complexity and run-time complexity $(1/\zeta)^{O(k)}$.
We also extend the known lower bound of $e^{\Omega(k)}$ to match our upper bound across a broad range of $\zeta$.
arXiv Detail & Related papers (2023-09-25T09:50:15Z) - Self-Directed Linear Classification [50.659479930171585]
In online classification, a learner is presented with a sequence of examples and aims to predict their labels in an online fashion so as to minimize the total number of mistakes.
Here we study the power of choosing the prediction order and establish the first strong separation between worst-order and random-order learning.
arXiv Detail & Related papers (2023-08-06T15:38:44Z) - Repeated Observations for Classification [0.2676349883103404]
We study the problem of nonparametric classification with repeated observations.
In the analysis, we investigate particular models such as robust detection by nominal densities, prototype classification, linear transformation, linear classification, and scaling.
arXiv Detail & Related papers (2023-07-19T10:50:36Z) - Classification Under Ambiguity: When Is Average-K Better Than Top-K? [1.7156052308952854]
A common alternative, referred to as top-$K$ classification, is to choose some number $K$ and to return the $K$ labels with the highest scores.
This paper formally characterizes the ambiguity profile when average-$K$ classification can achieve a lower error rate than a fixed top-$K$ classification.
arXiv Detail & Related papers (2021-12-16T12:58:07Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\operatorname{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A\in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$ is small, where $\tilde{R}(z)$ is a deterministic equivalent of the resolvent.
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Self-training Converts Weak Learners to Strong Learners in Mixture
Models [86.7137362125503]
We show that a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$.
We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples.
arXiv Detail & Related papers (2021-06-25T17:59:16Z) - Statistical Query Lower Bounds for List-Decodable Linear Regression [55.06171096484622]
We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples.
Our main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/\alpha)}$ for this problem.
arXiv Detail & Related papers (2021-06-17T17:45:21Z) - Semi-supervised Active Regression [21.51757844385258]
This paper studies the use of partially labelled, potentially biased data for learning tasks.
The learner has access to a dataset $X$ and aims to solve $\min_{\beta \in \mathbb{R}^d} \|X\beta - Y\|_2^2$ while making few additional label queries.
arXiv Detail & Related papers (2021-06-12T03:28:43Z)