Linear Algorithms for Nonparametric Multiclass Probability Estimation
- URL: http://arxiv.org/abs/2205.12460v1
- Date: Wed, 25 May 2022 03:15:22 GMT
- Title: Linear Algorithms for Nonparametric Multiclass Probability Estimation
- Authors: Liyun Zeng, Hao Helen Zhang
- Abstract summary: Weighted Support Vector Machines (wSVMs) have been developed to estimate class probabilities through ensemble learning.
We propose two new learning schemes, the baseline learning and the One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiclass probability estimation is the problem of estimating conditional
probabilities of a data point belonging to a class given its covariate
information. It has broad applications in statistical analysis and data
science. Recently, a class of weighted Support Vector Machines (wSVMs) has been
developed to estimate class probabilities through ensemble learning for
$K$-class problems (Wang, Shen and Liu, 2008; Wang, Zhang and Wu, 2019), where
$K$ is the number of classes. The estimators are robust and achieve high
accuracy for probability estimation, but their learning is implemented through
pairwise coupling, which demands polynomial time in $K$. In this paper, we
propose two new learning schemes, the baseline learning and the One-vs-All
(OVA) learning, to further improve wSVMs in terms of computational efficiency
and estimation accuracy. In particular, the baseline learning has optimal
computational complexity in the sense that it is linear in $K$. The resulting
estimators are distribution-free and shown to be consistent. We further conduct
extensive numerical experiments to demonstrate finite sample performance.
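To make the contrast in computational cost concrete, here is a minimal sketch (not the paper's implementation; the function name and interface are illustrative) of the probability-reconstruction step in a baseline-type scheme. Assuming pairwise conditional probability estimates $q_{jb}(x) = P(Y=j \mid Y \in \{j,b\}, x)$ against a fixed baseline class $b$ are available, e.g. from weighted SVMs, the identity $p_j(x)/p_b(x) = q_{jb}(x)/(1-q_{jb}(x))$ recovers all $K$ class probabilities from only $K-1$ pairwise problems, whereas full pairwise coupling requires $K(K-1)/2$ of them.

```python
import numpy as np

def coupled_probs_baseline(q_jb):
    """Recover K class probabilities from K-1 baseline pairwise estimates.

    q_jb[i] estimates P(Y = j_i | Y in {j_i, b}, x) for each non-baseline
    class j_i against a fixed baseline class b, so that
    p_{j_i}(x) / p_b(x) = q_jb[i] / (1 - q_jb[i]).
    """
    q = np.clip(np.asarray(q_jb, dtype=float), 1e-12, 1 - 1e-12)
    ratios = np.concatenate(([1.0], q / (1.0 - q)))  # baseline ratio p_b/p_b = 1
    return ratios / ratios.sum()                     # normalize over K classes

# Example with K = 4 (baseline class listed first):
print(coupled_probs_baseline([0.8, 0.5, 0.2]))
# -> [0.16 0.64 0.16 0.04] (proportional to [1, 4, 1, 0.25])
```

Only $K-1$ weighted-SVM subproblems feed this step, which is the sense in which the baseline scheme is linear in $K$; how the baseline class is chosen and how the pairwise estimates are trained follow the paper.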
Related papers
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions.
We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z)
- Sparse Learning and Class Probability Estimation with Weighted Support Vector Machines [1.3597551064547502]
Weighted Support Vector Machines (wSVMs) have shown great value in robustly predicting the class probability and classification for various problems with high accuracy.
We propose novel wSVMs frameworks that incorporate automatic variable selection with accurate probability estimation for sparse learning problems.
The proposed wSVMs-based sparse learning methods have wide applications and can be further extended to $K$-class problems through ensemble learning.
arXiv Detail & Related papers (2023-12-17T06:12:33Z)
- Online non-parametric likelihood-ratio estimation by Pearson-divergence functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric likelihood-ratio estimation (OLRE) for the setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments (a generic sketch of the estimation idea appears after this list).
arXiv Detail & Related papers (2023-11-03T13:20:11Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z)
- Robust Methods for High-Dimensional Linear Learning [0.0]
We propose statistically robust and computationally efficient linear learning methods in the high-dimensional batch setting.
We instantiate our framework on several applications including vanilla sparse, group-sparse and low-rank matrix recovery.
For vanilla $s$-sparsity, we are able to reach the $s\log(d)/n$ rate under heavy-tails and $\eta$-corruption.
arXiv Detail & Related papers (2022-08-10T17:00:41Z)
- Testing distributional assumptions of learning algorithms [5.204779946147061]
We study the design of tester-learner pairs $(\mathcal{A}, \mathcal{T})$.
We show that if the distribution on examples in the data passes the tester $\mathcal{T}$, then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data.
arXiv Detail & Related papers (2022-04-14T19:10:53Z)
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning (a minimal sketch appears after this list).
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first modern, precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
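For the online non-parametric likelihood-ratio estimation entry above, here is a generic sketch of the underlying idea under a linear-in-features model; it is not the OLRE estimator of that paper (which comes with theoretical guarantees), and `phi`, `dim`, and `eta` are illustrative choices. Minimizing the Pearson-divergence-based objective $J(\theta) = \tfrac{1}{2}\mathbb{E}_q[r_\theta(x)^2] - \mathbb{E}_p[r_\theta(x)]$ drives $r_\theta$ toward the ratio $p/q$, and each incoming pair $(x_t \sim p, x'_t \sim q)$ yields one stochastic gradient step.

```python
import numpy as np

def olre_pearson(pairs, phi, dim, eta=0.05):
    """Online likelihood-ratio estimation via Pearson-divergence minimization.

    pairs : stream of (x_p, x_q) with x_p ~ p and x_q ~ q at each time step.
    Models r(x) = theta . phi(x) and takes one SGD step per pair on
    J(theta) = E_q[r(x)^2] / 2 - E_p[r(x)], whose minimizer is r = p / q.
    """
    theta = np.zeros(dim)
    for x_p, x_q in pairs:
        phi_p, phi_q = phi(x_p), phi(x_q)
        grad = (theta @ phi_q) * phi_q - phi_p  # stochastic gradient of J
        theta -= eta * grad
    return lambda x: float(theta @ phi(x))      # estimated ratio p(x)/q(x)
```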
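Similarly, for the pairwise-learning entry, the sketch below shows one simple form of online gradient descent on a pairwise hinge loss for bipartite ranking; pairing each arriving example with the most recent opposite-label example is one buffering choice among many, not necessarily the scheme analyzed in that paper.

```python
import numpy as np

def ogd_pairwise(stream, dim, eta=0.1):
    """Online gradient descent on a pairwise hinge loss (bipartite ranking).

    stream : iterable of (x, y) with x in R^dim and y in {+1, -1}.
    Each arriving example is paired with the most recent example of the
    opposite label, and w takes one gradient step on the pairwise loss
    max(0, 1 - w . (x_pos - x_neg)).
    """
    w = np.zeros(dim)
    last = {1: None, -1: None}                 # most recent example per label
    for x, y in stream:
        partner = last[-y]
        if partner is not None:
            diff = x - partner if y == 1 else partner - x  # x_pos - x_neg
            if 1.0 - w @ diff > 0.0:           # hinge active for this pair
                w += eta * diff                # loss gradient is -diff
        last[y] = x
    return w
```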