Multiclass classification by sparse multinomial logistic regression
- URL: http://arxiv.org/abs/2003.01951v3
- Date: Thu, 19 Nov 2020 11:35:48 GMT
- Title: Multiclass classification by sparse multinomial logistic regression
- Authors: Felix Abramovich, Vadim Grinshtein and Tomer Levy
- Abstract summary: We consider high-dimensional multiclass classification by sparse multinomial logistic regression.
We propose a feature selection procedure based on penalized maximum likelihood with a complexity penalty.
We show that there exist two regimes corresponding to small and large number of classes.
- Score: 10.312968200748116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we consider high-dimensional multiclass classification by
sparse multinomial logistic regression. We propose first a feature selection
procedure based on penalized maximum likelihood with a complexity penalty on
the model size and derive the nonasymptotic bounds for misclassification excess
risk of the resulting classifier. We establish also their tightness by deriving
the corresponding minimax lower bounds. In particular, we show that there exist
two regimes corresponding to small and large number of classes. The bounds can
be reduced under the additional low noise condition. To find a penalized
maximum likelihood solution with a complexity penalty requires, however, a
combinatorial search over all possible models. To design a feature selection
procedure computationally feasible for high-dimensional data, we propose
multinomial logistic group Lasso and Slope classifiers and show that they also
achieve the minimax order.
Related papers
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an varepsilon-optimal randomized hypothesis with a sample complexity on the order of (d+k)/varepsilon2.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Precise Asymptotic Generalization for Multiclass Classification with
Overparameterized Linear Models [4.093769373833101]
We resolve the conjecture posed in Subramanian et al.'22, where the number of data points, features, and classes all grow together.
Our new lower bounds are akin to an information-theoretic strong converse: they establish that the misclassification rate goes to 0 or 1ally.
The key to our tight analysis is a new variant of the Hanson-Wright inequality which is broadly useful for multiclass problems with sparse labels.
arXiv Detail & Related papers (2023-06-23T00:59:15Z) - Optimal PAC Bounds Without Uniform Convergence [11.125968799758436]
We provide optimal high probability risk bounds through a framework that surpasses the limitations of uniform convergence arguments.
Our framework converts the leave-one-out error of permutation invariant predictors into high probability risk bounds.
Specifically, our result shows that certain aggregations of one-inclusion graph algorithms are optimal.
arXiv Detail & Related papers (2023-04-18T17:57:31Z) - Oracle Inequalities for Model Selection in Offline Reinforcement
Learning [105.74139523696284]
We study the problem of model selection in offline RL with value function approximation.
We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal inequalities up to logarithmic factors.
We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.
arXiv Detail & Related papers (2022-11-03T17:32:34Z) - Generalization Error Bounds for Multiclass Sparse Linear Classifiers [7.360807642941714]
We consider high-dimensional multiclass classification by sparse multinomial logistic regression.
We propose a computationally feasible feature selection procedure based on penalized maximum likelihood.
In particular, we consider global sparsity, double row-wise sparsity, and low-rank sparsity.
arXiv Detail & Related papers (2022-04-13T09:25:03Z) - Set-valued prediction in hierarchical classification with constrained
representation complexity [4.258263831866309]
We focus on hierarchical multi-class classification problems, where valid sets correspond to internal nodes of the hierarchy.
We propose three methods and evaluate them on benchmark datasets.
arXiv Detail & Related papers (2022-03-13T15:13:19Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise
Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exacts characterising the estimator in high-dimensions via empirical risk minimisation.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z) - Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve
Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for one-layer neural net bandit with a deterministic reward.
For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL)
arXiv Detail & Related papers (2021-02-08T12:41:56Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $tildeO(L5/6 T2/3)$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Theoretical Insights Into Multiclass Classification: A High-dimensional
Asymptotic View [82.80085730891126]
We provide the first modernally precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z) - Modelling High-Dimensional Categorical Data Using Nonconvex Fusion
Penalties [7.262048441360131]
Our estimator, called SCOPE, fuses levels together by making their corresponding coefficients exactly equal.
We provide an algorithm for exact and efficient clustering within a non-dimensional block of variables.
arXiv Detail & Related papers (2020-02-28T09:20:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.