Related papers: On the existence of the maximum likelihood estimate and convergence rate under gradient descent for multi-class logistic regression

URL: http://arxiv.org/abs/2012.04576v5
Date: Wed, 8 May 2024 05:31:36 GMT
Title: On the existence of the maximum likelihood estimate and convergence rate under gradient descent for multi-class logistic regression
Authors: Dwight Nwaigwe, Marek Rychlik
Abstract summary: We show that one method of ensuring the existence of the maximum likelihood estimate is by assigning positive probability to every class in the sample dataset. The notion of data separability is not needed, which is in contrast to the classical setup of multi-class logistic regression.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We revisit the problem of the existence of the maximum likelihood estimate for multi-class logistic regression. We show that one method of ensuring its existence is by assigning positive probability to every class in the sample dataset. The notion of data separability is not needed, which is in contrast to the classical setup of multi-class logistic regression, in which each data sample belongs to one class. We also provide a general and constructive estimate of the convergence rate to the maximum likelihood estimate when gradient descent is used as the optimizer. Our estimate involves bounding the condition number of the Hessian of the maximum likelihood function. The approaches used in this article rely on a simple operator-theoretic framework.
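
The setting described in the abstract can be illustrated with a small numerical sketch. This is not the authors' code. It assumes NumPy, a reference-class parameterization (the last class has its logits fixed at zero), and a hypothetical smoothing parameter `eps` as one simple way to give every class positive probability in each sample. The hard labels are linearly separable by construction, a case where the classical one-class-per-sample likelihood would typically have no finite maximizer. The step size comes from a standard smoothness bound on the Hessian, not from the paper's condition-number estimate.

```python
import numpy as np

def softmax_with_reference(logits):
    # Append a zero logit for the reference class, then normalise stably.
    z = np.hstack([logits, np.zeros((logits.shape[0], 1))])
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def neg_log_likelihood(W, X, Y):
    # Average cross-entropy between soft labels Y and model probabilities.
    P = softmax_with_reference(X @ W)
    return -np.sum(Y * np.log(P)) / X.shape[0]

def gradient(W, X, Y):
    # Gradient w.r.t. the weights of the non-reference classes only.
    P = softmax_with_reference(X @ W)
    return X.T @ (P[:, :-1] - Y[:, :-1]) / X.shape[0]

def fit(X, Y, iters=2000):
    n = X.shape[0]
    # Softmax curvature is at most 1/2, so L <= 0.5 * lambda_max(X^T X / n);
    # the step below is 1 / (that bound).
    step = 2.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    W = np.zeros((X.shape[1], Y.shape[1] - 1))
    for _ in range(iters):
        W = W - step * gradient(W, X, Y)
    return W

rng = np.random.default_rng(0)
n, d, k = 200, 3, 4
X = rng.normal(size=(n, d))

# Hard labels that are linearly separable by construction.
hard = np.argmax(X @ rng.normal(size=(d, k)), axis=1)

# Soft labels: every class gets positive probability (eps is illustrative).
eps = 0.05
Y = np.full((n, k), eps / (k - 1))
Y[np.arange(n), hard] = 1.0 - eps

W = fit(X, Y)
print("loss at zero:", neg_log_likelihood(np.zeros_like(W), X, Y))
print("loss at W:   ", neg_log_likelihood(W, X, Y))
print("gradient norm at W:", np.linalg.norm(gradient(W, X, Y)))
```

Setting `eps = 0` in this sketch recovers the classical one-class-per-sample labels, for which the weights would be expected to keep growing on separable data instead of settling.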

Related papers

Finite-sample performance of the maximum likelihood estimator in logistic regression [3.7550827441501844]
We consider the predictive performance of the maximum likelihood estimator (MLE) for logistic regression. We obtain sharp non-asymptotic guarantees for the existence and excess logistic risk of the MLE.
arXiv Detail & Related papers (2024-11-04T14:50:15Z)
High-dimensional logistic regression with missing data: Imputation, regularization, and universality [7.167672851569787]
We study high-dimensional, ridge-regularized logistic regression. We provide exact characterizations of both the prediction error and the estimation error.
arXiv Detail & Related papers (2024-10-01T21:41:21Z)
A Provably Accurate Randomized Sampling Algorithm for Logistic Regression [2.7930955543692817]
We present a simple, randomized sampling-based algorithm for the logistic regression problem. We prove that accurate approximations can be achieved with a sample whose size is much smaller than the total number of observations. Overall, our work sheds light on the potential of using randomized sampling approaches to efficiently approximate the estimated probabilities in logistic regression.
arXiv Detail & Related papers (2024-02-26T06:20:28Z)
Nonparametric logistic regression with deep learning [1.0589208420411012]
In the nonparametric logistic regression, the Kullback-Leibler divergence could diverge easily. Instead of analyzing the excess risk itself, it suffices to show the consistency of the maximum likelihood estimator. As an important application, we derive convergence rates of the NPMLE with fully connected deep neural networks.
arXiv Detail & Related papers (2024-01-23T04:31:49Z)
Resampled Confidence Regions with Exponential Shrinkage for the Regression Function of Binary Classification [0.0]
We build distribution-free confidence regions for the regression function for any user-chosen confidence level and any finite sample size based on a resampling test. We prove the strong uniform consistency of a new empirical risk based approach for model classes with finite pseudo-dimensions and inverse Lipschitz parameterizations. We also consider a k-nearest neighbors based method, for which we prove strong pointwise bounds on the probability of exclusion.
arXiv Detail & Related papers (2023-08-03T15:52:27Z)
Bayesian Hierarchical Models for Counterfactual Estimation [12.159830463756341]
We propose a probabilistic paradigm to estimate a diverse set of counterfactuals. We treat the perturbations as random variables endowed with prior distribution functions. A gradient based sampler with superior convergence characteristics efficiently computes the posterior samples.
arXiv Detail & Related papers (2023-01-21T00:21:11Z)
Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data. For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated. We formalize these results both in the finite-sample regime and in the asymptotic regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
Distributional Gradient Boosting Machines [77.34726150561087]
Our framework is based on XGBoost and LightGBM. We show that our framework achieves state-of-the-art forecast accuracy.
arXiv Detail & Related papers (2022-04-02T06:32:19Z)
Continuously Generalized Ordinal Regression for Linear and Deep Models [41.03778663275373]
Ordinal regression is a classification task where classes have an order and prediction error increases the further the predicted class is from the true class. We propose a new approach for modeling ordinal data that allows class-specific hyperplane slopes. Our method significantly outperforms the standard ordinal logistic model over a thorough set of ordinal regression benchmark datasets.
arXiv Detail & Related papers (2022-02-14T19:49:05Z)
Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples. We design a clipped gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner. We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation. We demonstrate the usefulness of our theory via applications to multi-armed bandits, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
Two-step penalised logistic regression for multi-omic data with an application to cardiometabolic syndrome [62.997667081978825]
We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately. Our approach should be preferred if the goal is to select as many relevant predictors as possible. Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level.
arXiv Detail & Related papers (2020-08-01T10:36:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.