$p$-Generalized Probit Regression and Scalable Maximum Likelihood
Estimation via Sketching and Coresets
- URL: http://arxiv.org/abs/2203.13568v1
- Date: Fri, 25 Mar 2022 10:54:41 GMT
- Title: $p$-Generalized Probit Regression and Scalable Maximum Likelihood
Estimation via Sketching and Coresets
- Authors: Alexander Munteanu, Simon Omlor, Christian Peters
- Abstract summary: We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
- Score: 74.37849422071206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the $p$-generalized probit regression model, which is a generalized
linear model for binary responses. It extends the standard probit model by
replacing its link function, the standard normal cdf, by a $p$-generalized
normal distribution for $p\in[1, \infty)$. The $p$-generalized normal
distributions (Subbotin, 1923) are of special interest in statistical modeling
because they fit much more flexibly to data. Their tail behavior can be
controlled by choice of the parameter $p$, which influences the model's
sensitivity to outliers. Special cases include the Laplace, the Gaussian, and
the uniform distributions. We further show how the maximum likelihood estimator
for $p$-generalized probit regression can be approximated efficiently up to a
factor of $(1+\varepsilon)$ on large data by combining sketching techniques
with importance subsampling to obtain a small data summary called coreset.
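To make the model concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the common parameterization $f_p(x) \propto \exp(-|x|^p/p)$ of the $p$-generalized normal density, for which the cdf is $F_p(z) = \tfrac{1}{2} + \tfrac{1}{2}\,\mathrm{sign}(z)\,P(1/p, |z|^p/p)$ with $P$ the regularized lower incomplete gamma function, and pairs this link with the resulting negative log-likelihood. The function names and the row-norm subsampling probabilities are illustrative placeholders; the paper's coreset relies on carefully designed sensitivity scores combined with sketching, which are not reproduced here.

```python
# A minimal sketch, not the authors' implementation: the p-generalized normal
# cdf as a probit-style link, the weighted negative log-likelihood, and a toy
# importance-subsampling step standing in for the paper's coreset construction.
# Assumes the parameterization f_p(x) ~ exp(-|x|^p / p); the row-norm sampling
# probabilities below are an illustrative placeholder, not sensitivity scores.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammainc


def pgen_normal_cdf(z, p):
    """CDF of the p-generalized normal: 1/2 + sign(z)/2 * P(1/p, |z|^p / p)."""
    z = np.asarray(z, dtype=float)
    return 0.5 + 0.5 * np.sign(z) * gammainc(1.0 / p, np.abs(z) ** p / p)


def neg_log_likelihood(beta, X, y, p, weights=None):
    """(Weighted) negative log-likelihood of p-generalized probit regression,
    with responses y in {0, 1}; weights allow evaluation on a coreset."""
    if weights is None:
        weights = np.ones_like(y)
    f = np.clip(pgen_normal_cdf(X @ beta, p), 1e-12, 1 - 1e-12)  # avoid log(0)
    return -np.sum(weights * (y * np.log(f) + (1 - y) * np.log(1 - f)))


def importance_subsample(X, y, k, rng):
    """Keep k rows drawn with probabilities q_i proportional to row norms and
    reweight them by 1/(k * q_i), so the subsampled objective is unbiased."""
    q = np.linalg.norm(X, axis=1)
    q = q / q.sum()
    idx = rng.choice(len(y), size=k, replace=True, p=q)
    return X[idx], y[idx], 1.0 / (k * q[idx])


# Usage: fit the MLE on the full data and on a small weighted subsample.
rng = np.random.default_rng(0)
n, d, p = 5000, 5, 1.5
X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d)
y = (rng.uniform(size=n) < pgen_normal_cdf(X @ beta_true, p)).astype(float)

beta0 = np.zeros(d)
full_fit = minimize(neg_log_likelihood, beta0, args=(X, y, p))
Xs, ys, w = importance_subsample(X, y, k=500, rng=rng)
coreset_fit = minimize(neg_log_likelihood, beta0, args=(Xs, ys, p, w))
print(full_fit.x, coreset_fit.x)
```

For $p=2$ the link reduces to the standard probit model and for $p=1$ to a Laplace-cdf link. The weighted likelihood evaluated on the subsample shows how a small data summary is used in place of the full data set, although the toy sampling probabilities above carry none of the paper's $(1+\varepsilon)$ guarantees.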
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves an $n^{-2/3}$ dimension-free $L_2$ convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms for small datasets in particular.
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, within both the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Universality laws for Gaussian mixtures in generalized linear models [22.154969876570238]
We investigate the joint statistics of the family of generalized linear estimators $(\Theta_1, \dots, \Theta_M)$.
This allows us to prove the universality of different quantities of interest, such as the training and generalization errors.
We discuss the applications of our results to different machine learning tasks of interest, such as ensembling and uncertainty quantification.
arXiv Detail & Related papers (2023-02-17T15:16:06Z) - Dimension free ridge regression [10.434481202633458]
We revisit ridge regression on i.i.d. data, characterizing its bias and variance in terms of the bias and variance of an "equivalent" sequence model.
As a new application, we obtain a completely explicit and sharp characterization of ridge regression for Hilbert covariates with regularly varying spectrum.
arXiv Detail & Related papers (2022-10-16T16:01:05Z) - Variational Inference for Bayesian Bridge Regression [0.0]
We study the implementation of Automatic Differentiation Variational Inference (ADVI) for Bayesian inference on regression models with bridge penalization.
The bridge approach uses the $\ell_\alpha$ norm, with $\alpha \in (0, +\infty)$, to define a penalization on large values of the regression coefficients.
We illustrate the approach on non-parametric regression models with B-splines, although the method works seamlessly for other choices of basis functions.
arXiv Detail & Related papers (2022-05-19T12:29:09Z) - On the Generative Utility of Cyclic Conditionals [103.1624347008042]
We study whether and how we can model a joint distribution $p(x,z)$ using two conditional models $p(x|z)$ and $q(z|x)$ that form a cycle.
We propose the CyGen framework for cyclic-conditional generative modeling, including methods to enforce compatibility and use the determined distribution to fit and generate data.
arXiv Detail & Related papers (2021-06-30T10:23:45Z) - Estimating Stochastic Linear Combination of Non-linear Regressions
Efficiently and Scalably [23.372021234032363]
We show that when the sub-sample sizes are large, the estimation errors are not sacrificed by too much.
To the best of our knowledge, this is the first work that studies and provides guarantees for the stochastic linear combination of non-linear regressions model.
arXiv Detail & Related papers (2020-10-19T07:15:38Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)