Nonparametric logistic regression with deep learning
- URL: http://arxiv.org/abs/2401.12482v2
- Date: Tue, 25 Feb 2025 05:20:22 GMT
- Title: Nonparametric logistic regression with deep learning
- Authors: Atsutomo Yara, Yoshikazu Terada
- Abstract summary: In nonparametric logistic regression, the Kullback-Leibler divergence can diverge easily. Instead of analyzing the excess risk itself, it suffices to show the consistency of the maximum likelihood estimator. As an important application, we derive convergence rates of the NPMLE with fully connected deep neural networks.
- Score: 1.0589208420411012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider the nonparametric logistic regression problem. In logistic regression, we usually consider the maximum likelihood estimator, and the excess risk is the expectation of the Kullback-Leibler (KL) divergence between the true and estimated conditional class probabilities. However, in nonparametric logistic regression, the KL divergence can diverge easily, and thus the convergence of the excess risk is difficult to prove or does not hold. Several existing studies show the convergence of the KL divergence under strong assumptions. In most cases, our goal is to estimate the true conditional class probabilities. Thus, instead of analyzing the excess risk itself, it suffices to show the consistency of the maximum likelihood estimator in some suitable metric. In this paper, using a simple unified approach for analyzing the nonparametric maximum likelihood estimator (NPMLE), we directly derive convergence rates of the NPMLE in the Hellinger distance under mild assumptions. Although our results are similar to those in some existing studies, we provide simpler and more direct proofs. As an important application, we derive convergence rates of the NPMLE with fully connected deep neural networks and show that the derived rate nearly achieves the minimax optimal rate.
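For concreteness, here is a hedged sketch (in our own notation, not necessarily the paper's) of the two quantities the abstract contrasts: the KL-based excess risk, which can blow up when the estimated probability approaches 0 or 1, and the Hellinger distance, which stays bounded and is the metric in which the convergence rates are stated.

```latex
% Our notation: p_0(x) = P(Y=1 | X=x) is the true conditional class probability,
% \hat p(x) an estimate. The excess risk of the (nonparametric) MLE is the
% expected KL divergence between the corresponding Bernoulli distributions:
\[
  \mathcal{E}(\hat p)
  = \mathbb{E}_X\!\left[
      p_0(X)\log\frac{p_0(X)}{\hat p(X)}
      + \bigl(1-p_0(X)\bigr)\log\frac{1-p_0(X)}{1-\hat p(X)}
    \right],
\]
% which can diverge if \hat p(X) approaches 0 or 1 where p_0(X) does not.
% The squared Hellinger distance between the same Bernoulli distributions
% (up to the conventional factor 1/2) remains bounded:
\[
  H^2(p_0,\hat p)
  = \mathbb{E}_X\!\left[
      \Bigl(\sqrt{p_0(X)}-\sqrt{\hat p(X)}\Bigr)^{2}
      + \Bigl(\sqrt{1-p_0(X)}-\sqrt{1-\hat p(X)}\Bigr)^{2}
    \right].
\]
```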
Related papers
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such prediction intervals via empirical estimation of quantiles of the output distribution.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile regression based interval construction that removes this arbitrary constraint.
We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
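As background for the constraint that RQR relaxes, here is a minimal sketch of conventional pinball-loss quantile regression used to build a prediction interval from two separately fitted quantiles; this illustrates the standard baseline, not the RQR method, and all function names and parameter choices below are ours.

```python
import numpy as np

def pinball_loss(residual, tau):
    """Pinball (check) loss for quantile level tau: penalizes under- and over-prediction asymmetrically."""
    return np.mean(np.maximum(tau * residual, (tau - 1) * residual))

def fit_linear_quantile(X, y, tau, lr=0.05, n_iter=2000):
    """Fit a linear model of the tau-th conditional quantile by subgradient descent on the pinball loss."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])      # append intercept column
    w = np.zeros(d + 1)
    for _ in range(n_iter):
        residual = y - Xb @ w
        # Subgradient of the pinball loss w.r.t. predictions: -tau where residual > 0, (1 - tau) otherwise.
        g_pred = np.where(residual > 0, -tau, 1 - tau)
        w -= lr * (Xb.T @ g_pred) / n
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5 + 0.5 * np.abs(X[:, 0]))  # heteroscedastic noise

# A conventional 90% interval: separately estimated 5% and 95% conditional quantiles.
w_lo = fit_linear_quantile(X, y, tau=0.05)
w_hi = fit_linear_quantile(X, y, tau=0.95)
Xb = np.hstack([X, np.ones((len(X), 1))])
coverage = np.mean((Xb @ w_lo <= y) & (y <= Xb @ w_hi))
print(f"empirical coverage of [q05, q95] interval: {coverage:.2f}")
```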
arXiv Detail & Related papers (2024-06-05T13:36:38Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one allows to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - On the Variance, Admissibility, and Stability of Empirical Risk Minimization [80.26309576810844]
Empirical Risk Minimization (ERM) with squared loss may attain minimax suboptimal error rates.
We show that under mild assumptions, the suboptimality of ERM must be due to large bias rather than variance.
We also show that our estimates imply stability of ERM, complementing the main result of Caponnetto and Rakhlin (2006) for non-Donsker classes.
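To make the bias/variance claim concrete, here is the standard squared-loss decomposition the statement implicitly refers to (our generic notation, not the paper's):

```latex
% Generic bias-variance decomposition of the squared-loss error of an estimator
% \hat f of a target f^*; the claim above is that the variance term is well
% controlled for ERM, so any minimax suboptimality must come from the bias term.
\[
  \mathbb{E}\,\bigl\|\hat f - f^\ast\bigr\|^2
  \;=\;
  \underbrace{\bigl\|\mathbb{E}\hat f - f^\ast\bigr\|^2}_{\text{bias}^2}
  \;+\;
  \underbrace{\mathbb{E}\,\bigl\|\hat f - \mathbb{E}\hat f\bigr\|^2}_{\text{variance}}.
\]
```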
arXiv Detail & Related papers (2023-05-29T15:25:48Z) - Quantized Low-Rank Multivariate Regression with Random Dithering [23.81618208119832]
Low-rank multivariate regression (LRMR) is an important statistical learning model.
We focus on the estimation of the underlying coefficient matrix.
We employ uniform quantization with random dithering, i.e., we add appropriate random noise to the data before quantization.
We propose the constrained Lasso and regularized Lasso estimators, and derive the non-asymptotic error bounds.
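A minimal sketch of the preprocessing step described above, uniform quantization with additive random dithering; the dither distribution and step size here are illustrative assumptions rather than the paper's exact scheme.

```python
import numpy as np

def dithered_uniform_quantize(x, step, rng):
    """Uniform quantizer with additive dither: add uniform noise on [-step/2, step/2)
    before rounding to the quantization grid. With this dither the quantized value is
    conditionally unbiased for x, which is what keeps downstream estimators tractable."""
    dither = rng.uniform(-step / 2, step / 2, size=x.shape)
    return step * np.round((x + dither) / step)

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
q = dithered_uniform_quantize(x, step=0.5, rng=rng)
print("mean abs error:", np.mean(np.abs(q - x)))   # bounded by the step size
print("bias:", np.mean(q - x))                     # approximately zero thanks to dithering
```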
arXiv Detail & Related papers (2023-02-22T08:14:24Z) - Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Optimally tackling covariate shift in RKHS-based nonparametric regression [43.457497490211985]
We show that a kernel ridge regression estimator with a carefully chosen regularization parameter is minimax rate-optimal.
We also show that a naive estimator, which minimizes the empirical risk over the function class, is strictly sub-optimal.
We propose a reweighted KRR estimator that weights samples based on a careful truncation of the likelihood ratios.
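A hedged sketch of the general recipe the last sentence describes: kernel ridge regression reweighted by truncated likelihood ratios. The kernel, truncation level, and regularization below are illustrative choices of ours, not the paper's prescription.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between row sets A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def reweighted_krr_fit(X, y, lik_ratio, lam=1e-2, trunc=10.0, gamma=1.0):
    """Importance-weighted kernel ridge regression.
    lik_ratio[i] approximates dP_test/dP_train at X[i]; weights are truncated at `trunc`
    to control the variance of the reweighted objective. Returns dual coefficients alpha
    of f(x) = sum_i alpha_i k(x, X[i])."""
    n = len(y)
    w = np.minimum(lik_ratio, trunc)          # truncated likelihood-ratio weights
    K = rbf_kernel(X, X, gamma)
    W = np.diag(w)
    # First-order condition of  sum_i w_i (y_i - f(x_i))^2 + n*lam*||f||_H^2.
    alpha = np.linalg.solve(W @ K + n * lam * np.eye(n), w * y)
    return alpha

def krr_predict(alpha, X_train, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy covariate shift: train on N(0,1), test on N(1,1); the likelihood ratio is known here.
rng = np.random.default_rng(2)
X_tr = rng.normal(0.0, 1.0, size=(300, 1))
y_tr = np.sin(2 * X_tr[:, 0]) + 0.1 * rng.normal(size=300)
lik_ratio = np.exp(X_tr[:, 0] - 0.5)          # density of N(1,1) over density of N(0,1)
alpha = reweighted_krr_fit(X_tr, y_tr, lik_ratio)
X_te = rng.normal(1.0, 1.0, size=(200, 1))
pred = krr_predict(alpha, X_tr, X_te)
print("test MSE under shift:", np.mean((pred - np.sin(2 * X_te[:, 0]))**2))
```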
arXiv Detail & Related papers (2022-05-06T02:33:24Z) - Distribution Regression with Sliced Wasserstein Kernels [45.916342378789174]
We propose the first OT-based estimator for distribution regression.
We study the theoretical properties of a kernel ridge regression estimator based on this representation.
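For context, a minimal sketch of the sliced Wasserstein distance underlying the kernels in the title, estimated between two equal-size samples via random 1D projections; this is the generic building block, not the paper's distribution-regression estimator, and the names below are ours.

```python
import numpy as np

def sliced_wasserstein2(X, Y, n_projections=200, rng=None):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance between two
    empirical distributions with the same number of samples: project onto random unit
    directions, then use the closed-form 1D W2 between sorted projections."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = X.shape[1]
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # unit directions on the sphere
    px, py = X @ theta.T, Y @ theta.T                       # (n, n_projections) projections
    px.sort(axis=0)
    py.sort(axis=0)
    return np.mean((px - py) ** 2)

rng = np.random.default_rng(3)
A = rng.normal(0.0, 1.0, size=(500, 5))
B = rng.normal(0.5, 1.0, size=(500, 5))
print("SW2^2(A, B):", sliced_wasserstein2(A, B, rng=rng))
```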
arXiv Detail & Related papers (2022-02-08T15:21:56Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Distributionally Robust Parametric Maximum Likelihood Estimation [13.09499764232737]
We propose a distributionally robust maximum likelihood estimator that minimizes the worst-case expected log-loss uniformly over a parametric nominal distribution.
Our novel robust estimator also enjoys statistical consistency and delivers promising empirical results in both regression and classification tasks.
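A rough illustration of the estimator's general structure, written in a generic distributionally robust optimization form; the paper's specific ambiguity set and parametrization may differ.

```latex
% Generic distributionally robust MLE: minimize the worst-case expected log-loss
% over distributions Q within divergence radius rho of a nominal distribution P_0.
\[
  \hat\theta \in \arg\min_{\theta \in \Theta}\;
  \sup_{Q \,:\, D(Q \,\|\, P_0) \le \rho}\;
  \mathbb{E}_{Q}\bigl[-\log p_\theta(Y \mid X)\bigr].
\]
```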
arXiv Detail & Related papers (2020-10-11T19:05:49Z) - Robust regression with covariate filtering: Heavy tails and adversarial
contamination [6.939768185086755]
We show how to modify the Huber regression, least trimmed squares, and least absolute deviation estimators to obtain estimators that are simultaneously computationally and statistically efficient in the stronger contamination model.
We show that the Huber regression estimator achieves near-optimal error rates in this setting, whereas the least trimmed squares and least absolute deviation estimators can be made to achieve near-optimal error after applying a postprocessing step.
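For reference, the standard Huber loss and M-estimator that the first modification builds on (the threshold $\tau$ and notation are ours, not the paper's):

```latex
% Standard Huber loss with threshold tau: quadratic for small residuals,
% linear for large ones, which is the source of the estimator's robustness.
\[
  \ell_\tau(r) =
  \begin{cases}
    \tfrac{1}{2} r^2, & |r| \le \tau,\\[2pt]
    \tau |r| - \tfrac{1}{2}\tau^2, & |r| > \tau,
  \end{cases}
  \qquad
  \hat\beta_{\mathrm{Huber}} \in \arg\min_{\beta} \sum_{i=1}^{n} \ell_\tau\!\bigl(y_i - x_i^{\top}\beta\bigr).
\]
```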
arXiv Detail & Related papers (2020-09-27T22:48:48Z)