Inference with non-differentiable surrogate loss in a general high-dimensional classification framework
- URL: http://arxiv.org/abs/2405.11723v1
- Date: Mon, 20 May 2024 01:50:35 GMT
- Title: Inference with non-differentiable surrogate loss in a general high-dimensional classification framework
- Authors: Muxuan Liang, Yang Ning, Maureen A Smith, Ying-Qi Zhao
- Abstract summary: We propose a kernel-smoothed decorrelated score to construct hypothesis testing and interval estimations.
Specifically, we adopt kernel approximations to smooth the discontinuous gradient near discontinuity points.
We establish the limiting distribution of the kernel-smoothed decorrelated score and its cross-fitted version in a high-dimensional setup.
- Score: 4.792322531593389
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Penalized empirical risk minimization with a surrogate loss function is often used to derive a high-dimensional linear decision rule in classification problems. Although much of the literature focuses on the generalization error, there is a lack of valid inference procedures to identify the driving factors of the estimated decision rule, especially when the surrogate loss is non-differentiable. In this work, we propose a kernel-smoothed decorrelated score to construct hypothesis testing and interval estimations for the linear decision rule estimated using a piece-wise linear surrogate loss, which has a discontinuous gradient and non-regular Hessian. Specifically, we adopt kernel approximations to smooth the discontinuous gradient near discontinuity points and approximate the non-regular Hessian of the surrogate loss. In applications where additional nuisance parameters are involved, we propose a novel cross-fitted version to accommodate flexible nuisance estimates and kernel approximations. We establish the limiting distribution of the kernel-smoothed decorrelated score and its cross-fitted version in a high-dimensional setup. Simulation and real data analysis are conducted to demonstrate the validity and superiority of the proposed method.
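As a rough illustration of the kernel-smoothing idea described in the abstract, the sketch below uses the hinge loss max(0, 1 - u) as one example of a piece-wise linear surrogate and a Gaussian kernel with bandwidth `h`. Both choices, and all function names, are assumptions made for illustration; the abstract does not specify the loss, kernel, or bandwidth the authors use, and this is not their decorrelated score. The derivative of the hinge loss in the margin u jumps at u = 1, so the indicator 1{u < 1} is replaced by a Gaussian CDF, and the point-mass second derivative is replaced by the Gaussian kernel density.

```python
import math
import numpy as np


def hinge_grad(u):
    """Derivative of max(0, 1 - u) in the margin u; it jumps from -1 to 0 at u = 1."""
    return np.where(u < 1.0, -1.0, 0.0)


def _norm_cdf(z):
    """Standard normal CDF, evaluated elementwise via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))


def smoothed_hinge_grad(u, h):
    """Smoothed derivative: the indicator 1{u < 1} becomes Phi((1 - u) / h).

    Assumption: Gaussian kernel with bandwidth h > 0 (illustrative choice).
    """
    return -_norm_cdf((1.0 - u) / h)


def smoothed_hinge_curvature(u, h):
    """Gaussian kernel K_h(1 - u), a smooth stand-in for the point mass at u = 1.

    This is exactly the derivative of smoothed_hinge_grad with respect to u.
    """
    z = (1.0 - u) / h
    return np.exp(-0.5 * z ** 2) / (h * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    margins = np.array([-1.0, 0.9, 1.0, 1.1, 3.0])
    print("raw gradient:     ", hinge_grad(margins))
    print("smoothed gradient:", smoothed_hinge_grad(margins, h=0.1))
    print("smoothed Hessian: ", smoothed_hinge_curvature(margins, h=0.1))
```

Far from the kink the smoothed derivative agrees with the raw one, while near u = 1 it passes continuously through -1/2. A smaller `h` tracks the original loss more closely at the cost of a sharper, less stable curvature estimate.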
Related papers
- Refined Risk Bounds for Unbounded Losses via Transductive Priors [58.967816314671296]
We revisit the sequential variants of linear regression with the squared loss, classification problems with hinge loss, and logistic regression.
Our key tools are based on the exponential weights algorithm with carefully chosen transductive priors.
arXiv Detail & Related papers (2024-10-29T00:01:04Z) - Decision-Focused Learning with Directional Gradients [1.2363103948638432]
We propose a novel family of decision-aware surrogate losses, called Perturbation Gradient (PG) losses, for the predict-then-optimize framework.
Unlike the original decision loss, which is typically piecewise constant and discontinuous, our new PG losses are Lipschitz continuous differences of concave functions.
We provide numerical evidence confirming our PG losses substantively outperform existing proposals when the underlying model is misspecified.
arXiv Detail & Related papers (2024-02-05T18:14:28Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Nonlinear Permuted Granger Causality [0.6526824510982799]
Granger causal inference is a contentious but widespread method used in fields ranging from economics to neuroscience.
To allow for out-of-sample comparison, a measure of functional connectivity is explicitly defined using permutations of the covariate set.
Performance of the permutation method is compared to penalized variable selection, naive replacement, and omission techniques via simulation.
arXiv Detail & Related papers (2023-08-11T16:44:16Z) - Nonparametric Quantile Regression: Non-Crossing Constraints and Conformal Prediction [2.654399717608053]
We propose a nonparametric quantile regression method using deep neural networks with a rectified linear unit penalty function to avoid quantile crossing.
We establish non-asymptotic upper bounds for the excess risk of the proposed nonparametric quantile regression function estimators.
Numerical experiments including simulation studies and a real data example are conducted to demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-10-18T20:59:48Z) - Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
arXiv Detail & Related papers (2022-08-29T16:16:22Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Distribution-Free Robust Linear Regression [5.532477732693]
We study random design linear regression with no assumptions on the distribution of the covariates.
We construct a non-linear estimator achieving excess risk of order $d/n$ with the optimal sub-exponential tail.
We prove an optimal version of the classical bound for the truncated least squares estimator due to Györfi, Kohler, Krzyżak, and Walk.
arXiv Detail & Related papers (2021-02-25T15:10:41Z) - Stopping Criterion Design for Recursive Bayesian Classification: Analysis and Decision Geometry [11.399206131178104]
We propose a geometric interpretation over the state posterior progression.
We show that confidence thresholds defined over maximum of the state posteriors suffer from stiffness.
We then propose a new stopping/termination criterion with a geometrical insight to overcome the limitations.
arXiv Detail & Related papers (2020-07-30T16:21:10Z) - Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)