Observable adjustments in single-index models for regularized
M-estimators
- URL: http://arxiv.org/abs/2204.06990v3
- Date: Wed, 3 Jan 2024 17:41:14 GMT
- Title: Observable adjustments in single-index models for regularized
M-estimators
- Authors: Pierre C Bellec
- Abstract summary: In the regime where the sample size $n$ and dimension $p$ are both increasing, the behavior of the empirical distribution of $\hat\beta$ and the predicted values $X\hat\beta$ has been previously characterized.
This paper develops a different theory to describe the empirical distribution of $\hat\beta$ and $X\hat\beta$.
- Score: 3.5353632767823506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider observations $(X,y)$ from single index models with unknown link
function, Gaussian covariates and a regularized M-estimator $\hat\beta$
constructed from a convex loss function and regularizer. In the regime where
sample size $n$ and dimension $p$ are both increasing such that $p/n$ has a
finite limit, the behavior of the empirical distribution of $\hat\beta$ and the
predicted values $X\hat\beta$ has been previously characterized in a number of
models: The empirical distributions are known to converge to proximal operators
of the loss and penalty in a related Gaussian sequence model, which captures
the interplay between the ratio $p/n$, the loss, the regularization and the
data-generating process. This connection between $(\hat\beta, X\hat\beta)$ and
the corresponding proximal operators requires solving fixed-point equations
that typically involve
unobservable quantities such as the prior distribution on the index or the link
function.
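For reference, the proximal operator at the center of this characterization is the standard convex-analysis object (a textbook definition, not notation specific to this paper): for a convex function $f$ and step size $t>0$,
$$\mathrm{prox}_{tf}(x) = \arg\min_{z\in\mathbb{R}^p}\Big\{ f(z) + \tfrac{1}{2t}\,\|z-x\|_2^2 \Big\}.$$
For instance, for the $\ell_1$ penalty $f(z)=\lambda\|z\|_1$ this reduces to coordinatewise soft-thresholding, $\mathrm{prox}_{tf}(x)_j=\mathrm{sign}(x_j)\max(|x_j|-t\lambda,\,0)$.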
This paper develops a different theory to describe the empirical distribution
of $\hat\beta$ and $X\hat\beta$: Approximations of $(\hat\beta,X\hat\beta)$ in
terms of proximal operators are provided that only involve observable
adjustments. These proposed observable adjustments are data-driven, e.g., they
do not require prior knowledge of the index or the link function. These new
adjustments yield confidence intervals for individual components of the index,
as well as estimators of the correlation of $\hat\beta$ with the index. The
interplay between loss, regularization and the model is thus captured in a
data-driven manner, without solving the fixed-point equations studied in
previous works. The results apply to both strongly convex regularizers and
unregularized M-estimation. Simulations are provided for the square and
logistic loss in single index models including logistic regression and 1-bit
compressed sensing with 20% corrupted bits.
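To make the setup concrete, here is a minimal simulation sketch of the 1-bit compressed sensing experiment named above: Gaussian covariates, 20% corrupted bits, and a ridge-regularized logistic M-estimator. All specific choices below ($n$, $p$, the regularization strength, the index distribution) are illustrative assumptions rather than the paper's settings, and the correlation on the last line is computed with the known index, whereas the paper's observable adjustments estimate it from data alone.

```python
# Illustrative sketch: 1-bit compressed sensing with 20% corrupted bits,
# fit by a ridge-regularized logistic M-estimator.  All constants below
# are assumptions for illustration, not the paper's settings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 1000, 500                        # proportional regime: p/n has a finite limit
beta_star = rng.normal(size=p)
beta_star /= np.linalg.norm(beta_star)  # unit-norm index

X = rng.normal(size=(n, p))             # Gaussian covariates
y = np.sign(X @ beta_star)              # 1-bit measurements in {-1, +1}
flip = rng.random(n) < 0.20
y[flip] *= -1                           # corrupt 20% of the bits

# Regularized M-estimator: logistic loss + ridge (L2) penalty
clf = LogisticRegression(C=1.0, fit_intercept=False, max_iter=10_000)
clf.fit(X, (y > 0).astype(int))         # sklearn expects {0, 1} labels
beta_hat = clf.coef_.ravel()

# The quantity the paper's observable adjustments estimate in a
# data-driven way; computed here directly since beta_star is known.
corr = beta_hat @ beta_star / np.linalg.norm(beta_hat)
print(f"correlation of beta_hat with the index: {corr:.3f}")
```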
Related papers
- Precise Asymptotics of Bagging Regularized M-estimators [5.165142221427928]
We characterize the squared prediction risk of ensemble estimators obtained through subagging (subsample bootstrap aggregating) regularized M-estimators.
Key to our analysis is a new result on the joint behavior of correlations between the estimator and residual errors on overlapping subsamples.
Joint optimization of subsample size, ensemble size, and regularization can significantly outperform regularizer optimization alone on the full data.
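As a toy illustration of the subagging scheme being analysed (a sketch under assumed choices of subsample size, ensemble size and ridge penalty; not the paper's code):

```python
# Toy subagging sketch: average ridge-regularized M-estimators fit on
# random subsamples.  k (subsample size), M (ensemble size) and alpha
# (ridge strength) are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n, p, k, M = 1000, 300, 400, 20
X = rng.normal(size=(n, p))
theta = rng.normal(size=p) / np.sqrt(p)
y = X @ theta + rng.normal(size=n)

coefs = []
for _ in range(M):
    idx = rng.choice(n, size=k, replace=False)  # subsample without replacement
    coefs.append(Ridge(alpha=1.0, fit_intercept=False).fit(X[idx], y[idx]).coef_)
beta_subag = np.mean(coefs, axis=0)             # subagged estimator
```

The paper's point is that the triple (subsample size, ensemble size, regularization) should be tuned jointly, not the regularizer alone on the full sample.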
arXiv Detail & Related papers (2024-09-23T17:48:28Z)
- Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z)
- Optimal score estimation via empirical Bayes smoothing [13.685846094715364]
We study the problem of estimating the score function of an unknown probability distribution $\rho^*$ from $n$ independent and identically distributed observations in $d$ dimensions.
We show that a regularized score estimator based on a Gaussian kernel attains this rate, which is shown to be optimal by a matching minimax lower bound.
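As a minimal sketch of the general idea (my illustration with an assumed bandwidth, not the paper's regularized estimator): the score of a Gaussian-kernel density estimate is available in closed form as a softmax-weighted average of the observations.

```python
# Score (gradient of log-density) of a Gaussian-kernel KDE in closed form.
# The bandwidth h is an arbitrary illustrative choice.
import numpy as np

def kde_score(x, data, h=0.5):
    """Estimate grad log rho(x) from samples `data` of shape (n, d)."""
    diffs = data - x                               # x_i - x, shape (n, d)
    logw = -np.sum(diffs**2, axis=1) / (2 * h**2)  # log Gaussian kernel weights
    w = np.exp(logw - logw.max())
    w /= w.sum()                                   # softmax weights
    return (w[:, None] * diffs).sum(axis=0) / h**2 # weighted mean of (x_i - x)/h^2

rng = np.random.default_rng(2)
data = rng.normal(size=(2000, 2))                  # samples from N(0, I)
print(kde_score(np.array([1.0, 0.0]), data))       # true score at x is -x = (-1, 0)
```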
arXiv Detail & Related papers (2024-02-12T16:17:40Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- Asymptotic Characterisation of Robust Empirical Risk Minimisation Performance in the Presence of Outliers [18.455890316339595]
We study robust linear regression in high dimensions, when both the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $\alpha = n/d$, under a data model that includes outliers.
We provide exact asymptotics for the performance of empirical risk minimisation (ERM) using $\ell_2$-regularised $\ell_2$, $\ell_1$, and Huber losses.
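For reference, the Huber loss named here is the standard one with threshold $\delta > 0$ (a textbook definition):
$$\rho_\delta(t) = \begin{cases} \tfrac{1}{2}t^2, & |t|\le\delta,\\ \delta|t| - \tfrac{1}{2}\delta^2, & |t|>\delta, \end{cases}$$
quadratic like the $\ell_2$ loss near the origin and linear like the $\ell_1$ loss in the tails, which is what gives ERM its robustness to outliers.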
arXiv Detail & Related papers (2023-05-30T12:18:39Z)
- $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Estimation in Tensor Ising Models [5.161531917413708]
We consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes.
In particular, we show the $\sqrt{N}$-consistency of the maximum pseudolikelihood (MPL) estimate in the $p$-spin Sherrington-Kirkpatrick (SK) model.
We derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie-Weiss model.
arXiv Detail & Related papers (2020-08-29T00:06:58Z)
- The Generalized Lasso with Nonlinear Observations and Generative Priors [63.541900026673055]
We make the assumption of sub-Gaussian measurements, which is satisfied by a wide range of measurement models.
We show that our result can be extended to the uniform recovery guarantee under the assumption of a so-called local embedding property.
arXiv Detail & Related papers (2020-06-22T16:43:35Z)
- Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
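For context, asynchronous Q-learning updates one entry of the Q-table per step along a single Markovian trajectory. A minimal sketch follows; the environment interface, the epsilon-greedy behavior policy, and the constant learning rate `eta` are illustrative assumptions, not the paper's setup.

```python
# Minimal asynchronous Q-learning sketch: one Markovian trajectory,
# one (state, action) entry updated per step.  `env` is assumed to
# expose num_states, num_actions, reset() -> s, and step(s, a) -> (s', r).
import numpy as np

def async_q_learning(env, num_steps, gamma=0.9, eta=0.1, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((env.num_states, env.num_actions))
    s = env.reset()
    for _ in range(num_steps):
        # epsilon-greedy behavior policy along a single trajectory
        a = rng.integers(env.num_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next, r = env.step(s, a)
        # asynchronous update: only the visited (s, a) entry changes
        Q[s, a] += eta * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q
```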
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
- A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)