Understanding Implicit Regularization in Over-Parameterized Single Index
Model
- URL: http://arxiv.org/abs/2007.08322v3
- Date: Mon, 15 Nov 2021 14:56:48 GMT
- Title: Understanding Implicit Regularization in Over-Parameterized Single Index
Model
- Authors: Jianqing Fan, Zhuoran Yang, Mengxin Yu
- Abstract summary: We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
- Score: 55.41685740015095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we leverage over-parameterization to design
regularization-free algorithms for the high-dimensional single index model and
provide theoretical guarantees for the induced implicit regularization
phenomenon. Specifically, we study both vector and matrix single index models
where the link function is nonlinear and unknown, the signal parameter is
either a sparse vector or a low-rank symmetric matrix, and the response
variable can be heavy-tailed. To gain a better understanding of the role played
by implicit regularization without excess technicality, we assume that the
distribution of the covariates is known a priori. For both the vector and
matrix settings, we construct an over-parameterized least-squares loss function
by employing the score function transform and a robust truncation step designed
specifically for heavy-tailed data. We propose to estimate the true parameter
by applying regularization-free gradient descent to the loss function. When the
initialization is close to the origin and the stepsize is sufficiently small,
we prove that the obtained solution achieves minimax optimal statistical rates
of convergence in both the vector and matrix cases. In addition, our
experimental results support our theoretical findings and also demonstrate that
our methods empirically outperform classical methods with explicit
regularization in terms of both $\ell_2$-statistical rate and variable
selection consistency.
Related papers
- Convex Parameter Estimation of Perturbed Multivariate Generalized
Gaussian Distributions [18.95928707619676]
We propose a convex formulation with well-established properties for MGGD parameters.
The proposed framework is flexible as it combines a variety of regularizations for the precision matrix, the mean and perturbations.
Experiments show a more accurate precision and covariance matrix estimation with similar performance for the mean vector parameter.
arXiv Detail & Related papers (2023-12-12T18:08:04Z) - Gradient-based bilevel optimization for multi-penalty Ridge regression
through matrix differential calculus [0.46040036610482665]
We introduce a gradient-based approach to the problem of linear regression with l2-regularization.
We show that our approach outperforms LASSO, Ridge, and Elastic Net regression.
The analytical of the gradient proves to be more efficient in terms of computational time compared to automatic differentiation.
arXiv Detail & Related papers (2023-11-23T20:03:51Z) - Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing [28.91482208876914]
We consider the problem of parameter estimation in a high-dimensional generalized linear model.
Despite their wide use, a rigorous performance characterization, as well as a principled way to preprocess the data, are available only for unstructured designs.
arXiv Detail & Related papers (2023-08-28T11:49:23Z) - The Inductive Bias of Flatness Regularization for Deep Matrix
Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depth greater than one, with the standard Isometry Property (RIP) on the measurements, minimizing the trace of Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Implicit Balancing and Regularization: Generalization and Convergence
Guarantees for Overparameterized Asymmetric Matrix Sensing [28.77440901439686]
A series of recent papers have begun to generalize this role for non-random Positive Semi-Defin (PSD) matrix sensing problems.
In this paper, we show that the trajectory of the gradient descent from small random measurements moves towards solutions that are both globally well.
arXiv Detail & Related papers (2023-03-24T19:05:52Z) - High-dimensional analysis of double descent for linear regression with
random projections [0.0]
We consider linear regression problems with a varying number of random projections, where we provably exhibit a double descent curve for a fixed prediction problem.
We first consider the ridge regression estimator and re-interpret earlier results using classical notions from non-parametric statistics.
We then compute equivalents of the generalization performance (in terms of bias and variance) of the minimum norm least-squares fit with random projections, providing simple expressions for the double descent phenomenon.
arXiv Detail & Related papers (2023-03-02T15:58:09Z) - Weight Vector Tuning and Asymptotic Analysis of Binary Linear
Classifiers [82.5915112474988]
This paper proposes weight vector tuning of a generic binary linear classifier through the parameterization of a decomposition of the discriminant by a scalar.
It is also found that weight vector tuning significantly improves the performance of Linear Discriminant Analysis (LDA) under high estimation noise.
arXiv Detail & Related papers (2021-10-01T17:50:46Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.