Does generalization performance of $l^q$ regularization learning depend
on $q$? A negative example
- URL: http://arxiv.org/abs/1307.6616v2
- Date: Tue, 13 Jun 2023 14:21:16 GMT
- Title: Does generalization performance of $l^q$ regularization learning depend
on $q$? A negative example
- Authors: Shaobo Lin, Chen Xu, Jingshan Zeng, Jian Fang
- Abstract summary: $lq$-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling.
We show that all $lq$ estimators for $0 infty$ attain similar generalization error bounds.
This finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact in terms of the generalization capability.
- Score: 19.945160684285003
- License: http://creativecommons.org/licenses/by/3.0/
- Abstract: $l^q$-regularization has been demonstrated to be an attractive technique in
machine learning and statistical modeling. It attempts to improve the
generalization (prediction) capability of a machine (model) through
appropriately shrinking its coefficients. The shape of a $l^q$ estimator
differs in varying choices of the regularization order $q$. In particular,
$l^1$ leads to the LASSO estimate, while $l^{2}$ corresponds to the smooth
ridge regression. This makes the order $q$ a potential tuning parameter in
applications. To facilitate the use of $l^{q}$-regularization, we intend to
seek for a modeling strategy where an elaborative selection on $q$ is
avoidable. In this spirit, we place our investigation within a general
framework of $l^{q}$-regularized kernel learning under a sample dependent
hypothesis space (SDHS). For a designated class of kernel functions, we show
that all $l^{q}$ estimators for $0< q < \infty$ attain similar generalization
error bounds. These estimated bounds are almost optimal in the sense that up to
a logarithmic factor, the upper and lower bounds are asymptotically identical.
This finding tentatively reveals that, in some modeling contexts, the choice of
$q$ might not have a strong impact in terms of the generalization capability.
From this perspective, $q$ can be arbitrarily specified, or specified merely by
other no generalization criteria like smoothness, computational complexity,
sparsity, etc..
Related papers
- Asymptotics of Random Feature Regression Beyond the Linear Scaling
Regime [22.666759017118796]
Recent advances in machine learning have been achieved by using overparametrized models trained until near the training data.
How does model complexity and generalization depend on the number of parameters $p$?
In particular, RFRR exhibits an intuitive trade-off between approximation and generalization power.
arXiv Detail & Related papers (2024-03-13T00:59:25Z) - Agnostically Learning Multi-index Models with Queries [54.290489524576756]
We study the power of query access for the task of agnostic learning under the Gaussian distribution.
We show that query access gives significant runtime improvements over random examples for agnostically learning MIMs.
arXiv Detail & Related papers (2023-12-27T15:50:47Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $mathbfx*$ rather than for all $mathbfx*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We structured convex optimization problems with additive objective $r:=p + q$, where $r$ is $mu$-strong convex similarity.
We proposed a method to solve problems master to agents' communication and local calls.
The proposed method is much sharper than the $mathcalO(sqrtL_q/mu)$ method.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $boldsymbolmu in mathbbRn$, where each component $mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-delta$ returns an $epsilon$ accurate estimate of $f(boldsymbolmu)$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Lockout: Sparse Regularization of Neural Networks [0.0]
Regularization is applied to improve accuracy by placing a constraint $P(w)leq t$ on the values of the parameters $w$.
We present a fast algorithm that provides all such solutions for any differentiable function $f$ and loss $L$, and any constraint $P$ that is an increasing monotone function of the absolute value of each parameter.
arXiv Detail & Related papers (2021-07-15T07:17:20Z) - Learning the optimal regularizer for inverse problems [1.763934678295407]
We consider the linear inverse problem $y=Ax+epsilon$, where $Acolon Xto Y$ is a known linear operator between the separable Hilbert spaces $X$ and $Y$.
This setting covers several inverse problems in imaging including denoising, deblurring, and X-ray tomography.
Within the classical framework of regularization, we focus on the case where the regularization functional is not given a priori but learned from data.
arXiv Detail & Related papers (2021-06-11T17:14:27Z) - Agnostic learning with unknown utilities [70.14742836006042]
In many real-world problems, the utility of a decision depends on the underlying context $x$ and decision $y$.
We study this as agnostic learning with unknown utilities.
We show that estimating the utilities of only the sampled points$S$ suffices to learn a decision function which generalizes well.
arXiv Detail & Related papers (2021-04-17T08:22:04Z) - Estimating Stochastic Linear Combination of Non-linear Regressions
Efficiently and Scalably [23.372021234032363]
We show that when the sub-sample sizes are large then the estimation errors will be sacrificed by too much.
To the best of our knowledge, this is the first work that and guarantees for the lineartext+Stochasticity model.
arXiv Detail & Related papers (2020-10-19T07:15:38Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing a $ntimes n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y)=-logdotpvarphi(x)varphi(y)$ where $varphi$ is a map from the ground space onto the positive orthant $RRr_+$, with $rll n$.
arXiv Detail & Related papers (2020-06-12T10:21:40Z) - Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation [30.137884459159107]
We consider the question of learning $Q$-function in a sample efficient manner for reinforcement learning with continuous state and action spaces.
We develop a simple, iterative learning algorithm that finds $epsilon$-Schmidt $Q$-function with sample complexity of $widetildeO(frac1epsilonmax(d1), d_2)+2)$ when the optimal $Q$-function has low rank $r$ and the factor $$ is below a certain threshold.
arXiv Detail & Related papers (2020-06-11T00:55:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.