Variable Selection in Convex Piecewise Linear Regression
- URL: http://arxiv.org/abs/2411.02225v1
- Date: Mon, 04 Nov 2024 16:19:09 GMT
- Title: Variable Selection in Convex Piecewise Linear Regression
- Authors: Haitham Kanj, Seonho Kim, Kiryung Lee
- Abstract summary: This paper presents Sparse Gradient Descent (Sp-GD) as a solution for variable selection in convex piecewise linear regression.
A non-asymptotic local convergence analysis is provided for Sp-GD under sub-Gaussian noise.
- Score: 5.366354612549172
- Abstract: This paper presents Sparse Gradient Descent (Sp-GD) as a solution for variable selection in convex piecewise linear regression, where the model is given as $\max_{1 \le j \le k}\left(\langle a_j^\star, x \rangle + b_j^\star\right)$ and $x \in \mathbb{R}^d$ is the covariate vector. Here, $\{a_j^\star\}_{j=1}^k$ and $\{b_j^\star\}_{j=1}^k$ denote the ground-truth weight vectors and intercepts. A non-asymptotic local convergence analysis is provided for Sp-GD under sub-Gaussian noise when the covariate distribution satisfies sub-Gaussianity and an anti-concentration property. When the model order and parameters are fixed, Sp-GD provides an $\epsilon$-accurate estimate given $\mathcal{O}(\max(\epsilon^{-2}\sigma_z^2,1)s\log(d/s))$ observations, where $\sigma_z^2$ denotes the noise variance. This also implies exact parameter recovery by Sp-GD from $\mathcal{O}(s\log(d/s))$ noise-free observations. Since optimizing the squared loss for sparse max-affine regression is non-convex, an initialization scheme is proposed to provide a suitable initial estimate within the basin of attraction for Sp-GD, i.e., sufficiently accurate to invoke the convergence guarantees. The initialization scheme uses sparse principal component analysis to estimate the subspace spanned by $\{a_j^\star\}_{j=1}^k$ and then applies an $r$-covering search to estimate the model parameters. A non-asymptotic analysis is presented for this initialization scheme when the covariates and noise samples follow Gaussian distributions. When the model order and parameters are fixed, this initialization scheme provides an $\epsilon$-accurate estimate given $\mathcal{O}(\epsilon^{-2}\max(\sigma_z^4,\sigma_z^2,1)s^2\log^4(d))$ observations. Numerical Monte Carlo results corroborate the theoretical findings for Sp-GD and the initialization scheme.
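The abstract describes the algorithm only at a high level, so the sketch below shows one plausible reading of Sp-GD, assuming joint sparsity over covariate coordinates: each iteration takes a gradient step on the squared loss routed to the currently active affine piece and then hard-thresholds the weight matrix so that only $s$ covariates remain active. The function name `sp_gd`, the step size, the thresholding rule, and the random initialization are illustrative assumptions for this sketch, not the authors' exact procedure; in particular, the paper initializes via sparse PCA plus an $r$-covering search rather than at random.

```python
import numpy as np

def sp_gd(X, y, k, s, step=0.1, n_iters=500, A0=None, b0=None, seed=0):
    """Hypothetical sketch of sparse gradient descent for the max-affine model
    y ~ max_j (<a_j, x> + b_j) + noise, with the a_j jointly s-sparse.

    X: (n, d) covariates; y: (n,) responses; k: number of affine pieces;
    s: number of active covariates. Returns estimated weights A (k, d)
    and intercepts b (k,)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    A = A0.copy() if A0 is not None else 0.01 * rng.standard_normal((k, d))
    b = b0.copy() if b0 is not None else np.zeros(k)
    for _ in range(n_iters):
        scores = X @ A.T + b                 # (n, k): all affine pieces
        j_max = scores.argmax(axis=1)        # active piece per sample
        resid = scores[np.arange(n), j_max] - y
        gA = np.zeros_like(A)
        gb = np.zeros_like(b)
        for j in range(k):                   # gradient of the squared loss,
            m = (j_max == j)                 # routed to the active piece only
            gA[j] = resid[m] @ X[m] / n
            gb[j] = resid[m].sum() / n
        A -= step * gA
        b -= step * gb
        # Hard-thresholding step: keep the s covariates whose columns of A
        # have the largest norm across all k pieces; zero out the rest.
        keep = np.argsort(np.linalg.norm(A, axis=0))[-s:]
        drop = np.ones(d, dtype=bool)
        drop[keep] = False
        A[:, drop] = 0.0
    return A, b

# Tiny synthetic check (illustrative only): 2 pieces, 3 active covariates.
rng = np.random.default_rng(1)
n, d, k, s = 2000, 20, 2, 3
A_true = np.zeros((k, d)); A_true[:, :s] = rng.standard_normal((k, s))
b_true = rng.standard_normal(k)
X = rng.standard_normal((n, d))
y = (X @ A_true.T + b_true).max(axis=1) + 0.01 * rng.standard_normal(n)
A_hat, b_hat = sp_gd(X, y, k, s, step=0.5, n_iters=1000)
```

Because the squared loss for max-affine regression is non-convex, a run started from a poor initial estimate can settle at a spurious stationary point; the paper's sparse-PCA-plus-covering initialization is what places the iterates inside the basin of attraction where the stated convergence guarantees apply.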
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Minimax Optimality of Score-based Diffusion Models: Beyond the Density Lower Bound Assumptions [11.222970035173372]
A kernel-based score estimator achieves an optimal mean square error of $\widetilde{O}\left(n^{-1} t^{-\frac{d+2}{2}}\left(t^{\frac{d}{2}} \vee 1\right)\right)$.
This yields an $\widetilde{O}\left(n^{-1/2} t^{-\frac{d}{4}}\right)$ upper bound for the total variation error of the distribution of the sample generated by the diffusion model under a mere sub-Gaussian assumption.
arXiv Detail & Related papers (2024-02-23T20:51:31Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - (Nearly) Optimal Private Linear Regression via Adaptive Clipping [22.639650869444395]
We study the problem of differentially private linear regression where each data point is sampled from a fixed sub-Gaussian style distribution.
We propose and analyze a one-pass mini-batch gradient descent method (DP-AMBSSGD) where points in each iteration are sampled without replacement.
arXiv Detail & Related papers (2022-07-11T08:04:46Z) - Structure Learning in Graphical Models from Indirect Observations [17.521712510832558]
This paper considers learning of the graphical structure of a $p$-dimensional random vector $X \in \mathbb{R}^p$ using both parametric and non-parametric methods.
Under mild conditions, we show that our graph-structure estimator can obtain the correct structure.
arXiv Detail & Related papers (2022-05-06T19:24:44Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Convergence of Langevin Monte Carlo in Chi-Squared and Renyi Divergence [8.873449722727026]
In particular, for convex and first-order smooth potentials, we show that the LMC algorithm achieves the rate estimate $\widetilde{\mathcal{O}}(d\epsilon^{-1})$, which improves the previously known rates in both of these metrics.
arXiv Detail & Related papers (2020-07-22T18:18:28Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance; and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)