Estimating Stochastic Linear Combination of Non-linear Regressions
Efficiently and Scalably
- URL: http://arxiv.org/abs/2010.09265v1
- Date: Mon, 19 Oct 2020 07:15:38 GMT
- Title: Estimating Stochastic Linear Combination of Non-linear Regressions
Efficiently and Scalably
- Authors: Di Wang and Xiangyu Guo and Chaowen Guan and Shi Li and Jinhui Xu
- Abstract summary: We show that when the sub-sample sizes are large enough, the estimation errors will not be sacrificed by too much.
To the best of our knowledge, this is the first work that provides theoretical guarantees for the stochastic linear combination of non-linear regressions model.
- Score: 23.372021234032363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, many machine learning and statistical models such as non-linear
regressions, the Single Index, Multi-index, Varying Coefficient Index Models
and Two-layer Neural Networks can be reduced to, or seen as special cases of,
a new model which is called the \textit{Stochastic Linear Combination of
Non-linear Regressions} model. However, due to the high non-convexity of the
problem, no previous work has studied how to estimate the model. In this
paper, we provide the first study on how to estimate the model efficiently and
scalably. Specifically, we first show that with some mild assumptions, if the
variate vector $x$ is multivariate Gaussian, then there is an algorithm whose
output vectors have $\ell_2$-norm estimation errors of $O(\sqrt{\frac{p}{n}})$
with high probability, where $p$ is the dimension of $x$ and $n$ is the number
of samples. The key idea of the proof is based on an observation motivated by
Stein's lemma. Then we extend our result to the case where $x$ is bounded and
sub-Gaussian using the zero-bias transformation, which could be seen as a
generalization of the classic Stein's lemma. We also show that with some
additional assumptions there is an algorithm whose output vectors have
$\ell_\infty$-norm estimation errors of
$O(\frac{1}{\sqrt{p}}+\sqrt{\frac{p}{n}})$ with high probability. We also
provide a concrete example to show that there exists some link function which
satisfies the previous assumptions. Finally, for both Gaussian and sub-Gaussian
cases we propose a faster sub-sampling based algorithm and show that when the
sub-sample sizes are large enough then the estimation errors will not be
sacrificed by too much. Experiments for both cases support our theoretical
results.
To the best of our knowledge, this is the first work that studies and
provides theoretical guarantees for the stochastic linear combination of
non-linear regressions model.
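As a concrete illustration of the Stein's lemma observation, below is a minimal NumPy sketch for the single-index special case mentioned in the abstract; it is a simplified illustration under stated assumptions, not the paper's algorithm for the full stochastic linear combination model. For $y = f(\langle \beta^*, x\rangle)$ with $x \sim N(0, I_p)$ and a smooth link $f$, Stein's lemma gives $\mathbb{E}[y\,x] = \mathbb{E}[f'(\langle \beta^*, x\rangle)]\,\beta^*$, so the empirical average $\frac{1}{n}\sum_i y_i x_i$ estimates the direction of $\beta^*$ with $\ell_2$ error on the order of $\sqrt{p/n}$. The sub-sampled variant mirrors the faster sub-sampling idea by averaging over a random subset of size $m$, trading error roughly $\sqrt{p/m}$ for proportionally less computation. The function names, the tanh link, and the normalization are illustrative choices, not taken from the paper.

```python
# Minimal sketch (assumptions: single-index special case, standard Gaussian x,
# smooth increasing link). Not the paper's full algorithm.
import numpy as np

rng = np.random.default_rng(0)
p, n = 50, 20_000
beta_star = rng.normal(size=p)
beta_star /= np.linalg.norm(beta_star)

X = rng.normal(size=(n, p))        # x ~ N(0, I_p)
y = np.tanh(X @ beta_star)         # illustrative link f(t) = tanh(t)

def stein_estimate(X, y):
    """Stein's-lemma moment estimator: (1/n) sum_i y_i x_i, normalized.

    By Stein's lemma, E[y x] = E[f'(<beta*, x>)] beta*, so the sample mean
    points in the direction of beta* (up to sign when E[f'] < 0).
    """
    v = (X * y[:, None]).mean(axis=0)
    return v / np.linalg.norm(v)

def subsampled_estimate(X, y, m, rng):
    """Faster variant: the same moment estimator on a random subsample of size m."""
    idx = rng.choice(len(y), size=m, replace=False)
    return stein_estimate(X[idx], y[idx])

beta_full = stein_estimate(X, y)
beta_sub = subsampled_estimate(X, y, m=5_000, rng=rng)
print("full-sample l2 error:", np.linalg.norm(beta_full - beta_star))
print("sub-sample  l2 error:", np.linalg.norm(beta_sub - beta_star))
```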
Related papers
- Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Computational-Statistical Gaps for Improper Learning in Sparse Linear Regression [4.396860522241307]
We show that an efficient learning algorithm for sparse linear regression can be used to solve sparse PCA problems with a negative spike.
We complement our reduction with low-degree and statistical query lower bounds for the sparse problems from which we reduce.
arXiv Detail & Related papers (2024-02-21T19:55:01Z) - Sparse Gaussian Graphical Models with Discrete Optimization:
Computational and Statistical Perspectives [8.403841349300103]
We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model.
We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function.
Our numerical experiments on real/synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$.
arXiv Detail & Related papers (2023-07-18T15:49:02Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - What Makes A Good Fisherman? Linear Regression under Self-Selection Bias [32.6588421908864]
In the classical setting of self-selection, the goal is to learn $k$ models simultaneously from observations $(x^{(i)}, y^{(i)})$.
In this work, we present the first computationally and statistically efficient estimation algorithms for the most standard setting of this problem where the models are linear.
arXiv Detail & Related papers (2022-05-06T14:03:05Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Max-Linear Regression by Convex Programming [5.366354612549172]
We formulate and analyze a scalable convex program given by anchored regression (AR) as the estimator for the max-linear regression problem.
Our result shows that a sufficient number of noise-free observations for exact recovery scales as $k^4 p$ up to a logarithmic factor.
arXiv Detail & Related papers (2021-03-12T00:55:54Z) - Ising Model Selection Using $\ell_{1}$-Regularized Linear Regression [13.14903445595385]
We show that despite model misspecification, the $\ell_1$-regularized linear regression ($\ell_1$-LinR) estimator can successfully recover the graph structure of the Ising model with $N$ variables.
We also provide a computationally efficient method to accurately predict the non-asymptotic performance of the $\ell_1$-LinR estimator with moderate $M$ and $N$.
arXiv Detail & Related papers (2021-02-08T03:45:10Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator)
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.