Complexity of Vector-valued Prediction: From Linear Models to Stochastic Convex Optimization
- URL: http://arxiv.org/abs/2412.04274v1
- Date: Thu, 05 Dec 2024 15:56:54 GMT
- Title: Complexity of Vector-valued Prediction: From Linear Models to Stochastic Convex Optimization
- Authors: Matan Schliserman, Tomer Koren
- Abstract summary: We focus on the fundamental case with a convex and Lipschitz loss function.
We show several new theoretical results that shed light on the complexity of this problem and its connection to related learning models.
Results portray the setting of vector-valued linear prediction as bridging between two extensively studied yet disparate learning models.
- Score: 27.33243506775655
- Abstract: We study the problem of learning vector-valued linear predictors: these are prediction rules parameterized by a matrix that maps an $m$-dimensional feature vector to a $k$-dimensional target. We focus on the fundamental case with a convex and Lipschitz loss function, and show several new theoretical results that shed light on the complexity of this problem and its connection to related learning models. First, we give a tight characterization of the sample complexity of Empirical Risk Minimization (ERM) in this setting, establishing that $\widetilde{\Omega}(k/\epsilon^2)$ examples are necessary for ERM to reach $\epsilon$ excess (population) risk; this provides an exponential improvement over recent results by Magen and Shamir (2023) in terms of the dependence on the target dimension $k$, and matches a classical upper bound due to Maurer (2016). Second, we present a black-box conversion from general $d$-dimensional Stochastic Convex Optimization (SCO) to vector-valued linear prediction, showing that any SCO problem can be embedded as a prediction problem with $k=\Theta(d)$ outputs. These results portray the setting of vector-valued linear prediction as bridging between two extensively studied yet disparate learning models: linear models (corresponding to $k=1$) and general $d$-dimensional SCO (with $k=\Theta(d)$).
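The setting in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: it assumes the Euclidean-distance loss $\|Wx - y\|_2$ (one example of a loss that is convex and 1-Lipschitz in the prediction), constrains $W$ to a Frobenius-norm ball, and approximates ERM with projected subgradient descent. The function names, the radius, the step size, and the synthetic data are all hypothetical choices.

```python
import numpy as np

def predict(W, X):
    """Apply the linear predictor x -> W x to each row of X (shape n x m)."""
    return X @ W.T

def empirical_risk(W, X, Y):
    """Average loss ||W x - y||_2 (assumed loss; convex, 1-Lipschitz in W x)."""
    return float(np.mean(np.linalg.norm(predict(W, X) - Y, axis=1)))

def project_frobenius_ball(W, radius):
    """Project W onto the set of matrices with Frobenius norm <= radius."""
    norm = np.linalg.norm(W)
    return W if norm <= radius else W * (radius / norm)

def erm_subgradient(X, Y, radius=1.0, steps=500, lr=0.1):
    """Approximate ERM over the Frobenius ball via projected subgradient descent."""
    n, m = X.shape
    k = Y.shape[1]
    W = np.zeros((k, m))
    best_W, best_risk = W.copy(), empirical_risk(W, X, Y)
    for t in range(1, steps + 1):
        R = predict(W, X) - Y                           # residuals, n x k
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        # Subgradient of ||r||_2 is r / ||r||_2 (taken as 0 where r = 0).
        U = np.divide(R, norms, out=np.zeros_like(R), where=norms > 0)
        G = U.T @ X / n                                 # subgradient w.r.t. W, k x m
        W = project_frobenius_ball(W - lr / np.sqrt(t) * G, radius)
        risk = empirical_risk(W, X, Y)
        if risk < best_risk:                            # keep the best iterate seen
            best_W, best_risk = W.copy(), risk
    return best_W, best_risk

# Synthetic data (hypothetical): m features, k targets, n examples.
rng = np.random.default_rng(0)
m, k, n = 5, 3, 200
W_true = project_frobenius_ball(rng.normal(size=(k, m)), 1.0)
X = rng.normal(size=(n, m))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # bounded features
Y = X @ W_true.T + 0.05 * rng.normal(size=(n, k))
W_hat, risk = erm_subgradient(X, Y)
```

With $k=1$ this reduces to an ordinary scalar-output linear model. The paper's lower bound concerns how many examples ERM needs as $k$ grows, which this sketch does not attempt to demonstrate.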
Related papers
- Bivariate Matrix-valued Linear Regression (BMLR): Finite-sample performance under Identifiability and Sparsity Assumptions [0.0]
We study the estimation of parameters in a matrix-valued linear regression model, where the $T$ responses $(Y_t)_{t=1}^T \in \mathbb{R}^{n \times p}$ and predictors $(X_t)_{t=1}^T \in \mathbb{R}^{m \times q}$.
We propose explicit optimization-free estimators and establish non-asymptotic convergence rates to quantify their performance.
arXiv Detail & Related papers (2024-12-23T18:03:34Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal [70.15267479220691]
We consider and analyze the sample complexity of model-free reinforcement learning with a generative model.
Our analysis shows that it is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small.
arXiv Detail & Related papers (2022-05-27T19:39:24Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - A Precise Performance Analysis of Support Vector Regression [105.94855998235232]
We study the hard and soft support vector regression techniques applied to a set of $n$ linear measurements.
Our results are then used to optimally tune the parameters intervening in the design of hard and soft support vector regression algorithms.
arXiv Detail & Related papers (2021-05-21T14:26:28Z) - Estimating Stochastic Linear Combination of Non-linear Regressions Efficiently and Scalably [23.372021234032363]
We show that when the sub-sample sizes are large enough, the estimation errors will not be sacrificed by too much.
To the best of our knowledge, this is the first work that studies and provides guarantees for the Stochastic Linear Combination of Non-linear Regressions model.
arXiv Detail & Related papers (2020-10-19T07:15:38Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Asymptotic errors for convex penalized linear regression beyond Gaussian matrices [23.15629681360836]
We consider the problem of learning a coefficient vector $x_0$ in $\mathbb{R}^N$ from noisy linear observations.
We provide a rigorous derivation of an explicit formula for the mean squared error.
We show that our predictions agree remarkably well with numerics even for very moderate sizes.
arXiv Detail & Related papers (2020-02-11T13:43:32Z) - Mixture model for designs in high dimensional regression and the LASSO [0.0]
The LASSO is a technique for variable selection in the regression model $y = X\beta + z$.
This paper proposes a mixture model for the design matrix which is able to capture in a natural way the potentially clustered nature of the columns.
arXiv Detail & Related papers (2012-10-17T15:10:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.