Query Complexity of Least Absolute Deviation Regression via Robust
Uniform Convergence
- URL: http://arxiv.org/abs/2102.02322v1
- Date: Wed, 3 Feb 2021 22:54:27 GMT
- Title: Query Complexity of Least Absolute Deviation Regression via Robust
Uniform Convergence
- Authors: Xue Chen, Michał Dereziński
- Abstract summary: We show that the query complexity of least absolute deviation regression is $\Theta(d/\epsilon^2)$ up to logarithmic factors.
We introduce the notion of robust uniform convergence, which is a new approximation guarantee for the empirical loss.
- Score: 26.51921319084656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider a regression problem where the learner is given a large collection
of $d$-dimensional data points, but can only query a small subset of the
real-valued labels. How many queries are needed to obtain a $1+\epsilon$
relative error approximation of the optimum? While this problem has been
extensively studied for least squares regression, little is known for other
losses. An important example is least absolute deviation regression ($\ell_1$
regression) which enjoys superior robustness to outliers compared to least
squares. We develop a new framework for analyzing importance sampling methods
in regression problems, which enables us to show that the query complexity of
least absolute deviation regression is $\Theta(d/\epsilon^2)$ up to logarithmic
factors. We further extend our techniques to show the first bounds on the query
complexity for any $\ell_p$ loss with $p\in(1,2)$. As a key novelty in our
analysis, we introduce the notion of robust uniform convergence, which is a new
approximation guarantee for the empirical loss. While it is inspired by uniform
convergence in statistical learning, our approach additionally incorporates a
correction term to avoid unnecessary variance due to outliers. This can be
viewed as a new connection between statistical learning theory and variance
reduction techniques in stochastic optimization, which should be of independent
interest.
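To make the query-efficient pipeline concrete, below is a minimal sketch of importance-sampling-based $\ell_1$ regression under a label budget. The abstract does not specify the sampling distribution, so the sketch uses ordinary ($\ell_2$) leverage scores as a stand-in importance measure with inverse-probability weights; the paper's actual scheme (an $\ell_1$-specific measure such as Lewis weights would be natural) may differ, and all function names and the IRLS solver are illustrative assumptions.

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores of A, used here as a stand-in importance measure."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def weighted_l1_irls(A, b, w, iters=100, delta=1e-8):
    """Approximately minimize sum_i w_i * |a_i^T x - b_i| via iteratively
    reweighted least squares (a simple stand-in for an exact l1 solver)."""
    sw = np.sqrt(w)
    x = np.linalg.lstsq(A * sw[:, None], sw * b, rcond=None)[0]
    for _ in range(iters):
        r = np.abs(A @ x - b) + delta               # smoothed residual magnitudes
        sw = np.sqrt(w / r)
        x = np.linalg.lstsq(A * sw[:, None], sw * b, rcond=None)[0]
    return x

def query_efficient_l1(A, query_label, m, seed=0):
    """Query only m labels, chosen by importance sampling, then solve a
    reweighted l1 regression on the sampled rows."""
    rng = np.random.default_rng(seed)
    n, _ = A.shape
    p = leverage_scores(A)
    p = p / p.sum()                                 # sampling probabilities
    idx = rng.choice(n, size=m, replace=True, p=p)
    b_S = np.array([query_label(i) for i in idx])   # the only label queries made
    w = 1.0 / (m * p[idx])                          # inverse-probability weights
    return weighted_l1_irls(A[idx], b_S, w)
```

Per the main result, a budget $m$ on the order of $d/\epsilon^2$ (up to logarithmic factors) suffices for a $1+\epsilon$ relative-error approximation; the correction term behind robust uniform convergence is not reproduced in this sketch.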
Related papers
- Data subsampling for Poisson regression with pth-root-link [53.63838219437508]
We develop and analyze data subsampling techniques for Poisson regression.
In particular, we consider the Poisson generalized linear model with identity and square-root link functions.
arXiv Detail & Related papers (2024-10-30T10:09:05Z) - Robust Capped lp-Norm Support Vector Ordinal Regression [85.84718111830752]
Ordinal regression is a specialized supervised problem where the labels show an inherent order.
Support Vector Ordinal Regression, as an outstanding ordinal regression model, is widely used in many ordinal regression tasks.
We introduce a new model, Capped $\ell_p$-Norm Support Vector Ordinal Regression (CSVOR), that is robust to outliers.
arXiv Detail & Related papers (2024-04-25T13:56:05Z) - Retire: Robust Expectile Regression in High Dimensions [3.9391041278203978]
Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data.
We propose and study (penalized) robust expectile regression (retire).
We show that the proposed procedure can be efficiently solved by a semismooth Newton coordinate descent algorithm.
arXiv Detail & Related papers (2022-12-11T18:03:12Z) - Dimension free ridge regression [10.434481202633458]
We revisit ridge regression on i.i.d. data, characterizing the bias and variance of ridge regression in terms of the bias and variance of an 'equivalent' sequence model.
As a new application, we obtain a completely explicit and sharp characterization of ridge regression for Hilbert covariates with regularly varying spectrum.
arXiv Detail & Related papers (2022-10-16T16:01:05Z) - Hardness and Algorithms for Robust and Sparse Optimization [17.842787715567436]
We explore algorithms and limitations for sparse optimization problems such as sparse linear regression and robust linear regression.
Specifically, the sparse linear regression problem seeks a $k$-sparse vector $x\in\mathbb{R}^d$ to minimize $\|Ax-b\|$.
The robust linear regression problem seeks a set $S$ that ignores at most $k$ rows and a vector $x$ to minimize $\|(Ax-b)_S\|$.
arXiv Detail & Related papers (2022-06-29T01:40:38Z) - Permuted and Unlinked Monotone Regression in $\mathbb{R}^d$: an approach
based on mixture modeling and optimal transport [4.924126492174802]
We show that the notion of cyclical monotonicity of the regression function is sufficient for identification and estimation in the permuted/unlinked regression model.
We develop a computationally efficient and easy-to-use algorithm for denoising based on the Kiefer-Wolfowitz nonparametric maximum likelihood estimator.
arXiv Detail & Related papers (2022-01-10T18:37:59Z) - Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive the regret upper bounds on the classes of Sobolev spaces $W_p^\beta(\mathcal{X})$, $p\geq 2$, $\beta>\frac{d}{p}$.
The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $\beta>\frac{d}{2}$ or $p=\infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z) - Outlier-robust sparse/low-rank least-squares regression and robust
matrix completion [1.0878040851637998]
We study high-dimensional least-squares regression within a subgaussian statistical learning framework with heterogeneous noise.
We also present a novel theory of trace-regression with matrix decomposition based on a new application of the product process.
arXiv Detail & Related papers (2020-12-12T07:42:47Z) - Conditional Uncorrelation and Efficient Non-approximate Subset Selection
in Sparse Regression [72.84177488527398]
We consider sparse regression from the view of correlation, and propose the formula of conditional uncorrelation.
By the proposed method, the computational complexity is reduced from $O(\frac{1}{6}k^3+mk^2+mkd)$ to $O(\frac{1}{6}k^3+\frac{1}{2}mk^2)$ for each candidate subset in sparse regression.
arXiv Detail & Related papers (2020-09-08T20:32:26Z) - Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that fits a difference of convex functions (DC functions) to the data; a minimal illustration of this representation is sketched after this list.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
arXiv Detail & Related papers (2020-07-05T18:58:47Z) - Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, which achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
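For the "Piecewise Linear Regression via a Difference of Convex Functions" entry above, the following sketch illustrates the underlying representation $f(x)=\max_i(a_i^\top x+b_i)-\max_j(c_j^\top x+d_j)$, fitted here with plain subgradient-style updates on the squared error. The difference-of-max-affine parametrization is the standard way to write a piecewise linear function as a DC function; the fitting routine and all names are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

def dc_predict(X, A, b, C, d):
    """Difference-of-convex piecewise-linear predictor:
    f(x) = max_i (a_i . x + b_i) - max_j (c_j . x + d_j)."""
    return np.max(X @ A.T + b, axis=1) - np.max(X @ C.T + d, axis=1)

def fit_dc(X, y, k=5, steps=2000, lr=1e-2, seed=0):
    """Fit the DC predictor with simple subgradient-style updates on squared
    error (an illustrative routine, not the estimator from the paper)."""
    rng = np.random.default_rng(seed)
    _, dim = X.shape
    A = rng.normal(scale=0.1, size=(k, dim)); b = np.zeros(k)
    C = rng.normal(scale=0.1, size=(k, dim)); d = np.zeros(k)
    for _ in range(steps):
        sa, sc = X @ A.T + b, X @ C.T + d
        ia, ic = sa.argmax(axis=1), sc.argmax(axis=1)   # active piece per point
        resid = (sa.max(axis=1) - sc.max(axis=1)) - y
        for j in range(k):
            ma, mc = ia == j, ic == j
            if ma.any():        # update piece j of the first max-affine part
                A[j] -= lr * (resid[ma][:, None] * X[ma]).mean(axis=0)
                b[j] -= lr * resid[ma].mean()
            if mc.any():        # update piece j of the subtracted max-affine part
                C[j] += lr * (resid[mc][:, None] * X[mc]).mean(axis=0)
                d[j] += lr * resid[mc].mean()
    return A, b, C, d
```

On held-out data, `dc_predict(X_test, *fit_dc(X_train, y_train))` evaluates the fitted piecewise-linear model.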
This list is automatically generated from the titles and abstracts of the papers on this site.