$\lambda$-Regularized A-Optimal Design and its Approximation by
$\lambda$-Regularized Proportional Volume Sampling
- URL: http://arxiv.org/abs/2006.11182v1
- Date: Fri, 19 Jun 2020 15:17:57 GMT
- Title: $\lambda$-Regularized A-Optimal Design and its Approximation by
$\lambda$-Regularized Proportional Volume Sampling
- Authors: Uthaipon Tantipongpipat
- Abstract summary: We study the $\lambda$-regularized $A$-optimal design problem and introduce the $\lambda$-regularized proportional volume sampling algorithm.
The problem is motivated from optimal design in ridge regression, where one tries to minimize the expected squared error of the ridge regression predictor from the true coefficient in the underlying linear model.
- Score: 1.256413718364189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study the $\lambda$-regularized $A$-optimal design problem
and introduce the $\lambda$-regularized proportional volume sampling algorithm,
generalized from [Nikolov, Singh, and Tantipongpipat, 2019], for this problem
with the approximation guarantee that extends upon the previous work. In this
problem, we are given vectors $v_1,\ldots,v_n\in\mathbb{R}^d$ in $d$
dimensions, a budget $k\leq n$, and the regularizer parameter $\lambda\geq0$,
and the goal is to find a subset $S\subseteq [n]$ of size $k$ that minimizes
the trace of $\left(\sum_{i\in S}v_iv_i^\top + \lambda I_d\right)^{-1}$ where
$I_d$ is the $d\times d$ identity matrix. The problem is motivated from optimal
design in ridge regression, where one tries to minimize the expected squared
error of the ridge regression predictor from the true coefficient in the
underlying linear model. We introduce $\lambda$-regularized proportional volume
sampling and give its polynomial-time implementation to solve this problem. We
show its $(1+\frac{\epsilon}{\sqrt{1+\lambda'}})$-approximation for
$k=\Omega\left(\frac d\epsilon+\frac{\log 1/\epsilon}{\epsilon^2}\right)$ where
$\lambda'$ is proportional to $\lambda$, extending the previous bound in
[Nikolov, Singh, and Tantipongpipat, 2019] to the case $\lambda>0$ and
obtaining asymptotic optimality as $\lambda\rightarrow \infty$.
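To make the objective concrete, the following Python/NumPy sketch evaluates the $\lambda$-regularized $A$-optimality criterion and draws a size-$k$ subset with probability proportional to $\det\left(\sum_{i\in S}v_iv_i^\top+\lambda I_d\right)$. This is an illustrative toy (exhaustive enumeration with a uniform base measure), not the paper's polynomial-time implementation, which additionally biases the sampler by a base distribution and a rescaled regularizer; all function names here are hypothetical.

```python
import itertools
import numpy as np

def a_opt_objective(V, S, lam):
    """tr((sum_{i in S} v_i v_i^T + lam * I_d)^{-1}) for rows of V indexed by S."""
    d = V.shape[1]
    M = V[list(S)].T @ V[list(S)] + lam * np.eye(d)
    return np.trace(np.linalg.inv(M))

def regularized_pvs_bruteforce(V, k, lam, rng):
    """Draw S with Pr[S] proportional to det(sum_{i in S} v_i v_i^T + lam * I_d).

    Exhaustive enumeration over all size-k subsets -- illustration only;
    the paper gives a polynomial-time sampler for this distribution.
    """
    n, d = V.shape
    subsets = list(itertools.combinations(range(n), k))
    weights = np.array([np.linalg.det(V[list(S)].T @ V[list(S)] + lam * np.eye(d))
                        for S in subsets])
    probs = weights / weights.sum()
    return subsets[rng.choice(len(subsets), p=probs)]

rng = np.random.default_rng(0)
V = rng.standard_normal((8, 3))            # n = 8 vectors in d = 3 dimensions
S = regularized_pvs_bruteforce(V, k=4, lam=0.5, rng=rng)
print(S, a_opt_objective(V, S, lam=0.5))
```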
Related papers
- Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model [3.309767076331365]
We derive approximation bounds for learning single neuron models using thresholded descent.
We also study the linear regression problem, where $\sigma(\mathbf{x}) = \mathbf{x}$.
arXiv Detail & Related papers (2024-09-05T16:59:56Z)
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
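As a toy companion to the entry above, here is the basic Johnson-Lindenstrauss-style estimate underlying linear sketching for residual-norm estimation (a generic baseline, not the paper's optimal construction; all sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 10_000, 20, 200                        # illustrative sizes
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x = rng.standard_normal(d)

# Sketch A and b once; afterwards any residual norm ||Ax - b|| can be
# estimated from the m-dimensional quantities (SA)x - Sb alone.
S = rng.standard_normal((m, n)) / np.sqrt(m)     # Gaussian sketch, E||Sy||^2 = ||y||^2
SA, Sb = S @ A, S @ b
print(np.linalg.norm(SA @ x - Sb), np.linalg.norm(A @ x - b))  # close w.h.p.
```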
- Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error [25.777423855881878]
We give a novel elimination-based algorithm showing that one can obtain an $O\left(H\epsilon\right)$-optimal policy.
We complement our upper bound with an $\widetilde{\Omega}\left(H\epsilon\right)$-optimality lower bound, giving a complete picture of this problem.
arXiv Detail & Related papers (2024-07-18T15:58:04Z)
- $\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model.
For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits.
For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound.
arXiv Detail & Related papers (2023-07-11T08:51:53Z)
- Optimal Sketching Bounds for Sparse Linear Regression [116.30196615349226]
We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions.
We show that for sparse $\ell_2$ regression, there is an oblivious distribution over sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor.
We also show that sketching $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ ...
arXiv Detail & Related papers (2023-04-05T07:24:19Z)
- A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression, assuming a corrupted $n$-sized label-feature sample with at most $\epsilon n$ arbitrary outliers.
We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge of $(\Sigma,\Xi)$ nor the operator norm of $\Xi$.
arXiv Detail & Related papers (2022-09-06T23:37:31Z)
- Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $(1+\varepsilon)$-approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z)
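As a hedged illustration of the sketching idea in the ridge-regression entry above (the classical sketch-and-solve baseline, not the paper's iterative algorithm; matrix sizes and the sketch dimension $m$ are arbitrary demo choices):

```python
import numpy as np

def ridge_exact(A, b, lam):
    """Exact ridge solution: argmin_x ||Ax - b||^2 + lam * ||x||^2."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)

def ridge_sketch_and_solve(A, b, lam, m, rng):
    """Approximate ridge by solving on an m-row Gaussian sketch of (A, b)."""
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    SA, Sb = S @ A, S @ b
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)

rng = np.random.default_rng(2)
A = rng.standard_normal((5000, 20))
b = A @ rng.standard_normal(20) + 0.1 * rng.standard_normal(5000)
x_exact = ridge_exact(A, b, lam=1.0)
x_apx = ridge_sketch_and_solve(A, b, lam=1.0, m=400, rng=rng)
print(np.linalg.norm(x_apx - x_exact) / np.linalg.norm(x_exact))  # small
```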
- Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound of $O(d^{\max\{1,p/2\}}\log^2 n)$ for loss functions with at most degree $p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z)
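One concrete instance of the sampling idea in the entry above, for the plain $\ell_2$ loss only (leverage-score sampling rather than the paper's general sensitivity framework; names and sizes are illustrative):

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores of A via a thin QR factorization."""
    Q, _ = np.linalg.qr(A)
    return np.sum(Q**2, axis=1)

def sampled_least_squares(A, b, m, rng):
    """Solve least squares on m rows sampled by leverage scores, reweighted."""
    p = leverage_scores(A)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    w = 1.0 / np.sqrt(m * p[idx])          # importance-sampling reweighting
    return np.linalg.lstsq(w[:, None] * A[idx], w * b[idx], rcond=None)[0]

rng = np.random.default_rng(3)
A = rng.standard_normal((3000, 10))
b = A @ rng.standard_normal(10) + 0.05 * rng.standard_normal(3000)
x_full = np.linalg.lstsq(A, b, rcond=None)[0]
x_smp = sampled_least_squares(A, b, m=300, rng=rng)
print(np.linalg.norm(x_smp - x_full))      # small: only 300 labels queried
```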
- Faster Rates of Differentially Private Stochastic Convex Optimization [7.93728520583825]
We study the case where the population risk function satisfies the Tsybakov Noise Condition (TNC) with some parameter $\theta>1$.
In the second part, we focus on a special case where the population risk function is strongly convex.
arXiv Detail & Related papers (2021-07-31T22:23:39Z)
- Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm [46.36534144138337]
In this paper, we investigate the sample complexity of policy evaluation in offline reinforcement learning.
Under the low distribution shift assumption, we show that there is an algorithm that needs at most $O\left(\max\left\{\frac{\left\Vert \theta^{\pi}\right\Vert_2^4}{\varepsilon^4}\log\frac{d}{\delta},\ \frac{1}{\varepsilon^2}\left(d+\log\frac{1}{\delta}\right)\right\}\right)$ samples to approximate the ...
arXiv Detail & Related papers (2021-03-17T18:18:57Z)
- Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m = O(d+\sqrt{d}/\ldots)$
arXiv Detail & Related papers (2020-11-21T01:33:15Z)
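A small demo of the inversion-bias phenomenon this last entry addresses: averaging the plug-in inverses $((SA)^\top SA)^{-1}$ over independent sketches does not converge to $(A^\top A)^{-1}$. The snippet uses dense Gaussian sketches (not the paper's sparse ones); for an $m$-row Gaussian sketch the average is inflated by roughly $m/(m-d-1)$, a standard inverse-Wishart fact.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, m, trials = 500, 5, 40, 1000             # illustrative sizes
A = rng.standard_normal((n, d))
true_inv = np.linalg.inv(A.T @ A)

# Average the plug-in inverse over many independent Gaussian sketches.
avg = np.zeros((d, d))
for _ in range(trials):
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # m-row Gaussian sketch
    SA = S @ A
    avg += np.linalg.inv(SA.T @ SA)
avg /= trials

# Ratio is ~ m / (m - d - 1) ~ 1.18 here, not 1: the inversion bias.
print(np.trace(avg) / np.trace(true_inv))
```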