Semi-supervised Active Regression
- URL: http://arxiv.org/abs/2106.06676v1
- Date: Sat, 12 Jun 2021 03:28:43 GMT
- Title: Semi-supervised Active Regression
- Authors: Fnu Devvrit, Nived Rajaraman, Pranjal Awasthi
- Abstract summary: This paper studies the use of partially labelled, potentially biased data for learning tasks.
The learner has access to a dataset $X in mathbbRd | X min_beta in mathbbRd | X beta - Y |2 end2 equation while making few additional label queries.
- Score: 21.51757844385258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Labelled data often comes at a high cost as it may require recruiting human
labelers or running costly experiments. At the same time, in many practical
scenarios, one already has access to a partially labelled, potentially biased
dataset that can help with the learning task at hand. Motivated by such
settings, we formally initiate a study of $semi-supervised$ $active$ $learning$
through the frame of linear regression. In this setting, the learner has access
to a dataset $X \in \mathbb{R}^{(n_1+n_2) \times d}$ which is composed of $n_1$
unlabelled examples that an algorithm can actively query, and $n_2$ examples
labelled a-priori. Concretely, denoting the true labels by $Y \in
\mathbb{R}^{n_1 + n_2}$, the learner's objective is to find $\widehat{\beta}
\in \mathbb{R}^d$ such that, \begin{equation}
\| X \widehat{\beta} - Y \|_2^2 \le (1 + \epsilon) \min_{\beta \in
\mathbb{R}^d} \| X \beta - Y \|_2^2 \end{equation} while making as few
additional label queries as possible. In order to bound the label queries, we
introduce an instance dependent parameter called the reduced rank, denoted by
$R_X$, and propose an efficient algorithm with query complexity
$O(R_X/\epsilon)$. This result directly implies improved upper bounds for two
important special cases: (i) active ridge regression, and (ii) active kernel
ridge regression, where the reduced-rank equates to the statistical dimension,
$sd_\lambda$ and effective dimension, $d_\lambda$ of the problem respectively,
where $\lambda \ge 0$ denotes the regularization parameter. For active ridge
regression we also prove a matching lower bound of $O(sd_\lambda / \epsilon)$
on the query complexity of any algorithm. This subsumes prior work that only
considered the unregularized case, i.e., $\lambda = 0$.
Related papers
- Distribution-Independent Regression for Generalized Linear Models with
Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise.
We present an algorithm that tackles newthis problem in its most general distribution-independent setting.
This is the first newalgorithmic result for GLM regression newwith oblivious noise which can handle more than half the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z) - Towards Characterizing the First-order Query Complexity of Learning
(Approximate) Nash Equilibria in Zero-sum Matrix Games [0.0]
We show that exact equilibria can be computed efficiently from $O(fracln Kepsilon)$ instead of $O(fracln Kepsilon2)$ queries.
We introduce a new technique for lower bounds, which allows us to obtain lower bounds of order $tildeOmega(frac1Kepsilon)$ for any $epsilon leq 1 / (cK4)$.
arXiv Detail & Related papers (2023-04-25T12:42:59Z) - Optimal Query Complexities for Dynamic Trace Estimation [59.032228008383484]
We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly.
We provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $epsilon$ error with $delta$ failure probability.
Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation.
arXiv Detail & Related papers (2022-09-30T04:15:44Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations.
The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound $O(dmax1,p/2log2 n)$ for loss functions with at most degree $p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z) - Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $KsubsetmathbbRdtimes n$, given access to $AinmathbbRdtimes n$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z) - Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank
Approximation [23.06440095688755]
A common technique for compressing a neural network is to compute the $k$-rank $ell$ approximation $A_k,2$ of the matrix $AinmathbbRntimes d$ that corresponds to a fully connected layer (or embedding layer)
Here, $d$ is the number of the neurons in the layer, $n$ is the number in the next one, and $A_k,2$ can be stored in $O(n+d)k)$ memory instead of $O(nd)$.
This
arXiv Detail & Related papers (2020-09-11T20:21:42Z) - Active Local Learning [22.823171570933397]
We consider active local learning: given a query point $x$, and active access to an unlabeled training set $S$, output the prediction $h(x)$ of a near-optimal $h in H$.
In particular, the number of label queries should be independent of the complexity of $H$, and the function $h$ should be well-defined, independent of $x$.
This immediately also implies an algorithm for distance estimation: estimating the value $opt(H)$ from many fewer labels than needed to actually learn a near-optimal $h in
arXiv Detail & Related papers (2020-08-31T05:39:35Z) - $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning enjoys a $mathcalOleft(fracSAcdot mathrmpolyleft(Hright)Delta_minlogleft(SATright)right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $Delta_min$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z) - Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation [30.137884459159107]
We consider the question of learning $Q$-function in a sample efficient manner for reinforcement learning with continuous state and action spaces.
We develop a simple, iterative learning algorithm that finds $epsilon$-Schmidt $Q$-function with sample complexity of $widetildeO(frac1epsilonmax(d1), d_2)+2)$ when the optimal $Q$-function has low rank $r$ and the factor $$ is below a certain threshold.
arXiv Detail & Related papers (2020-06-11T00:55:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.