Related papers: Semi-supervised Active Regression

Semi-supervised Active Regression

URL: http://arxiv.org/abs/2106.06676v1
Date: Sat, 12 Jun 2021 03:28:43 GMT
Title: Semi-supervised Active Regression
Authors: Fnu Devvrit, Nived Rajaraman, Pranjal Awasthi
Abstract summary: This paper studies the use of partially labelled, potentially biased data for learning tasks. The learner has access to a dataset $X in mathbbRd | X min_beta in mathbbRd | X beta - Y |2 end2 equation while making few additional label queries.
Score: 21.51757844385258
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Labelled data often comes at a high cost as it may require recruiting human labelers or running costly experiments. At the same time, in many practical scenarios, one already has access to a partially labelled, potentially biased dataset that can help with the learning task at hand. Motivated by such settings, we formally initiate a study of $semi-supervised$ $active$ $learning$ through the frame of linear regression. In this setting, the learner has access to a dataset $X \in \mathbb{R}^{(n_1+n_2) \times d}$ which is composed of $n_1$ unlabelled examples that an algorithm can actively query, and $n_2$ examples labelled a-priori. Concretely, denoting the true labels by $Y \in \mathbb{R}^{n_1 + n_2}$, the learner's objective is to find $\widehat{\beta} \in \mathbb{R}^d$ such that, \begin{equation} \| X \widehat{\beta} - Y \|_2^2 \le (1 + \epsilon) \min_{\beta \in \mathbb{R}^d} \| X \beta - Y \|_2^2 \end{equation} while making as few additional label queries as possible. In order to bound the label queries, we introduce an instance dependent parameter called the reduced rank, denoted by $R_X$, and propose an efficient algorithm with query complexity $O(R_X/\epsilon)$. This result directly implies improved upper bounds for two important special cases: (i) active ridge regression, and (ii) active kernel ridge regression, where the reduced-rank equates to the statistical dimension, $sd_\lambda$ and effective dimension, $d_\lambda$ of the problem respectively, where $\lambda \ge 0$ denotes the regularization parameter. For active ridge regression we also prove a matching lower bound of $O(sd_\lambda / \epsilon)$ on the query complexity of any algorithm. This subsumes prior work that only considered the unregularized case, i.e., $\lambda = 0$.

Related papers

Bandit Multiclass List Classification [28.483435983018616]
We study the problem of multiclass list classification with (semi-)bandit feedback, where input examples are mapped into subsets of size $m$ of a collection of $K$ possible labels. Our main result is for the $(varepsilon,delta)$-PAC variant of the problem for which we design an algorithm that returns an $varepsilon$-optimal hypothesis.
arXiv Detail & Related papers (2025-02-13T12:13:25Z)
Active Learning of General Halfspaces: Label Queries vs Membership Queries [35.41111301965335]
We show that any active learner requires label complexity of $tildeOmega(d/(log(m)epsilon))$, where $m$ is the number of unlabeled examples. We give a computationally efficient learner with query complexity of $tildeO(min1/p, 1/epsilon + dcdot polylog (1/epsilon))$ achieving error guarantee of $O(opt)+epsilon$.
arXiv Detail & Related papers (2024-12-31T15:40:48Z)
Distribution-Independent Regression for Generalized Linear Models with Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise. We present an algorithm that tackles newthis problem in its most general distribution-independent setting. This is the first newalgorithmic result for GLM regression newwith oblivious noise which can handle more than half the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z)
Towards Characterizing the First-order Query Complexity of Learning (Approximate) Nash Equilibria in Zero-sum Matrix Games [0.0]
We show that exact equilibria can be computed efficiently from $O(fracln Kepsilon)$ instead of $O(fracln Kepsilon2)$ queries. We introduce a new technique for lower bounds, which allows us to obtain lower bounds of order $tildeOmega(frac1Kepsilon)$ for any $epsilon leq 1 / (cK4)$.
arXiv Detail & Related papers (2023-04-25T12:42:59Z)
Optimal Query Complexities for Dynamic Trace Estimation [59.032228008383484]
We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly. We provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $epsilon$ error with $delta$ failure probability. Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation.
arXiv Detail & Related papers (2022-09-30T04:15:44Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations. The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector. We show that this dependence on $d$ is optimal, up to logarithmic factors. We also provide the first total sensitivity upper bound $O(dmax1,p/2log2 n)$ for loss functions with at most degree $p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z)
Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $KsubsetmathbbRdtimes n$, given access to $AinmathbbRdtimes n$. We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z)
Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation [23.06440095688755]
A common technique for compressing a neural network is to compute the $k$-rank $ell$ approximation $A_k,2$ of the matrix $AinmathbbRntimes d$ that corresponds to a fully connected layer (or embedding layer) Here, $d$ is the number of the neurons in the layer, $n$ is the number in the next one, and $A_k,2$ can be stored in $O(n+d)k)$ memory instead of $O(nd)$. This
arXiv Detail & Related papers (2020-09-11T20:21:42Z)
Active Local Learning [22.823171570933397]
We consider active local learning: given a query point $x$, and active access to an unlabeled training set $S$, output the prediction $h(x)$ of a near-optimal $h in H$. In particular, the number of label queries should be independent of the complexity of $H$, and the function $h$ should be well-defined, independent of $x$. This immediately also implies an algorithm for distance estimation: estimating the value $opt(H)$ from many fewer labels than needed to actually learn a near-optimal $h in
arXiv Detail & Related papers (2020-08-31T05:39:35Z)
$Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning enjoys a $mathcalOleft(fracSAcdot mathrmpolyleft(Hright)Delta_minlogleft(SATright)right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $Delta_min$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z)
Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation [30.137884459159107]
We consider the question of learning $Q$-function in a sample efficient manner for reinforcement learning with continuous state and action spaces. We develop a simple, iterative learning algorithm that finds $epsilon$-Schmidt $Q$-function with sample complexity of $widetildeO(frac1epsilonmax(d1), d_2)+2)$ when the optimal $Q$-function has low rank $r$ and the factor $$ is below a certain threshold.
arXiv Detail & Related papers (2020-06-11T00:55:35Z)
Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems. We show that if $delta = Oleft(rho/sqrtdim_Eright)$, then one can find the optimal policy using $Oleft(dim_Eright)$.
arXiv Detail & Related papers (2020-02-17T18:41:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.