Coresets for Classification -- Simplified and Strengthened
- URL: http://arxiv.org/abs/2106.04254v1
- Date: Tue, 8 Jun 2021 11:24:18 GMT
- Title: Coresets for Classification -- Simplified and Strengthened
- Authors: Tung Mai and Anup B. Rao and Cameron Musco
- Abstract summary: We give relative error coresets for training linear classifiers with a broad class of loss functions.
Our construction achieves $\tilde O(d \cdot \mu_y(X)^2/\epsilon^2)$ points, where $\mu_y(X)$ is a natural complexity measure of the data matrix $X \in \mathbb{R}^{n \times d}$ and label vector $y \in \{-1,1\}^n$.
- Score: 19.54307474041768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give relative error coresets for training linear classifiers with a broad
class of loss functions, including the logistic loss and hinge loss. Our
construction achieves $(1\pm \epsilon)$ relative error with $\tilde O(d \cdot
\mu_y(X)^2/\epsilon^2)$ points, where $\mu_y(X)$ is a natural complexity
measure of the data matrix $X \in \mathbb{R}^{n \times d}$ and label vector $y
\in \{-1,1\}^n$, introduced by Munteanu et al. 2018. Our result is based on
subsampling data points with probabilities proportional to their $\ell_1$
Lewis weights. It significantly improves on existing theoretical bounds and
performs well in practice, outperforming uniform subsampling along with other
importance sampling methods. Our sampling distribution does not depend on the
labels, so it can be used for active learning. It also does not depend on the
specific loss function, so a single coreset can be used in multiple training
scenarios.
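As a concrete illustration of the scheme the abstract describes, here is a minimal NumPy sketch (function names and the iteration count are ours, not the paper's): it approximates the $\ell_1$ Lewis weights of the rows of $X$ with the standard fixed-point iteration $w_i \leftarrow (x_i^\top (X^\top W^{-1} X)^{-1} x_i)^{1/2}$ and then subsamples rows with probabilities proportional to those weights. The paper's analysis, not this sketch, is what supplies the $\tilde O(d \cdot \mu_y(X)^2/\epsilon^2)$ coreset size.

```python
import numpy as np

def l1_lewis_weights(X, n_iter=30):
    """Approximate the l1 Lewis weights of the rows of X via the standard
    fixed-point iteration w_i <- (x_i^T (X^T W^{-1} X)^{-1} x_i)^{1/2}."""
    n, _ = X.shape
    w = np.ones(n)
    for _ in range(n_iter):
        M = X.T @ (X / w[:, None])                  # X^T diag(w)^{-1} X
        tau = np.einsum('ij,jk,ik->i', X, np.linalg.inv(M), X)
        w = np.sqrt(np.maximum(tau, 1e-12))         # guard against round-off
    return w

def lewis_weight_coreset(X, m, seed=None):
    """Sample m row indices with probability proportional to their l1 Lewis
    weights; return the indices and importance weights 1/(m * p_i)."""
    rng = np.random.default_rng(seed)
    w = l1_lewis_weights(X)
    p = w / w.sum()
    idx = rng.choice(X.shape[0], size=m, replace=True, p=p)
    return idx, 1.0 / (m * p[idx])
```

Training on the sampled rows, with each point's loss multiplied by its importance weight, then approximates the full training objective. Note that the sampling probabilities use only $X$ and never $y$, which is exactly why the abstract can claim the same coreset serves for active learning and across loss functions.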
Related papers
- Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models [37.42736399673992]
A single-index model (SIM) is a function of the form $\sigma(\mathbf{w}^\ast \cdot \mathbf{x})$, where $\sigma: \mathbb{R} \to \mathbb{R}$ is a known link function and $\mathbf{w}^\ast$ is a hidden unit vector.
We show that a proper learner attains $L^2$-error of $O(\mathrm{OPT}) + \epsilon$.
arXiv Detail & Related papers (2024-11-08T17:10:38Z)
- One-Bit Quantization and Sparsification for Multiclass Linear Classification with Strong Regularization [18.427215139020625]
We show that the best classification performance is achieved when $f(\cdot) = \|\cdot\|^2$ and $\lambda \to \infty$.
It is often possible to find sparse and one-bit solutions that perform almost as well as the one corresponding to $f(\cdot) = \|\cdot\|_\infty$ in the large $\lambda$ regime.
arXiv Detail & Related papers (2024-02-16T06:39:40Z)
- Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss [33.18537822803389]
We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, $\mathscr{F}$ is a weakly sub-Gaussian class.
Our result holds whether the problem is realizable or not, and we refer to this as a "near mixing-free" rate, since direct dependence on mixing is relegated to an additive higher-order term.
arXiv Detail & Related papers (2024-02-08T18:57:42Z) - LegendreTron: Uprising Proper Multiclass Loss Learning [22.567234503869845]
Loss functions serve as the foundation of supervised learning and are often chosen prior to model development.
Recent works have sought to learn losses and models jointly.
We present LegendreTron as a novel and practical method that jointly learns proper canonical losses and probabilities for multiclass problems.
arXiv Detail & Related papers (2023-01-27T13:10:45Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$; a toy sketch of this adaptive idea follows.
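The following sketch is our own construction, not the paper's algorithm; it only conveys the instance-adaptive flavor. The names `f`, `grad_f`, `oracle`, `n`, `budget`, and `batch` are all assumptions: it spends each new batch of oracle calls on the coordinate whose first-order contribution to the error of $f(\boldsymbol{\mu})$ currently looks largest.

```python
import numpy as np

def adaptive_estimate(f, grad_f, oracle, n, budget, batch=10):
    """Toy adaptive estimator of f(mu). `oracle(i, k)` is assumed to return
    k independent noisy samples of mu_i; `grad_f` is the gradient of f.
    The error in estimating mu_i shrinks like 1/sqrt(counts_i) and enters
    f roughly weighted by |df/dmu_i|, so we sample where that product is
    largest."""
    sums = np.zeros(n)
    counts = np.zeros(n, dtype=int)
    for i in range(n):                       # warm start: one batch per coordinate
        sums[i] += oracle(i, batch).sum()
        counts[i] += batch
    while counts.sum() < budget:
        mu_hat = sums / counts
        score = np.abs(grad_f(mu_hat)) / np.sqrt(counts)
        i = int(np.argmax(score))            # most error-critical coordinate
        sums[i] += oracle(i, batch).sum()
        counts[i] += batch
    return f(sums / counts)
```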
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - FriendlyCore: Practical Differentially Private Aggregation [67.04951703461657]
We propose a simple and practical tool $\mathsf{FriendlyCore}$ that takes a set of points $\mathcal{D}$ from an unrestricted (pseudo) metric space as input.
When $\mathcal{D}$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a "stable" subset $\mathcal{D}_G \subseteq \mathcal{D}$ that includes all points.
$\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy (a rough illustration of the stable-subset idea follows).
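As a rough, non-private illustration of the "stable subset" idea (our own simplification; the actual $\mathsf{FriendlyCore}$ filter is randomized and differentially private), one can keep exactly the points that lie close to a majority of the others:

```python
import numpy as np

def stable_subset(points, r):
    """Keep points within distance r of at least half of the dataset, so
    the survivors form a cluster of small effective diameter. This drops
    the outliers that make naive private aggregation inaccurate."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    keep = (dists <= r).sum(axis=1) >= n / 2   # count includes the point itself
    return pts[keep]
```

A private aggregator (say, a noisy mean) run on the filtered set then only has to handle points within distance roughly $r$ of each other, which is what "simplifying the aggregation or boosting its accuracy" refers to.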
arXiv Detail & Related papers (2021-06-25T17:59:16Z)
- Self-training Converts Weak Learners to Strong Learners in Mixture Models [86.7137362125503]
We show that a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$.
We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples (a minimal version of the self-training loop is sketched below).
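The following is a minimal self-training loop in the spirit of this result (assuming scikit-learn; this is the generic recipe, not the paper's analyzed procedure): fit on the small labeled set, pseudolabel the unlabeled pool, and refit on the union.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, rounds=5):
    """Fit a logistic classifier on labeled data, then repeatedly
    pseudolabel the unlabeled pool and refit on labeled + pseudolabeled
    points. Under mixture-model assumptions this can sharpen a weak
    initial classifier into a strong one."""
    clf = LogisticRegression().fit(X_lab, y_lab)
    for _ in range(rounds):
        y_pseudo = clf.predict(X_unlab)
        X_all = np.vstack([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, y_pseudo])
        clf = LogisticRegression().fit(X_all, y_all)
    return clf
```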
arXiv Detail & Related papers (2021-04-19T16:35:38Z)
- Provable Robustness of Adversarial Training for Learning Halfspaces with Noise [95.84614821570283]
We analyze the properties of adversarial training for learning adversarially robust halfspaces in the presence of label noise.
To the best of our knowledge, this is the first work to show that adversarial training provably yields robust classifiers in the presence of noise.
arXiv Detail & Related papers (2021-04-19T16:35:38Z)
- Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
We show that gradient descent on an over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.
arXiv Detail & Related papers (2020-10-22T00:32:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.