Paper summary: Coresets for Classification -- Simplified and Strengthened
- arxiv url: http://arxiv.org/abs/2106.04254v1
- Date: Tue, 8 Jun 2021 11:24:18 GMT
- Title: Coresets for Classification -- Simplified and Strengthened
- Title (reference translation): Coresets for Classification -- Simplified and Strengthened
- Authors: Tung Mai and Anup B. Rao and Cameron Musco
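- Abstract summary: We give relative error coresets for training linear classifiers with a broad class of loss functions.
Our construction uses $\tilde O(d \cdot \mu_y(X)^2/\epsilon^2)$ points, where $\mu_y(X)$ is a natural complexity measure of the data matrix $X \in \mathbb{R}^{n \times d}$ and label vector $y \in \{-1,1\}^n$.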
- Reference score (independently computed attention score): 19.54307474041768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give relative error coresets for training linear classifiers with a broad
class of loss functions, including the logistic loss and hinge loss. Our
construction achieves $(1\pm \epsilon)$ relative error with $\tilde O(d \cdot
\mu_y(X)^2/\epsilon^2)$ points, where $\mu_y(X)$ is a natural complexity
measure of the data matrix $X \in \mathbb{R}^{n \times d}$ and label vector $y
\in \{-1,1\}^n$, introduced by Munteanu et al. (2018). Our result is based on
subsampling data points with probabilities proportional to their $\ell_1$
Lewis weights. It significantly improves on existing theoretical bounds and
performs well in practice, outperforming uniform subsampling along with other
importance sampling methods. Our sampling distribution does not depend on the
labels, so can be used for active learning. It also does not depend on the
specific loss function, so a single coreset can be used in multiple training
- Abstract (reference translation): We give relative error coresets for training linear classifiers with a broad class of loss functions, including the logistic loss and hinge loss.
Our construction achieves $(1\pm \epsilon)$ relative error with $\tilde O(d \cdot \mu_y(X)^2/\epsilon^2)$ points, where $\mu_y(X)$ is a natural complexity measure of the data matrix $X \in \mathbb{R}^{n \times d}$ and label vector $y \in \{-1,1\}^n$, introduced by Munteanu et al.
Our result is based on subsampling data points with probabilities proportional to their $\ell_1$ Lewis weights.
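
The sampling idea in the abstract can be illustrated with a minimal sketch. It assumes the standard fixed-point iteration for approximating $\ell_1$ Lewis weights, $w_i \leftarrow (x_i^\top (X^\top W^{-1} X)^{-1} x_i)^{1/2}$, which may differ from the procedure the authors actually use. The function names (`l1_lewis_weights`, `sample_coreset`, `logistic_loss`), the iteration count, the synthetic data, and the coreset size are all illustrative assumptions and do not come from the paper.

```python
import numpy as np

def l1_lewis_weights(X, n_iter=30):
    """Approximate l1 Lewis weights of the rows of X by fixed-point iteration.

    Assumed update: w_i <- sqrt(x_i^T (X^T W^{-1} X)^{-1} x_i).
    """
    n, _ = X.shape
    w = np.ones(n)
    for _ in range(n_iter):
        M = X.T @ (X / w[:, None])            # X^T W^{-1} X, shape (d, d)
        M_inv = np.linalg.pinv(M)
        # Quadratic form x_i^T M^{-1} x_i for every row i.
        q = np.einsum("ij,jk,ik->i", X, M_inv, X)
        w = np.maximum(np.sqrt(np.maximum(q, 0.0)), 1e-12)
    return w

def sample_coreset(X, y, m, rng):
    """Draw m rows with probability proportional to their l1 Lewis weights.

    The probabilities use only X, never the labels y or a loss function.
    Returns the sampled rows, their labels, and importance weights 1/(m p_i).
    """
    w = l1_lewis_weights(X)
    p = w / w.sum()
    idx = rng.choice(len(p), size=m, replace=True, p=p)
    return X[idx], y[idx], 1.0 / (m * p[idx])

def logistic_loss(X, y, beta, weights=None):
    """(Weighted) sum of logistic losses log(1 + exp(-y_i <x_i, beta>))."""
    losses = np.logaddexp(0.0, -y * (X @ beta))
    return losses.sum() if weights is None else (weights * losses).sum()

# Hypothetical usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
beta_true = rng.standard_normal(5)
y = np.where(X @ beta_true + 0.5 * rng.standard_normal(2000) >= 0, 1.0, -1.0)

X_c, y_c, s_c = sample_coreset(X, y, m=500, rng=rng)
beta = rng.standard_normal(5)                 # any candidate classifier
print("full loss:   ", logistic_loss(X, y, beta))
print("coreset loss:", logistic_loss(X_c, y_c, beta, s_c))
```

Because the sampling probabilities are computed from `X` alone, the same sampled indices could be reused with a different loss (for example the hinge loss) or chosen before labels are collected, which is the property the abstract highlights for active learning.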
