One-Bit Quantization and Sparsification for Multiclass Linear
Classification via Regularized Regression
- URL: http://arxiv.org/abs/2402.10474v1
- Date: Fri, 16 Feb 2024 06:39:40 GMT
- Title: One-Bit Quantization and Sparsification for Multiclass Linear
Classification via Regularized Regression
- Authors: Reza Ghane, Danil Akhtiamov, Babak Hassibi
- Abstract summary: We show that the best classification is achieved when $f(cdot) = |cdot|2$ and $lambda to infty$.
It is often possible to find sparse and one-bit solutions that perform almost as well as one corresponding to $f(cdot) = |cdot|_infty$ in the large $lambda$ regime.
- Score: 20.710343135282116
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the use of linear regression for multiclass classification in the
over-parametrized regime where some of the training data is mislabeled. In such
scenarios it is necessary to add an explicit regularization term, $\lambda
f(w)$, for some convex function $f(\cdot)$, to avoid overfitting the mislabeled
data. In our analysis, we assume that the data is sampled from a Gaussian
Mixture Model with equal class sizes, and that a proportion $c$ of the training
labels is corrupted for each class. Under these assumptions, we prove that the
best classification performance is achieved when $f(\cdot) = \|\cdot\|^2_2$ and
$\lambda \to \infty$. We then proceed to analyze the classification errors for
$f(\cdot) = \|\cdot\|_1$ and $f(\cdot) = \|\cdot\|_\infty$ in the large
$\lambda$ regime and notice that it is often possible to find sparse and
one-bit solutions, respectively, that perform almost as well as the one
corresponding to $f(\cdot) = \|\cdot\|_2^2$.
Related papers
- Transformer In-Context Learning for Categorical Data [51.23121284812406]
We extend research on understanding Transformers through the lens of in-context learning with functional data by considering categorical outcomes, nonlinear underlying models, and nonlinear attention.
We present what is believed to be the first real-world demonstration of this few-shot-learning methodology, using the ImageNet dataset.
arXiv Detail & Related papers (2024-05-27T15:03:21Z) - Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss [33.18537822803389]
We show that whenever the topologies of $L2$ and $Psi_p$ are comparable on our hypothesis class $mathscrF$, $mathscrF$ is a weakly sub-Gaussian class.
Our result holds whether the problem is realizable or not and we refer to this as a emphnear mixing-free rate, since direct dependence on mixing is relegated to an additive higher order term.
arXiv Detail & Related papers (2024-02-08T18:57:42Z) - Regularized Linear Regression for Binary Classification [20.710343135282116]
Regularized linear regression is a promising approach for binary classification problems in which the training set has noisy labels.
We show that for large enough regularization strength, the optimal weights concentrate around two values of opposite sign.
We observe that in many cases the corresponding "compression" of each weight to a single bit leads to very little loss in performance.
arXiv Detail & Related papers (2023-11-03T23:18:21Z) - Distribution-Independent Regression for Generalized Linear Models with
Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise.
We present an algorithm that tackles newthis problem in its most general distribution-independent setting.
This is the first newalgorithmic result for GLM regression newwith oblivious noise which can handle more than half the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z) - Repeated Observations for Classification [0.2676349883103404]
We study the problem nonparametric classification with repeated observations.
In the analysis, we investigate particular models like robust detection by nominal densities, prototype classification, linear transformation, linear classification, scaling.
arXiv Detail & Related papers (2023-07-19T10:50:36Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
Ground-truth predictor is additive $mathbbE[mathbfz mid mathbfx,mathbfy] = f_star(mathbfx) +g_star(mathbfy)$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves underrepresented groups' accuracy of sampling methods by 3% over four benchmarks.
arXiv Detail & Related papers (2022-09-30T17:33:00Z) - GRASP: A Goodness-of-Fit Test for Classification Learning [8.122270502556374]
Despite being a standard measure, average accuracy fails in characterizing the fit of the model to the underlying conditional law of labels given the features vector ($Y|X$)
Our framework does not make any parametric assumption on the conditional law $Y|X$, and treats that as a black box oracle model which can be accessed only through queries.
arXiv Detail & Related papers (2022-09-05T17:18:43Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $boldsymbolmu in mathbbRn$, where each component $mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-delta$ returns an $epsilon$ accurate estimate of $f(boldsymbolmu)$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Self-training Converts Weak Learners to Strong Learners in Mixture
Models [86.7137362125503]
We show that a pseudolabeler $boldsymbolbeta_mathrmpl$ can achieve classification error at most $C_mathrmerr$.
We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $boldsymbolbeta_mathrmpl$ with classification error $C_mathrmerr$ using only $O(d)$ labeled examples.
arXiv Detail & Related papers (2021-06-25T17:59:16Z) - The generalization error of max-margin linear classifiers: Benign
overfitting and high dimensional asymptotics in the overparametrized regime [11.252856459394854]
Modern machine learning classifiers often exhibit vanishing classification error on the training set.
Motivated by these phenomena, we revisit high-dimensional maximum margin classification for linearly separable data.
arXiv Detail & Related papers (2019-11-05T00:15:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.