The Implicit Bias of Gradient Descent on Separable Multiclass Data
 - URL: http://arxiv.org/abs/2411.01350v2
 - Date: Thu, 07 Nov 2024 03:39:06 GMT
 - Title: The Implicit Bias of Gradient Descent on Separable Multiclass Data
 - Authors: Hrithik Ravi, Clayton Scott, Daniel Soudry, Yutong Wang
 - Abstract summary: We employ the framework of Permutation Equivariant and Relative Margin-based (PERM) losses to introduce a multiclass extension of the exponential tail property.
Our proof techniques closely mirror those of the binary case, thus illustrating the power of the PERM framework for bridging the binary-multiclass gap.
 - Score: 38.05903703331163
 - License: http://creativecommons.org/licenses/by-nc-sa/4.0/
 - Abstract:   Implicit bias describes the phenomenon where optimization-based training algorithms, without explicit regularization, show a preference for simple estimators even when more complex estimators have equal objective values. Multiple works have developed the theory of implicit bias for binary classification under the assumption that the loss satisfies an exponential tail property. However, there is a noticeable gap in analysis for multiclass classification, with only a handful of results which themselves are restricted to the cross-entropy loss. In this work, we employ the framework of Permutation Equivariant and Relative Margin-based (PERM) losses [Wang and Scott, 2024] to introduce a multiclass extension of the exponential tail property. This class of losses includes not only cross-entropy but also other losses. Using this framework, we extend the implicit bias result of Soudry et al. [2018] to multiclass classification. Furthermore, our proof techniques closely mirror those of the binary case, thus illustrating the power of the PERM framework for bridging the binary-multiclass gap. 
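As orientation for the abstract above, here is an informal sketch of the binary-case result being extended, in notation of our own choosing rather than the paper's. A binary loss has an exponential tail if, roughly, it behaves like exp(-u) for large margins u; Soudry et al. [2018] show that gradient descent on such a loss over linearly separable data diverges in norm while converging in direction to the hard-margin SVM solution:

% Binary-case implicit bias (informal paraphrase; notation ours)
\[
  \lim_{t \to \infty} \frac{w(t)}{\lVert w(t) \rVert}
    = \frac{\hat{w}}{\lVert \hat{w} \rVert},
  \qquad
  \hat{w} = \operatorname*{arg\,min}_{w} \lVert w \rVert^2
  \quad \text{s.t.} \quad y_i \, w^{\top} x_i \ge 1 \ \ \text{for all } i.
\]
% For the multiclass picture, cross-entropy written in relative-margin form,
% for scores v in R^K and label y,
\[
  L(y, v) = \log\Bigl(1 + \sum_{j \neq y} e^{-(v_y - v_j)}\Bigr),
\]
% decays exponentially in the relative margins v_y - v_j, which is the kind of
% tail behaviour the PERM-based condition is meant to capture.

This paraphrase is only for orientation; the precise multiclass tail condition and the corresponding max-margin problem are as defined in the paper.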
 
       
      
        Related papers
- Learning a Gaussian Mixture for Sparsity Regularization in Inverse Problems [2.375943263571389]
In inverse problems, the incorporation of a sparsity prior yields a regularization effect on the solution.
We propose a probabilistic sparsity prior formulated as a mixture of Gaussians, capable of modeling sparsity with respect to a generic basis.
We put forth both a supervised and an unsupervised training strategy to estimate the parameters of this network.
arXiv  Detail & Related papers  (2024-01-29T22:52:57Z)
- Unified Binary and Multiclass Margin-Based Classification [27.28814893730265]
We show that a broad range of multiclass loss functions, including many popular ones, can be expressed in the relative margin form.
We then analyze the class of Fenchel-Young losses, and expand the set of these losses that are known to be classification-calibrated.
arXiv  Detail & Related papers  (2023-11-29T16:24:32Z)
- Regularized Linear Regression for Binary Classification [20.710343135282116]
Regularized linear regression is a promising approach for binary classification problems in which the training set has noisy labels.
We show that for large enough regularization strength, the optimal weights concentrate around two values of opposite sign.
We observe that in many cases the corresponding "compression" of each weight to a single bit leads to very little loss in performance.
arXiv  Detail & Related papers  (2023-11-03T23:18:21Z)
- Balanced Classification: A Unified Framework for Long-Tailed Object Detection [74.94216414011326]
Conventional detectors suffer from performance degradation when dealing with long-tailed data due to a classification bias towards the majority head categories.
We introduce a unified framework called BAlanced CLassification (BACL), which enables adaptive rectification of inequalities caused by disparities in category distribution.
BACL consistently achieves performance improvements across various datasets with different backbones and architectures.
arXiv  Detail & Related papers  (2023-08-04T09:11:07Z)
- General Loss Functions Lead to (Approximate) Interpolation in High Dimensions [6.738946307589741]
We provide a unified framework to approximately characterize the implicit bias of gradient descent in closed form.
Specifically, we show that the implicit bias is approximated by (but not exactly equal to) the minimum-norm interpolation in high dimensions.
Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings.
arXiv  Detail & Related papers  (2023-03-13T21:23:12Z)
- Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators [100.58924375509659]
The straight-through (ST) estimator gained popularity due to its simplicity and efficiency (a generic, illustrative sketch of an ST estimator appears after this list).
Several techniques have been proposed to improve on ST while keeping the same low computational complexity.
We conduct a theoretical analysis of the bias and variance of these methods in order to understand the tradeoffs and verify the originally claimed properties.
arXiv  Detail & Related papers  (2021-10-07T15:16:07Z)
- Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator in high dimensions via empirical risk minimisation.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv  Detail & Related papers  (2021-06-07T16:53:56Z)
- Shaping Deep Feature Space towards Gaussian Mixture for Visual Classification [74.48695037007306]
We propose a Gaussian mixture (GM) loss function for deep neural networks for visual classification.
With a classification margin and a likelihood regularization, the GM loss facilitates both high classification performance and accurate modeling of the feature distribution.
The proposed model can be implemented easily and efficiently without using extra trainable parameters.
arXiv  Detail & Related papers  (2020-11-18T03:32:27Z)
- Robustifying Binary Classification to Adversarial Perturbation [45.347651499585055]
In this paper we consider the problem of binary classification with adversarial perturbations.
We introduce a generalization of the max-margin classifier which takes into account the power of the adversary in manipulating the data.
Under some mild assumptions on the loss function, we theoretically show that gradient descent converges in direction to this robust max-margin (RM) classifier.
arXiv  Detail & Related papers  (2020-10-29T07:20:37Z)
- Rethinking preventing class-collapsing in metric learning with margin-based losses [81.22825616879936]
Metric learning seeks embeddings where visually similar instances are close and dissimilar instances are apart.
However, margin-based losses tend to project all samples of a class onto a single point in the embedding space.
We propose a simple modification to the embedding losses such that each sample selects its nearest same-class counterpart in a batch.
arXiv  Detail & Related papers  (2020-06-09T09:59:25Z) 
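As a companion to the straight-through (ST) estimator entry above, the following is a minimal, generic sketch of an ST binary gradient estimator in NumPy. It is illustrative only: the loss, the {-1, +1} encoding, and the learning rate are our own choices, and it is not the specific estimators or analysis of that paper.

# Minimal sketch of a straight-through (ST) estimator for a stochastic binary unit.
import numpy as np

rng = np.random.default_rng(0)

def st_forward(logits):
    """Forward pass: sample a binary activation b in {-1, +1} with P(b = +1) = sigmoid(logits)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    b = np.where(rng.random(p.shape) < p, 1.0, -1.0)
    return b, p

def st_backward(grad_wrt_b):
    """Backward pass: 'straight-through' means the sampling step is treated as the
    identity, so the upstream gradient is passed to the logits unchanged."""
    return grad_wrt_b

# Toy usage: drive the binary unit toward the target b = +1 by minimizing (b - 1)^2.
logits = np.array([-0.5])
for _ in range(500):
    b, p = st_forward(logits)
    grad_b = 2.0 * (b - 1.0)             # gradient of the loss w.r.t. the sampled b
    logits -= 0.1 * st_backward(grad_b)  # ST update of the underlying logits
print(p)  # sigmoid(logits) ends up close to 1

The estimator's bias comes precisely from pretending the sampling step is the identity in the backward pass, which is the kind of bias/variance tradeoff the paper above analyzes.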
        This list is automatically generated from the titles and abstracts of the papers on this site.
       
     