Understanding Square Loss in Training Overparametrized Neural Network
Classifiers
- URL: http://arxiv.org/abs/2112.03657v1
- Date: Tue, 7 Dec 2021 12:12:30 GMT
- Title: Understanding Square Loss in Training Overparametrized Neural Network
Classifiers
- Authors: Tianyang Hu, Jun Wang, Wenjia Wang, Zhenguo Li
- Abstract summary: We contribute to the theoretical understanding of square loss in classification by systematically investigating how it performs for overparametrized neural networks.
We consider two cases, according to whether classes are separable or not. In the general non-separable case, a fast convergence rate is established for both the misclassification rate and the calibration error.
The resulting margin is proven to be lower bounded away from zero, providing theoretical guarantees for robustness.
- Score: 31.319145959402462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has achieved many breakthroughs in modern classification tasks.
Numerous architectures have been proposed for different data structures but
when it comes to the loss function, the cross-entropy loss is the predominant
choice. Recently, several alternative losses have seen revived interest for
deep classifiers. In particular, empirical evidence seems to promote square
loss but a theoretical justification is still lacking. In this work, we
contribute to the theoretical understanding of square loss in classification by
systematically investigating how it performs for overparametrized neural
networks in the neural tangent kernel (NTK) regime. Interesting properties
regarding the generalization error, robustness, and calibration error are
revealed. We consider two cases, according to whether classes are separable or
not. In the general non-separable case, a fast convergence rate is established
for both the misclassification rate and the calibration error. When classes are
separable, the misclassification rate converges exponentially fast.
Further, the resulting margin is proven to be lower bounded away from zero,
providing theoretical guarantees for robustness. We expect our findings to hold
beyond the NTK regime and translate to practical settings. To this end, we
conduct extensive empirical studies on practical neural networks, demonstrating
the effectiveness of square loss in both synthetic low-dimensional data and
real image data. Compared to cross-entropy, square loss has comparable
generalization error but noticeable advantages in robustness and model
calibration.
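To make the comparison concrete, here is a minimal PyTorch sketch of training one classifier under either loss, with square loss applied to one-hot targets as is standard for MSE-based classification. The model, data loader, optimizer, and class count are hypothetical placeholders, not the paper's experimental setup.

```python
import torch.nn.functional as F

def train_epoch(model, loader, optimizer, loss_name="square", num_classes=10):
    """One training epoch under either square loss or cross-entropy.

    Illustrative sketch only: model, loader, optimizer, and num_classes
    are placeholder assumptions, not the paper's configuration.
    """
    model.train()
    for x, y in loader:
        logits = model(x)  # shape: (batch, num_classes)
        if loss_name == "square":
            # Square loss against one-hot encoded labels.
            targets = F.one_hot(y, num_classes).float()
            loss = F.mse_loss(logits, targets)
        else:
            # Standard cross-entropy on raw logits.
            loss = F.cross_entropy(logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```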
Related papers
- Large Margin Discriminative Loss for Classification [3.3975558777609915]
We introduce a novel discriminative loss function with a large margin in the context of deep learning.
This loss boosts the discriminative power of neural nets, represented by intra-class compactness and inter-class separability.
arXiv Detail & Related papers (2024-05-28T18:10:45Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form training dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
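For reference, the binary unhinged loss is commonly written as 1 - y*f(x) for labels y in {-1, +1}; the multiclass formulation analyzed in this paper may differ, so the sketch below only illustrates the binary form under that assumption.

```python
import torch

def unhinged_loss(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Binary unhinged loss: mean of 1 - y * f(x), with labels in {-1, +1}.

    Sketch of the commonly cited binary form; the paper's multiclass
    version is not reproduced here.
    """
    return (1.0 - labels * scores).mean()
```

Because this loss is linear in the score, its gradient with respect to f is constant, which is what makes closed-form dynamics tractable to write down.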
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Cut your Losses with Squentropy [19.924900110707284]
We propose the "squentropy" loss, which is the sum of two terms: the cross-entropy loss and the average square loss over the incorrect classes.
We show that the squentropy loss outperforms both the pure cross-entropy and rescaled square losses in terms of classification accuracy.
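Based on the description above, a minimal PyTorch sketch of squentropy as cross-entropy plus the average squared logit over the incorrect classes; the exact normalization, and whether logits or probabilities are squared, are assumptions here.

```python
import torch
import torch.nn.functional as F

def squentropy_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy plus the average square loss over the incorrect classes.

    Sketch following the summary above; normalization details are assumptions.
    """
    ce = F.cross_entropy(logits, labels)
    num_classes = logits.size(1)
    # Zero out the true-class logit, then average squares over the rest.
    true_mask = F.one_hot(labels, num_classes).bool()
    wrong_logits = logits.masked_fill(true_mask, 0.0)
    square_term = wrong_logits.pow(2).sum(dim=1) / (num_classes - 1)
    return ce + square_term.mean()
```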
arXiv Detail & Related papers (2023-02-08T09:21:13Z)
- Prototype-Anchored Learning for Learning with Imperfect Annotations [83.7763875464011]
It is challenging to learn unbiased classification models from imperfectly annotated datasets.
We propose a prototype-anchored learning (PAL) method, which can be easily incorporated into various learning-based classification schemes.
We verify the effectiveness of PAL on class-imbalanced learning and noise-tolerant learning by extensive experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-06-23T10:25:37Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as a simplex equiangular tight frame (ETF) and fixed during training.
Our experimental results show that our method achieves similar performance on image classification for balanced datasets.
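As a sketch of the idea, the snippet below constructs a simplex ETF with the standard formula from the neural-collapse literature and installs it as a frozen (non-learnable) final linear layer; the feature dimension and number of classes are hypothetical, and the paper's exact recipe may differ.

```python
import torch
import torch.nn as nn

def simplex_etf(num_classes: int, feat_dim: int) -> torch.Tensor:
    """Return a (num_classes, feat_dim) simplex-ETF classifier weight.

    Standard construction M = sqrt(K/(K-1)) * U (I_K - 11^T / K) with U an
    orthonormal basis; anything beyond this formula is an assumption.
    """
    assert feat_dim >= num_classes
    u, _ = torch.linalg.qr(torch.randn(feat_dim, num_classes))  # U: (feat_dim, K)
    center = torch.eye(num_classes) - torch.ones(num_classes, num_classes) / num_classes
    scale = (num_classes / (num_classes - 1)) ** 0.5
    # Rows are unit-norm class vectors with equal pairwise cosine -1/(K-1).
    return scale * center @ u.t()

# Hypothetical sizes: 512-dim features, 10 classes.
classifier = nn.Linear(512, 10, bias=False)
with torch.no_grad():
    classifier.weight.copy_(simplex_etf(10, 512))
classifier.weight.requires_grad_(False)  # fixed, not learned, during training
```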
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- $\sigma^2$R Loss: a Weighted Loss by Multiplicative Factors using Sigmoidal Functions [0.9569316316728905]
We introduce a new loss function called the squared reduction loss ($\sigma^2$R loss), which is regulated by a sigmoid function to inflate/deflate the error per instance.
Our loss has a clear intuition and geometric interpretation; we demonstrate the effectiveness of our proposal through experiments.
arXiv Detail & Related papers (2020-09-18T12:34:40Z)
- Vulnerability Under Adversarial Machine Learning: Bias or Variance? [77.30759061082085]
We investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network.
Our analysis sheds light on why the deep neural networks have poor performance under adversarial perturbation.
We introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies.
arXiv Detail & Related papers (2020-08-01T00:58:54Z)
- Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks [23.538629997497747]
Cross-entropy loss is widely believed to be empirically superior to the square loss for classification tasks.
We show that these neural architectures perform comparably or better when trained with the square loss.
Cross-entropy appears to have a slight edge on computer vision tasks.
arXiv Detail & Related papers (2020-06-12T17:00:49Z)
- Avoiding Spurious Local Minima in Deep Quadratic Networks [0.0]
We characterize the landscape of the mean squared nonlinear error for networks with neural activation functions.
We prove that deep overparameterized neural networks with quadratic activations benefit from similar landscape properties.
arXiv Detail & Related papers (2019-12-31T22:31:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.