Mixing between the Cross Entropy and the Expectation Loss Terms
- URL: http://arxiv.org/abs/2109.05635v1
- Date: Sun, 12 Sep 2021 23:14:06 GMT
- Title: Mixing between the Cross Entropy and the Expectation Loss Terms
- Authors: Barak Battash, Lior Wolf, Tamir Hazan
- Abstract summary: Cross entropy loss tends to focus on hard-to-classify samples during training.
We show that adding to the optimization goal the expectation loss helps the network to achieve better accuracy.
Our experiments show that the new training protocol improves performance across a diverse set of classification domains.
- Score: 89.30385901335323
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The cross entropy loss is widely used due to its effectiveness and solid
theoretical grounding. However, as training progresses, the loss tends to focus
on hard-to-classify samples, which may prevent the network from obtaining gains
in performance. While most work in the field suggests ways to classify hard
negatives, we suggest strategically leaving hard negatives behind in order to
focus on misclassified samples with higher probabilities. We show that adding
to the optimization goal the expectation loss, which is a better approximation
of the zero-one loss, helps the network to achieve better accuracy. We,
therefore, propose to shift between the two losses during training, focusing
more on the expectation loss gradually during the later stages of training. Our
experiments show that the new training protocol improves performance across a
diverse set of classification domains, including computer vision, natural
language processing, tabular data, and sequences. Our code and scripts are
available in the supplementary material.
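The mixing idea described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact formulation: the expectation loss is taken as 1 - p(correct class), the expected zero-one loss under the softmax distribution, and the mixing weight `alpha` is assumed to follow a simple linear ramp; the paper's actual schedule and weighting may differ.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mixed_loss(logits, labels, alpha):
    """(1 - alpha) * cross entropy + alpha * expectation loss.

    The expectation loss here is 1 - p_correct, the expected
    zero-one loss under the model's predicted distribution
    (a sketch of the idea; not the paper's exact weighting).
    """
    p = softmax(logits)
    p_correct = p[np.arange(len(labels)), labels]
    ce = -np.log(p_correct)          # standard cross entropy
    exp_loss = 1.0 - p_correct       # smoother near-zero gradient on hard samples
    return (1 - alpha) * ce + alpha * exp_loss

def alpha_schedule(epoch, total_epochs):
    """Assumed linear ramp: pure cross entropy early,
    mostly expectation loss in the later stages of training."""
    return epoch / max(total_epochs - 1, 1)
```

With `alpha = 0` this reduces to plain cross entropy; with `alpha = 1` the per-sample gradient magnitude is bounded by the predicted probability mass, so confidently misclassified (hard) samples contribute less than under cross entropy, matching the abstract's motivation.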
Related papers
- Understanding and Combating Robust Overfitting via Input Loss Landscape Analysis and Regularization [5.1024659285813785]
Adversarial training is prone to overfitting, and the cause is far from clear.
We find that robust overfitting results from standard training, specifically the minimization of the clean loss.
We propose a new regularizer to smooth the loss landscape by penalizing the weighted logits variation along the adversarial direction.
arXiv Detail & Related papers (2022-12-09T16:55:30Z)
- Contrastive Classification and Representation Learning with Probabilistic Interpretation [5.979778557940212]
Cross entropy loss has served as the main objective function for classification-based tasks.
We propose a new version of the supervised contrastive training that learns jointly the parameters of the classifier and the backbone of the network.
arXiv Detail & Related papers (2022-11-07T15:57:24Z)
- Positive-Negative Equal Contrastive Loss for Semantic Segmentation [8.664491798389662]
Previous works commonly design plug-and-play modules and structural losses to effectively extract and aggregate the global context.
We propose Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of positive embedding on the anchor and treats the positive as well as negative sample pairs equally.
We conduct comprehensive experiments and achieve state-of-the-art performance on two benchmark datasets.
arXiv Detail & Related papers (2022-07-04T13:51:29Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, adversarial training (AT) has proven to be an effective approach for improving model robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Striking the Right Balance: Recall Loss for Semantic Segmentation [24.047359482606307]
Class imbalance is a fundamental problem in computer vision applications such as semantic segmentation.
We propose a hard-class mining loss by reshaping the vanilla cross entropy loss.
We show that the novel recall loss changes gradually between the standard cross entropy loss and the inverse frequency weighted loss.
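The recall loss summarized above can be sketched as a per-class reweighting of cross entropy. The helper `recall_weighted_ce` below is hypothetical, assuming each class is weighted by 1 minus its recall in the current batch so that well-recalled classes are down-weighted; the paper's exact formulation (e.g. how recall is estimated over time) may differ.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def recall_weighted_ce(logits, labels, num_classes):
    """Hypothetical sketch: cross entropy weighted per class by
    (1 - recall), estimated from the current batch's predictions.

    Classes absent from the batch keep weight 1 (fall back to
    unweighted cross entropy for them).
    """
    p = softmax(logits)
    preds = p.argmax(axis=1)
    weights = np.ones(num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            recall_c = (preds[mask] == c).mean()
            weights[c] = 1.0 - recall_c  # well-recalled classes get small weight
    ce = -np.log(p[np.arange(len(labels)), labels])
    return weights[labels] * ce
```

When every class has perfect recall the weights vanish and the loss goes to zero; when a class has zero recall its weight is 1, recovering plain cross entropy for that class, which illustrates the interpolation between standard and frequency-weighted behavior mentioned in the summary.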
arXiv Detail & Related papers (2021-06-28T18:02:03Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- Adversarially Robust Learning via Entropic Regularization [31.6158163883893]
We propose a new family of algorithms, ATENT, for training adversarially robust deep neural networks.
Our approach achieves competitive (or better) performance in terms of robust classification accuracy.
arXiv Detail & Related papers (2020-08-27T18:54:43Z)
- Step-Ahead Error Feedback for Distributed Training with Compressed Gradient [99.42912552638168]
We show that a new "gradient mismatch" problem is raised by the local error feedback in centralized distributed training.
We propose two novel techniques, 1) step ahead and 2) error averaging, with rigorous theoretical analysis.
arXiv Detail & Related papers (2020-08-13T11:21:07Z)
- Overfitting in adversarially robust deep learning [86.11788847990783]
We show that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training.
We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting.
arXiv Detail & Related papers (2020-02-26T15:40:50Z)
- Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality [74.0084803220897]
Adversarial training is a popular method to give neural nets robustness against adversarial perturbations.
We show convergence to low robust training loss for polynomial width instead of exponential, under natural assumptions and with the ReLU activation.
arXiv Detail & Related papers (2020-02-16T20:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.