Mixing between the Cross Entropy and the Expectation Loss Terms
- URL: http://arxiv.org/abs/2109.05635v1
- Date: Sun, 12 Sep 2021 23:14:06 GMT
- Title: Mixing between the Cross Entropy and the Expectation Loss Terms
- Authors: Barak Battash, Lior Wolf, Tamir Hazan
- Abstract summary: Cross entropy loss tends to focus on hard-to-classify samples during training.
We show that adding to the optimization goal the expectation loss helps the network to achieve better accuracy.
Our experiments show that the new training protocol improves performance across a diverse set of classification domains.
- Score: 89.30385901335323
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The cross entropy loss is widely used due to its effectiveness and solid
theoretical grounding. However, as training progresses, the loss tends to focus
on hard-to-classify samples, which may prevent the network from obtaining gains
in performance. While most work in the field suggests ways to classify hard
negatives, we suggest strategically leaving hard negatives behind in order to
focus on misclassified samples with higher probabilities. We show that adding
to the optimization goal the expectation loss, which is a better approximation
of the zero-one loss, helps the network to achieve better accuracy. We,
therefore, propose to shift between the two losses during training, focusing
more on the expectation loss gradually during the later stages of training. Our
experiments show that the new training protocol improves performance across a
diverse set of classification domains, including computer vision, natural
language processing, tabular data, and sequences. Our code and scripts are
available in the supplementary material.
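The mixing idea described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact formulation: the expectation loss is taken as 1 - p(correct class), the expected zero-one loss under the softmax distribution, and the mixing weight `alpha` is assumed to follow a simple linear ramp; the paper's actual schedule and weighting may differ.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mixed_loss(logits, labels, alpha):
    """(1 - alpha) * cross entropy + alpha * expectation loss.

    The expectation loss here is 1 - p_correct, the expected
    zero-one loss under the model's predicted distribution
    (a sketch of the idea; not the paper's exact weighting).
    """
    p = softmax(logits)
    p_correct = p[np.arange(len(labels)), labels]
    ce = -np.log(p_correct)          # standard cross entropy
    exp_loss = 1.0 - p_correct       # smoother near-zero gradient on hard samples
    return (1 - alpha) * ce + alpha * exp_loss

def alpha_schedule(epoch, total_epochs):
    """Assumed linear ramp: pure cross entropy early,
    mostly expectation loss in the later stages of training."""
    return epoch / max(total_epochs - 1, 1)
```

With `alpha = 0` this reduces to plain cross entropy; with `alpha = 1` the per-sample gradient magnitude is bounded by the predicted probability mass, so confidently misclassified (hard) samples contribute less than under cross entropy, matching the abstract's motivation.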
Related papers
- Understanding and Combating Robust Overfitting via Input Loss Landscape Analysis and Regularization [5.1024659285813785]
Adversarial training is prone to overfitting, and the cause is far from clear.
We find that robust overfitting results from standard training, specifically the minimization of the clean loss.
We propose a new regularizer to smooth the loss landscape by penalizing the weighted logits variation along the adversarial direction.
arXiv Detail & Related papers (2022-12-09T16:55:30Z)
- Contrastive Classification and Representation Learning with Probabilistic Interpretation [5.979778557940212]
Cross entropy loss has served as the main objective function for classification-based tasks.
We propose a new version of the supervised contrastive training that learns jointly the parameters of the classifier and the backbone of the network.
arXiv Detail & Related papers (2022-11-07T15:57:24Z)
- Positive-Negative Equal Contrastive Loss for Semantic Segmentation [8.664491798389662]
Previous works commonly design plug-and-play modules and structural losses to effectively extract and aggregate the global context.
We propose Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of positive embedding on the anchor and treats the positive as well as negative sample pairs equally.
We conduct comprehensive experiments and achieve state-of-the-art performance on two benchmark datasets.
arXiv Detail & Related papers (2022-07-04T13:51:29Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, adversarial training (AT) has proven to be an effective approach for improving model robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Striking the Right Balance: Recall Loss for Semantic Segmentation [24.047359482606307]
Class imbalance is a fundamental problem in computer vision applications such as semantic segmentation.
We propose a hard-class mining loss by reshaping the vanilla cross entropy loss.
We show that the novel recall loss changes gradually between the standard cross entropy loss and the inverse frequency weighted loss.
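The recall loss summarized above can be sketched as a per-class reweighting of cross entropy. The helper `recall_weighted_ce` below is hypothetical, assuming each class is weighted by 1 minus its recall in the current batch so that well-recalled classes are down-weighted; the paper's exact formulation (e.g. how recall is estimated over time) may differ.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def recall_weighted_ce(logits, labels, num_classes):
    """Hypothetical sketch: cross entropy weighted per class by
    (1 - recall), estimated from the current batch's predictions.

    Classes absent from the batch keep weight 1 (fall back to
    unweighted cross entropy for them).
    """
    p = softmax(logits)
    preds = p.argmax(axis=1)
    weights = np.ones(num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            recall_c = (preds[mask] == c).mean()
            weights[c] = 1.0 - recall_c  # well-recalled classes get small weight
    ce = -np.log(p[np.arange(len(labels)), labels])
    return weights[labels] * ce
```

When every class has perfect recall the weights vanish and the loss goes to zero; when a class has zero recall its weight is 1, recovering plain cross entropy for that class, which illustrates the interpolation between standard and frequency-weighted behavior mentioned in the summary.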
arXiv Detail & Related papers (2021-06-28T18:02:03Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- Adversarially Robust Learning via Entropic Regularization [31.6158163883893]
We propose a new family of algorithms, ATENT, for training adversarially robust deep neural networks.
Our approach achieves competitive (or better) performance in terms of robust classification accuracy.
arXiv Detail & Related papers (2020-08-27T18:54:43Z)
- Step-Ahead Error Feedback for Distributed Training with Compressed Gradient [99.42912552638168]
We show that a new "gradient mismatch" problem is raised by the local error feedback in centralized distributed training.
We propose two novel techniques, 1) step ahead and 2) error averaging, with rigorous theoretical analysis.
arXiv Detail & Related papers (2020-08-13T11:21:07Z)
- Overfitting in adversarially robust deep learning [86.11788847990783]
We show that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training.
We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting.
arXiv Detail & Related papers (2020-02-26T15:40:50Z)
- Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality [74.0084803220897]
Adversarial training is a popular method to give neural nets robustness against adversarial perturbations.
We show convergence to low robust training loss for polynomial width instead of exponential, under natural assumptions and with the ReLU activation.
arXiv Detail & Related papers (2020-02-16T20:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.