Using Focal Loss to Fight Shallow Heuristics: An Empirical Analysis of
Modulated Cross-Entropy in Natural Language Inference
- URL: http://arxiv.org/abs/2211.13331v1
- Date: Wed, 23 Nov 2022 22:19:00 GMT
- Title: Using Focal Loss to Fight Shallow Heuristics: An Empirical Analysis of
Modulated Cross-Entropy in Natural Language Inference
- Authors: Frano Rajič, Ivan Stresec, Axel Marmet, Tim Poštuvan
- Abstract summary: In some datasets, deep neural networks discover underlying heuristics that allow them to take shortcuts in the learning process, resulting in poor generalization capability.
Instead of using standard cross-entropy, we explore whether a modulated version of cross-entropy called focal loss can constrain the model so as not to use these heuristics and improve generalization performance.
Our experiments in natural language inference show that focal loss has a regularizing impact on the learning process, increasing accuracy on out-of-distribution data, but slightly decreasing performance on in-distribution data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is no such thing as a perfect dataset. In some datasets, deep neural
networks discover underlying heuristics that allow them to take shortcuts in
the learning process, resulting in poor generalization capability. Instead of
using standard cross-entropy, we explore whether a modulated version of
cross-entropy called focal loss can constrain the model so as not to use
heuristics and improve generalization performance. Our experiments in natural
language inference show that focal loss has a regularizing impact on the
learning process, increasing accuracy on out-of-distribution data, but slightly
decreasing performance on in-distribution data. Despite the improved
out-of-distribution performance, we demonstrate the shortcomings of focal loss
and its inferiority in comparison to the performance of methods such as
unbiased focal loss and self-debiasing ensembles.
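For concreteness, a minimal PyTorch-style sketch of the focal loss discussed above; the gamma value and the three-class NLI label setup are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Focal loss: cross-entropy modulated by (1 - p_t)^gamma.

    Down-weights well-classified (high-confidence) examples, so easy
    examples that shallow heuristics solve contribute less to the
    gradient. gamma=2.0 is a common default, assumed here rather than
    taken from the paper.
    """
    log_pt = F.log_softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()                       # p_t: probability of the true class
    loss = -((1.0 - pt) ** gamma) * log_pt  # modulated cross-entropy
    return loss.mean()

# Usage on a toy 3-class NLI-style batch (entailment/neutral/contradiction):
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 0])
print(focal_loss(logits, targets))
```

Setting gamma=0 recovers standard cross-entropy, so gamma directly controls how strongly easy, heuristic-friendly examples are down-weighted.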
Related papers
- Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches [4.577842191730992]
We study ways toward robust OoD generalization for deep learning.
We first propose a novel and effective approach to disentangle the spurious correlation between features that are not essential for recognition.
We then study the problem of strengthening neural architecture search in OoD scenarios.
arXiv Detail & Related papers (2024-10-25T20:50:32Z)
- Learning Latent Graph Structures and their Uncertainty [63.95971478893842]
Graph Neural Networks (GNNs) use relational information as an inductive bias to enhance the model's accuracy.
As task-relevant relations might be unknown, graph structure learning approaches have been proposed to learn them while solving the downstream prediction task.
arXiv Detail & Related papers (2024-05-30T10:49:22Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
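For reference, a minimal sketch of the unhinged loss in its standard binary form, assuming labels in {-1, +1}; the paper above may analyze a different multi-class variant.

```python
def unhinged_loss(score: float, label: int) -> float:
    """Unhinged loss: ell(y, f(x)) = 1 - y * f(x).

    Linear in the margin y * f(x), with no hinge at zero, which is what
    makes closed-form training dynamics tractable to analyze.
    """
    return 1.0 - label * score

# A confident correct prediction vs. a confident mistake:
print(unhinged_loss(0.8, +1))  # 0.2
print(unhinged_loss(0.8, -1))  # 1.8
```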
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training [14.871738070617491]
We show that inconsistency is a more reliable indicator of generalization gap than the sharpness of the loss landscape.
The results also provide a theoretical basis for existing methods such as co-distillation and ensemble.
arXiv Detail & Related papers (2023-05-31T20:28:13Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose ReScore, a model-agnostic framework that boosts causal discovery performance by dynamically learning adaptive sample weights for a reweighted score function.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network generalization ability on multiple vision tasks.
Our methods are simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z)
- Linear Regression with Distributed Learning: A Generalization Error Perspective [0.0]
We investigate the performance of distributed learning for large-scale linear regression.
We focus on the generalization error, i.e., the performance on unseen data.
Our results show that the generalization error of the distributed solution can be substantially higher than that of the centralized solution.
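A toy numerical sketch of this gap, using one simple divide-and-average scheme (solve least squares per partition, then average the solutions); the problem sizes and the scheme are illustrative assumptions, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression task (all sizes are illustrative assumptions).
n, d, n_test, workers = 200, 50, 1000, 5
w_true = rng.normal(size=d)
X, X_test = rng.normal(size=(n, d)), rng.normal(size=(n_test, d))
y = X @ w_true + 0.1 * rng.normal(size=n)
y_test = X_test @ w_true

def lstsq(A, b):
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Centralized: least squares on all n samples.
w_central = lstsq(X, y)

# Distributed: each partition has only n / workers = 40 < d samples, so
# local solutions are underdetermined min-norm fits; their average can
# generalize substantially worse than the centralized solution.
parts = np.array_split(np.arange(n), workers)
w_dist = np.mean([lstsq(X[idx], y[idx]) for idx in parts], axis=0)

# Generalization error = mean squared error on unseen data.
for name, w in [("centralized", w_central), ("distributed", w_dist)]:
    print(name, np.mean((X_test @ w - y_test) ** 2))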
arXiv Detail & Related papers (2021-01-22T08:43:28Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
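A toy illustration of this concentration effect, sketched here with interpolating linear regressors rather than the paper's classifiers; all problem sizes and the perturbation scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Overparameterized toy problem: fewer samples than parameters, so many
# interpolating solutions exist (sizes are illustrative assumptions).
n, d, n_test = 20, 60, 2000
w_true = rng.normal(size=d) / np.sqrt(d)
X, X_test = rng.normal(size=(n, d)), rng.normal(size=(n_test, d))
y, y_test = X @ w_true, X_test @ w_true

# Min-norm interpolator plus random directions in the null space of X:
# every such w fits the training data exactly (zero training error).
w_min = np.linalg.pinv(X) @ y
_, _, Vt = np.linalg.svd(X)
null_basis = Vt[n:].T                      # (d, d - n) null-space basis

test_errors = []
for _ in range(500):
    w = w_min + null_basis @ rng.normal(size=d - n) * 0.1
    assert np.allclose(X @ w, y)           # still interpolates
    test_errors.append(np.mean((X_test @ w - y_test) ** 2))

# Test errors concentrate around a typical value well below the worst case:
print("typical:", np.median(test_errors), "worst:", np.max(test_errors))
```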
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
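A minimal sketch of feature averaging over a group orbit, using input sign flips as a stand-in symmetry group; the feature map and the group are illustrative assumptions (for images, the group would be flips or rotations).

```python
import numpy as np

rng = np.random.default_rng(1)
W, b = rng.normal(size=(16, 8)), rng.normal(size=16)

def features(x):
    """Stand-in feature map (one random ReLU layer); any model's
    penultimate-layer features would play the same role."""
    return np.maximum(W @ x + b, 0.0)

def orbit(x):
    """Orbit of x under the illustrative symmetry group: sign flips."""
    return [x, -x]

def averaged_features(x):
    """Feature averaging: mean of the feature map over the group orbit.
    The result is exactly invariant to the group action, and with a
    convex loss on top, Jensen's inequality yields the risk reduction
    the paper above makes precise."""
    return np.mean([features(g) for g in orbit(x)], axis=0)

x = rng.normal(size=8)
# Invariance check: averaged features agree on x and its transform -x.
print(np.allclose(averaged_features(x), averaged_features(-x)))  # True
```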
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.