Evaluating the Impact of Loss Function Variation in Deep Learning for
Classification
- URL: http://arxiv.org/abs/2210.16003v1
- Date: Fri, 28 Oct 2022 09:10:10 GMT
- Authors: Simon Dräger, Jannik Dunkelau
- Abstract summary: The loss function is arguably among the most important hyperparameters for a neural network.
We consider deep neural networks in a supervised classification setting and analyze the impact the choice of loss function has on the training result.
While certain loss functions perform suboptimally, our work empirically shows that under-represented losses can significantly outperform the state-of-the-art choices.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The loss function is arguably among the most important hyperparameters for a
neural network. Many loss functions have been designed to date, making a
correct choice nontrivial. However, elaborate justifications regarding the
choice of the loss function are not made in related work. This is, as we see
it, an indication of a dogmatic mindset in the deep learning community which
lacks empirical foundation. In this work, we consider deep neural networks in a
supervised classification setting and analyze the impact the choice of loss
function has on the training result. While certain loss functions perform
suboptimally, our work empirically shows that under-represented losses such as
the KL divergence can significantly outperform the state-of-the-art choices,
highlighting the need to include the loss function as a tuned hyperparameter
rather than a fixed choice.
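As a hedged illustration of the relationship between the two losses discussed above (not the paper's experimental setup), the KL divergence and cross-entropy differ only by the entropy of the target distribution: KL(p‖q) = H(p, q) − H(p). For hard one-hot labels the two coincide; the values below (logits, a hypothetical label-smoothed target) are invented for demonstration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum_i p_i * log(q_i)
    return float(-np.sum(p * np.log(q + eps)))

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum_i p_i * log(p_i / q_i), with 0*log(0) treated as 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

logits = np.array([2.0, 0.5, -1.0])
q = softmax(logits)                    # model's predicted distribution
p = np.array([0.9, 0.05, 0.05])       # hypothetical label-smoothed target

ce = cross_entropy(p, q)
kl = kl_divergence(p, q)
entropy_p = float(-np.sum(p * np.log(p)))

# KL(p||q) = H(p, q) - H(p): the losses differ by the (constant) target
# entropy, so their gradients with respect to the logits coincide.
assert np.isclose(kl, ce - entropy_p)
```

Since the constant offset does not affect gradients, any empirical gap between the two losses presumably stems from implementation and tuning details rather than the loss values themselves.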
Related papers
- Unraveling the Hessian: A Key to Smooth Convergence in Loss Function Landscapes [0.0]
We theoretically analyze the convergence of the loss landscape in a fully connected neural network and derive upper bounds for the difference in loss function values when adding a new object to the sample.
Our empirical study confirms these results on various datasets, demonstrating the convergence of the loss function surface for image classification tasks.
arXiv Detail & Related papers (2024-09-18T14:04:15Z) - LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
We present a robust online optimization framework over rounds in which an adversary can introduce outliers by corrupting the loss functions in an arbitrary number k of rounds, unknown to the learner.
arXiv Detail & Related papers (2024-08-12T17:08:31Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Alternate Loss Functions for Classification and Robust Regression Can Improve the Accuracy of Artificial Neural Networks [6.452225158891343]
This paper shows that training speed and final accuracy of neural networks can significantly depend on the loss function used to train neural networks.
Two new classification loss functions that significantly improve performance on a wide variety of benchmark tasks are proposed.
arXiv Detail & Related papers (2023-03-17T12:52:06Z) - A survey and taxonomy of loss functions in machine learning [60.41650195728953]
Most state-of-the-art machine learning techniques revolve around the optimisation of loss functions.
This survey aims to provide a reference of the most essential loss functions for both beginner and advanced machine learning practitioners.
arXiv Detail & Related papers (2023-01-13T14:38:24Z) - Xtreme Margin: A Tunable Loss Function for Binary Classification
Problems [0.0]
We provide an overview of a novel loss function, the Xtreme Margin loss function.
Unlike the binary cross-entropy and the hinge loss functions, this loss function provides researchers and practitioners flexibility with their training process.
arXiv Detail & Related papers (2022-10-31T22:39:32Z) - Memorization in Deep Neural Networks: Does the Loss Function matter? [1.71982924656402]
We show that a symmetric loss function, as opposed to either cross-entropy or squared error loss, results in significant improvement in the ability of the network to resist such overfitting.
Our results clearly bring out the role loss functions alone can play in this phenomenon of memorization.
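The summary above does not specify which symmetric loss is used. As a hedged illustration, the mean absolute error over class probabilities is a well-known symmetric loss: summing it over all possible labels yields a constant 2(K−1) independent of the prediction, which is the defining symmetry property. The probabilities below are invented for demonstration.

```python
import numpy as np

def mae_loss(q, y):
    # Mean absolute error between predicted distribution q and the
    # one-hot encoding of label y; equals 2 * (1 - q[y]).
    one_hot = np.zeros_like(q)
    one_hot[y] = 1.0
    return float(np.sum(np.abs(q - one_hot)))

q = np.array([0.7, 0.2, 0.1])  # hypothetical predicted class distribution

# Symmetry: the loss summed over all K possible labels is the constant
# 2 * (K - 1), regardless of q. Cross-entropy has no such property.
total = sum(mae_loss(q, y) for y in range(len(q)))
assert np.isclose(total, 2 * (len(q) - 1))
```

This bounded, symmetric structure is what makes such losses less prone to fitting mislabeled samples than the unbounded cross-entropy.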
arXiv Detail & Related papers (2021-07-21T09:08:51Z) - A Mixed Focal Loss Function for Handling Class Imbalanced Medical Image
Segmentation [0.7619404259039283]
We propose a new compound loss function derived from modified variants of the Focal loss and Dice loss functions.
Our proposed loss function is associated with a better recall-precision balance, significantly outperforming the other loss functions in both binary and multi-class image segmentation.
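A generic compound of the two component losses named above can be sketched as a weighted sum of the standard binary Focal loss and the soft Dice loss; the paper's modified variants and weighting differ in detail, and the weight `lam` and the sample arrays below are hypothetical.

```python
import numpy as np

def focal_loss(y_true, y_prob, alpha=0.25, gamma=2.0, eps=1e-7):
    # Standard binary focal loss (per pixel, then averaged): down-weights
    # easy examples via the (1 - p_t)^gamma modulating factor.
    y_prob = np.clip(y_prob, eps, 1 - eps)
    pt = np.where(y_true == 1, y_prob, 1 - y_prob)
    w = np.where(y_true == 1, alpha, 1 - alpha)
    return float(np.mean(-w * (1 - pt) ** gamma * np.log(pt)))

def dice_loss(y_true, y_prob, smooth=1.0):
    # Soft Dice loss: 1 minus the smoothed Dice overlap coefficient.
    inter = np.sum(y_true * y_prob)
    return float(1 - (2 * inter + smooth)
                 / (np.sum(y_true) + np.sum(y_prob) + smooth))

def mixed_loss(y_true, y_prob, lam=0.5):
    # Hypothetical equal weighting of the two components.
    return lam * focal_loss(y_true, y_prob) + (1 - lam) * dice_loss(y_true, y_prob)

y_true = np.array([1.0, 1.0, 0.0, 0.0])   # toy ground-truth mask
y_prob = np.array([0.9, 0.6, 0.3, 0.1])   # toy predicted probabilities
loss = mixed_loss(y_true, y_prob)
assert loss >= 0
```

Combining a distribution-based term (Focal) with a region-based term (Dice) is a common way to balance pixel-wise calibration against overlap on imbalanced masks.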
arXiv Detail & Related papers (2021-02-08T20:47:38Z) - Why Do Better Loss Functions Lead to Less Transferable Features? [93.47297944685114]
This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet.
We show that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks.
arXiv Detail & Related papers (2020-10-30T17:50:31Z) - Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation [56.343646789922545]
We propose to automate the design of metric-specific loss functions by searching differentiable surrogate losses for each metric.
Experiments on PASCAL VOC and Cityscapes demonstrate that the searched surrogate losses outperform the manually designed loss functions consistently.
arXiv Detail & Related papers (2020-10-15T17:59:08Z) - Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important to obtain high-quality influence estimates.
arXiv Detail & Related papers (2020-06-25T18:25:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.