The Fisher-Rao Loss for Learning under Label Noise
- URL: http://arxiv.org/abs/2210.16401v1
- Date: Fri, 28 Oct 2022 20:50:10 GMT
- Title: The Fisher-Rao Loss for Learning under Label Noise
- Authors: Henrique K. Miyamoto, Fábio C. C. Meneghetti, Sueli I. R. Costa
- Abstract summary: We study the Fisher-Rao loss function, which emerges from the Fisher-Rao distance in the statistical manifold of discrete distributions.
We derive an upper bound for the performance degradation in the presence of label noise, and analyse the learning speed of this loss.
- Score: 9.238700679836855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Choosing a suitable loss function is essential when learning by empirical
risk minimisation. In many practical cases, the datasets used for training a
classifier may contain incorrect labels, which prompts the interest for using
loss functions that are inherently robust to label noise. In this paper, we
study the Fisher-Rao loss function, which emerges from the Fisher-Rao distance
in the statistical manifold of discrete distributions. We derive an upper bound
for the performance degradation in the presence of label noise, and analyse the
learning speed of this loss. Comparing with other commonly used losses, we
argue that the Fisher-Rao loss provides a natural trade-off between robustness
and training dynamics. Numerical experiments with synthetic and MNIST datasets
illustrate this performance.
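As a concrete illustration (a minimal sketch, not the authors' code): for distributions p, q on the probability simplex, the Fisher-Rao distance is d_FR(p, q) = 2 arccos(sum_i sqrt(p_i * q_i)); when q is a one-hot label, this reduces to 2 arccos(sqrt(p_y)).

```python
import numpy as np

def fisher_rao_loss(probs, label):
    """Fisher-Rao distance between a predicted distribution and a one-hot label.

    For distributions p, q on the simplex,
    d_FR(p, q) = 2 * arccos(sum_i sqrt(p_i * q_i));
    with q one-hot at `label`, this reduces to 2 * arccos(sqrt(p_label)).
    """
    p_y = np.clip(probs[label], 0.0, 1.0)  # guard against rounding outside [0, 1]
    return 2.0 * np.arccos(np.sqrt(p_y))

probs = np.array([0.7, 0.2, 0.1])        # softmax output of a classifier
loss_correct = fisher_rao_loss(probs, 0)  # small: high mass on the true class
loss_wrong = fisher_rao_loss(probs, 2)    # larger: low mass on the true class
```

Note that the loss is bounded by pi (attained when the predicted mass on the true class is zero), which is the boundedness property underlying robustness to label noise.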
Related papers
- LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
An adversary can introduce outliers by corrupting an arbitrary number k of loss functions, with k unknown to the learner.
We present a robust online optimization framework for this setting.
arXiv Detail & Related papers (2024-08-12T17:08:31Z)
- Robust Loss Functions for Training Decision Trees with Noisy Labels [4.795403008763752]
We consider training decision trees using noisily labeled data, focusing on loss functions that can lead to robust learning algorithms.
First, we offer novel theoretical insights on the robustness of many existing loss functions in the context of decision tree learning.
Second, we introduce a framework for constructing robust loss functions, called distribution losses.
arXiv Detail & Related papers (2023-12-20T11:27:46Z)
- Noise-Robust Loss Functions: Enhancing Bounded Losses for Large-Scale Noisy Data Learning [0.0]
Large annotated datasets inevitably contain noisy labels, which poses a major challenge for training deep neural networks as they easily memorize the labels.
Noise-robust loss functions have emerged as a notable strategy to counteract this issue, but it remains challenging to create a robust loss function which is not susceptible to underfitting.
We propose a novel method, denoted logit bias, which adds a real number $\epsilon$ to the logit at the position of the correct class.
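A hypothetical sketch of the idea summarised above, combined here with a standard cross-entropy loss (the pairing with cross-entropy and the value of the bias are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def logit_bias_cross_entropy(logits, label, eps=1.0):
    """Cross-entropy after adding a bias `eps` to the correct-class logit
    (illustrative sketch of the 'logit bias' idea described above)."""
    z = logits.copy()
    z[label] += eps  # shift the correct-class logit by epsilon
    p = softmax(z)
    return -np.log(p[label])

logits = np.array([2.0, 1.0, 0.5])
plain = logit_bias_cross_entropy(logits, 0, eps=0.0)
biased = logit_bias_cross_entropy(logits, 0, eps=1.0)  # smaller than `plain`
```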
arXiv Detail & Related papers (2023-06-08T18:38:55Z)
- Robust T-Loss for Medical Image Segmentation [56.524774292536264]
This paper presents a new robust loss function, the T-Loss, for medical image segmentation.
The proposed loss is based on the negative log-likelihood of the Student-t distribution and can effectively handle outliers in the data.
Our experiments show that the T-Loss outperforms traditional loss functions in terms of dice scores on two public medical datasets.
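The core mechanism can be sketched as follows: the negative log-likelihood of a Student-t distribution grows only logarithmically in the residual, so outliers are penalised far less than under a squared-error loss. This per-element sketch (additive constants dropped, fixed degrees of freedom `nu`) is an assumption-laden simplification, not the paper's full segmentation loss:

```python
import numpy as np

def student_t_nll(residual, nu=1.0):
    """Per-element Student-t negative log-likelihood, up to an additive
    constant. Heavy tails penalise large residuals sub-quadratically."""
    return 0.5 * (nu + 1.0) * np.log1p(residual ** 2 / nu)

r = np.array([0.1, 0.5, 10.0])  # the last residual is an outlier
robust = student_t_nll(r)       # grows like log(r**2) for large r
squared = 0.5 * r ** 2          # squared error explodes on the outlier
```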
arXiv Detail & Related papers (2023-06-01T14:49:40Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Robustness and reliability when training with noisy labels [12.688634089849023]
Labelling of data for supervised learning can be costly and time-consuming.
Deep neural networks have proved capable of fitting random labels, motivating regularisation and the use of robust loss functions.
arXiv Detail & Related papers (2021-10-07T10:30:20Z)
- On Codomain Separability and Label Inference from (Noisy) Loss Functions [11.780563744330038]
We introduce the notion of codomain separability to study the necessary and sufficient conditions under which label inference is possible from any (noisy) loss function values.
We show that for many commonly used loss functions, including multiclass cross-entropy with common activation functions and some Bregman divergence-based losses, it is possible to design label inference attacks for arbitrary noise levels.
arXiv Detail & Related papers (2021-07-07T05:29:53Z)
- Sample Selection with Uncertainty of Losses for Learning with Noisy Labels [145.06552420999986]
In learning with noisy labels, the sample selection approach is very popular, which regards small-loss data as correctly labeled during training.
However, losses are generated on-the-fly based on the model being trained with noisy labels, and thus large-loss data are likely but not certainly to be incorrect.
In this paper, we incorporate the uncertainty of losses by adopting interval estimation instead of point estimation of losses.
arXiv Detail & Related papers (2021-06-01T12:53:53Z)
- Searching for Robustness: Loss Learning for Noisy Classification Tasks [81.70914107917551]
We parameterize a flexible family of loss functions using Taylor expansions and apply evolutionary strategies to search for noise-robust losses in this space.
The resulting white-box loss provides a simple and fast "plug-and-play" module that enables effective noise-robust learning in diverse downstream tasks.
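One way such a parameterisation can look (this specific polynomial form in (1 - p_true) is an illustrative assumption, not necessarily the parameterisation searched in the paper): a truncated Taylor-style expansion whose coefficients become the search space for the evolutionary strategy.

```python
import numpy as np

def taylor_loss(p_true, coeffs):
    """Loss as a truncated polynomial in (1 - p_true).

    coeffs[k] weights (1 - p_true)**(k + 1); the coefficient vector is
    what a black-box search (e.g. an evolutionary strategy) would tune.
    """
    u = 1.0 - p_true
    return sum(c * u ** (k + 1) for k, c in enumerate(coeffs))

# coeffs = [1, 1/2, 1/3] matches the first three Taylor terms of
# -log(p) expanded around p = 1, so the family contains (truncated)
# cross-entropy as one point of the search space.
approx = taylor_loss(0.9, [1.0, 0.5, 1.0 / 3.0])
exact = -np.log(0.9)
```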
arXiv Detail & Related papers (2021-02-27T15:27:22Z)
- Risk Bounds for Robust Deep Learning [1.52292571922932]
It has been observed that certain loss functions can render deep-learning pipelines robust against flaws in the data.
We especially show that empirical-risk minimization with unbounded, Lipschitz-continuous loss functions, such as the least-absolute deviation loss, Huber loss, Cauchy loss, and Tukey's biweight loss, can provide efficient prediction under minimal assumptions on the data.
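The losses named above all grow sub-quadratically in the residual, which is what limits the influence of corrupted data points. Standard textbook definitions (not code from the paper):

```python
import numpy as np

def least_absolute(r):
    """Least-absolute-deviation loss: linear everywhere."""
    return np.abs(r)

def huber(r, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

def cauchy(r, c=1.0):
    """Cauchy (Lorentzian) loss: logarithmic growth in the tails."""
    return 0.5 * c ** 2 * np.log1p((r / c) ** 2)

r = np.array([0.5, 5.0])
# on the large residual, all three grow much slower than 0.5 * r**2 = 12.5
```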
arXiv Detail & Related papers (2020-09-14T05:06:59Z)
- An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay [72.23433407017558]
We show that any loss function evaluated with non-uniformly sampled data can be transformed into another uniformly sampled loss function.
Surprisingly, we find in some environments PER can be replaced entirely by this new loss function without impact to empirical performance.
arXiv Detail & Related papers (2020-07-12T17:45:24Z)
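The equivalence rests on a change of measure: the expected loss under a sampling distribution q equals the uniform-sampling expectation of the loss rescaled by N * q_i. A minimal numerical check of that identity (illustrative, not the paper's code):

```python
import numpy as np

losses = np.array([1.0, 2.0, 4.0, 8.0])
q = np.array([0.1, 0.2, 0.3, 0.4])  # non-uniform (e.g. prioritised) sampling

# Expected loss when samples are drawn with probability q_i ...
nonuniform = np.sum(q * losses)

# ... equals the uniform-sampling expectation of the reweighted loss N * q_i * L_i
n = len(losses)
uniform_reweighted = np.mean(n * q * losses)

assert np.isclose(nonuniform, uniform_reweighted)
```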
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.