Cauchy Loss Function: Robustness Under Gaussian and Cauchy Noise
- URL: http://arxiv.org/abs/2302.07238v1
- Date: Tue, 14 Feb 2023 18:34:44 GMT
- Title: Cauchy Loss Function: Robustness Under Gaussian and Cauchy Noise
- Authors: Thamsanqa Mlotshwa and Heinrich van Deventer and Anna Sergeevna Bosman
- Abstract summary: In supervised machine learning, the choice of loss function implicitly assumes a particular noise distribution over the data.
The Cauchy loss function (CLF) assumes a Cauchy noise distribution, and is therefore potentially better suited for data with outliers.
CLF yielded results that were either comparable to or better than the results yielded by MSE, with a few notable exceptions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In supervised machine learning, the choice of loss function implicitly
assumes a particular noise distribution over the data. For example, the
frequently used mean squared error (MSE) loss assumes a Gaussian noise
distribution. The choice of loss function during training and testing affects
the performance of artificial neural networks (ANNs). It is known that MSE may
yield substandard performance in the presence of outliers. The Cauchy loss
function (CLF) assumes a Cauchy noise distribution, and is therefore
potentially better suited for data with outliers. This paper aims to determine
the extent of robustness and generalisability of the CLF as compared to MSE.
CLF and MSE are assessed on a few handcrafted regression problems, and a
real-world regression problem with artificially simulated outliers, in the
context of ANN training. CLF yielded results that were either comparable to or
better than the results yielded by MSE, with a few notable exceptions.
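The abstract does not write out either loss explicitly. As a point of reference, the sketch below contrasts how MSE and a Cauchy-style loss weight residuals, assuming the common formulation of CLF as the negative log-likelihood of a Cauchy residual distribution with a scale hyperparameter gamma; the exact parameterisation used in the paper is not given above.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    # Mean squared error: residuals enter quadratically, so a single
    # large outlier can dominate the average loss.
    r = y_true - y_pred
    return np.mean(r ** 2)

def cauchy_loss(y_true, y_pred, gamma=1.0):
    # Cauchy-style loss: log(1 + (r / gamma)^2) grows only logarithmically
    # in the residual, so outliers contribute far less than under MSE.
    # gamma is an assumed scale hyperparameter, not specified in the abstract.
    r = y_true - y_pred
    return np.mean(np.log1p((r / gamma) ** 2))

# Toy comparison with one gross outlier among the targets.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8, 4.0])
print(mse_loss(y_true, y_pred))     # dominated by the single outlier term
print(cauchy_loss(y_true, y_pred))  # outlier contributes only a log-sized term
```

The gradient of the Cauchy term with respect to the residual, 2r / (gamma^2 + r^2), is bounded, whereas the MSE gradient grows linearly with the residual; this is the intuition behind the robustness comparison the paper carries out.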
Related papers
- Residual-based Adaptive Huber Loss (RAHL) -- Design of an improved Huber loss for CQI prediction in 5G networks [0.7499722271664144]
We propose a novel loss function, named Residual-based Adaptive Huber Loss (RAHL).
RAHL balances robustness against outliers while preserving inlier data precision.
Results affirm the superiority of RAHL, offering a promising avenue for enhanced CQI prediction in 5G networks.
arXiv Detail & Related papers (2024-08-27T00:58:32Z)
- On Sequential Loss Approximation for Continual Learning [0.0]
For continual learning, we introduce Autodiff Quadratic Consolidation (AQC) and Neural Consolidation (NC).
AQC approximates the previous loss function with a quadratic function (a generic sketch follows this entry), and NC approximates it with a neural network.
We empirically study these methods in class-incremental learning, for which regularization-based methods produce unsatisfactory results.
arXiv Detail & Related papers (2024-05-26T09:20:47Z)
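The summary does not say how AQC constructs its quadratic approximation. One generic way to approximate a previous loss with a quadratic function is a second-order Taylor expansion around the previously learned parameters theta*, shown here only as an illustration of the general idea, not as AQC's actual construction:

```latex
L_{\text{prev}}(\theta) \approx L_{\text{prev}}(\theta^{*})
  + \nabla L_{\text{prev}}(\theta^{*})^{\top} (\theta - \theta^{*})
  + \tfrac{1}{2} (\theta - \theta^{*})^{\top} H (\theta - \theta^{*}),
\qquad H = \nabla^{2} L_{\text{prev}}(\theta^{*})
```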
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- On the Performance of Empirical Risk Minimization with Smoothed Data [59.3428024282545]
We show that Empirical Risk Minimization (ERM) is able to achieve sublinear error whenever a class is learnable with iid data.
arXiv Detail & Related papers (2024-02-22T21:55:41Z)
- Alternate Loss Functions for Classification and Robust Regression Can Improve the Accuracy of Artificial Neural Networks [6.452225158891343]
This paper shows that the training speed and final accuracy of neural networks can depend significantly on the loss function used to train them.
Two new classification loss functions that significantly improve performance on a wide variety of benchmark tasks are proposed.
arXiv Detail & Related papers (2023-03-17T12:52:06Z)
- Regularized ERM on random subspaces [17.927376388967144]
We consider possibly data-dependent subspaces spanned by a random subset of the data, recovering Nyström approaches for kernel methods as a special case.
Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded.
arXiv Detail & Related papers (2022-12-04T16:12:11Z)
- The Fisher-Rao Loss for Learning under Label Noise [9.238700679836855]
We study the Fisher-Rao loss function, which emerges from the Fisher-Rao distance in the statistical manifold of discrete distributions (the distance is recalled after this entry).
We derive an upper bound for the performance degradation in the presence of label noise, and analyse the learning speed of this loss.
arXiv Detail & Related papers (2022-10-28T20:50:10Z)
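For context, the Fisher-Rao geodesic distance between two discrete distributions p = (p_1, ..., p_n) and q = (q_1, ..., q_n) on the statistical manifold mentioned above has the standard closed form below; how the paper turns this distance into a training loss is not detailed in the summary.

```latex
d_{FR}(p, q) = 2 \arccos\!\left( \sum_{i=1}^{n} \sqrt{p_i \, q_i} \right)
```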
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the noise distribution should match the data can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Asymmetric Loss Functions for Learning with Noisy Labels [82.50250230688388]
We propose a new class of loss functions, namely asymmetric loss functions, which are robust to learning with noisy labels for various types of noise.
Experimental results on benchmark datasets demonstrate that asymmetric loss functions can outperform state-of-the-art methods.
arXiv Detail & Related papers (2021-06-06T12:52:48Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning [59.65752299209042]
We investigate learning a ConvNet under such class imbalance.
We found that a ConvNet significantly over-fits the minor classes.
We propose to incorporate class-dependent temperatures (CDT) when training the ConvNet.
arXiv Detail & Related papers (2020-01-06T03:52:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.