Cauchy Loss Function: Robustness Under Gaussian and Cauchy Noise
- URL: http://arxiv.org/abs/2302.07238v1
- Date: Tue, 14 Feb 2023 18:34:44 GMT
- Title: Cauchy Loss Function: Robustness Under Gaussian and Cauchy Noise
- Authors: Thamsanqa Mlotshwa and Heinrich van Deventer and Anna Sergeevna Bosman
- Abstract summary: In supervised machine learning, the choice of loss function implicitly assumes a particular noise distribution over the data.
The Cauchy loss function (CLF) assumes a Cauchy noise distribution, and is therefore potentially better suited for data with outliers.
CLF yielded results that were either comparable to or better than the results yielded by MSE, with a few notable exceptions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In supervised machine learning, the choice of loss function implicitly
assumes a particular noise distribution over the data. For example, the
frequently used mean squared error (MSE) loss assumes a Gaussian noise
distribution. The choice of loss function during training and testing affects
the performance of artificial neural networks (ANNs). It is known that MSE may
yield substandard performance in the presence of outliers. The Cauchy loss
function (CLF) assumes a Cauchy noise distribution, and is therefore
potentially better suited for data with outliers. This paper aims to determine
the extent of robustness and generalisability of the CLF as compared to MSE.
CLF and MSE are assessed on a few handcrafted regression problems, and a
real-world regression problem with artificially simulated outliers, in the
context of ANN training. CLF yielded results that were either comparable to or
better than the results yielded by MSE, with a few notable exceptions.
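The abstract does not write out either loss explicitly. As a point of reference, the sketch below contrasts how MSE and a Cauchy-style loss weight residuals, assuming the common formulation of CLF as the negative log-likelihood of a Cauchy residual distribution with a scale hyperparameter gamma; the exact parameterisation used in the paper is not given above.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    # Mean squared error: residuals enter quadratically, so a single
    # large outlier can dominate the average loss.
    r = y_true - y_pred
    return np.mean(r ** 2)

def cauchy_loss(y_true, y_pred, gamma=1.0):
    # Cauchy-style loss: log(1 + (r / gamma)^2) grows only logarithmically
    # in the residual, so outliers contribute far less than under MSE.
    # gamma is an assumed scale hyperparameter, not specified in the abstract.
    r = y_true - y_pred
    return np.mean(np.log1p((r / gamma) ** 2))

# Toy comparison with one gross outlier among the targets.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8, 4.0])
print(mse_loss(y_true, y_pred))     # dominated by the single outlier term
print(cauchy_loss(y_true, y_pred))  # outlier contributes only a log-sized term
```

The gradient of the Cauchy term with respect to the residual, 2r / (gamma^2 + r^2), is bounded, whereas the MSE gradient grows linearly with the residual; this is the intuition behind the robustness comparison the paper carries out.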
Related papers
- Residual-based Adaptive Huber Loss (RAHL) -- Design of an improved Huber loss for CQI prediction in 5G networks [0.7499722271664144]
We propose a novel loss function, named Residual-based Adaptive Huber Loss (RAHL).
RAHL balances robustness against outliers while preserving inlier data precision.
Results affirm the superiority of RAHL, offering a promising avenue for enhanced CQI prediction in 5G networks.
arXiv Detail & Related papers (2024-08-27T00:58:32Z)
- On Sequential Loss Approximation for Continual Learning [0.0]
For continual learning, we introduce Autodiff Quadratic Consolidation (AQC) and Neural Consolidation (NC).
AQC approximates the previous loss function with a quadratic function (a generic sketch follows this entry), and NC approximates it with a neural network.
We empirically study these methods in class-incremental learning, for which regularization-based methods produce unsatisfactory results.
arXiv Detail & Related papers (2024-05-26T09:20:47Z)
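The summary does not say how AQC constructs its quadratic approximation. One generic way to approximate a previous loss with a quadratic function is a second-order Taylor expansion around the previously learned parameters theta*, shown here only as an illustration of the general idea, not as AQC's actual construction:

```latex
L_{\text{prev}}(\theta) \approx L_{\text{prev}}(\theta^{*})
  + \nabla L_{\text{prev}}(\theta^{*})^{\top} (\theta - \theta^{*})
  + \tfrac{1}{2} (\theta - \theta^{*})^{\top} H (\theta - \theta^{*}),
\qquad H = \nabla^{2} L_{\text{prev}}(\theta^{*})
```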
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- On the Performance of Empirical Risk Minimization with Smoothed Data [59.3428024282545]
We show that Empirical Risk Minimization (ERM) is able to achieve sublinear error whenever a class is learnable with iid data.
arXiv Detail & Related papers (2024-02-22T21:55:41Z)
- Alternate Loss Functions for Classification and Robust Regression Can Improve the Accuracy of Artificial Neural Networks [6.452225158891343]
This paper shows that the training speed and final accuracy of neural networks can depend significantly on the loss function used to train them.
Two new classification loss functions that significantly improve performance on a wide variety of benchmark tasks are proposed.
arXiv Detail & Related papers (2023-03-17T12:52:06Z)
- Regularized ERM on random subspaces [17.927376388967144]
We consider possibly data-dependent subspaces spanned by a random subset of the data, recovering Nyström approaches for kernel methods as a special case.
Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded.
arXiv Detail & Related papers (2022-12-04T16:12:11Z)
- The Fisher-Rao Loss for Learning under Label Noise [9.238700679836855]
We study the Fisher-Rao loss function, which emerges from the Fisher-Rao distance in the statistical manifold of discrete distributions (the distance is recalled after this entry).
We derive an upper bound for the performance degradation in the presence of label noise, and analyse the learning speed of this loss.
arXiv Detail & Related papers (2022-10-28T20:50:10Z)
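For context, the Fisher-Rao geodesic distance between two discrete distributions p = (p_1, ..., p_n) and q = (q_1, ..., q_n) on the statistical manifold mentioned above has the standard closed form below; how the paper turns this distance into a training loss is not detailed in the summary.

```latex
d_{FR}(p, q) = 2 \arccos\!\left( \sum_{i=1}^{n} \sqrt{p_i \, q_i} \right)
```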
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the noise distribution should match the data can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Asymmetric Loss Functions for Learning with Noisy Labels [82.50250230688388]
We propose a new class of loss functions, namely asymmetric loss functions, which are robust to learning with noisy labels for various types of noise.
Experimental results on benchmark datasets demonstrate that asymmetric loss functions can outperform state-of-the-art methods.
arXiv Detail & Related papers (2021-06-06T12:52:48Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning [59.65752299209042]
We investigate learning a ConvNet under such class imbalance.
We found that a ConvNet significantly over-fits the minor classes.
We propose to incorporate class-dependent temperatures (CDT) when training the ConvNet.
arXiv Detail & Related papers (2020-01-06T03:52:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.