Risk Bounds for Robust Deep Learning
- URL: http://arxiv.org/abs/2009.06202v1
- Date: Mon, 14 Sep 2020 05:06:59 GMT
- Title: Risk Bounds for Robust Deep Learning
- Authors: Johannes Lederer
- Abstract summary: It has been observed that certain loss functions can render deep-learning pipelines robust against flaws in the data.
We especially show that empirical-risk minimization with unbounded, Lipschitz-continuous loss functions, such as the least-absolute deviation loss, Huber loss, Cauchy loss, and Tukey's biweight loss, can provide efficient prediction under minimal assumptions on the data.
- Score: 1.52292571922932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has been observed that certain loss functions can render deep-learning
pipelines robust against flaws in the data. In this paper, we support these
empirical findings with statistical theory. We especially show that
empirical-risk minimization with unbounded, Lipschitz-continuous loss
functions, such as the least-absolute deviation loss, Huber loss, Cauchy loss,
and Tukey's biweight loss, can provide efficient prediction under minimal
assumptions on the data. More generally speaking, our paper provides
theoretical evidence for the benefits of robust loss functions in deep
learning.
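For reference, the four loss functions named in the abstract can be sketched as functions of a residual r = y - f(x). This is a minimal illustration using common textbook parameterizations, not code from the paper: the function names, the tuning constants `delta` and `c`, their default values, and the `empirical_risk` helper are all assumptions made here, and the paper may scale or define the losses differently.

```python
import math


def lad_loss(r: float) -> float:
    """Least-absolute deviation: |r|."""
    return abs(r)


def huber_loss(r: float, delta: float = 1.0) -> float:
    """Quadratic for |r| <= delta, linear beyond (delta is an assumed default)."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


def cauchy_loss(r: float, c: float = 1.0) -> float:
    """Cauchy loss in the common form (c^2 / 2) * log(1 + (r / c)^2)."""
    return 0.5 * c * c * math.log1p((r / c) ** 2)


def tukey_biweight_loss(r: float, c: float = 4.685) -> float:
    """Tukey's biweight in its common form; constant (c^2 / 6) for |r| > c."""
    cap = c * c / 6.0
    if abs(r) > c:
        return cap
    return cap * (1.0 - (1.0 - (r / c) ** 2) ** 3)


def empirical_risk(loss, predictions, targets) -> float:
    """Average loss over residuals y - f(x) (hypothetical helper)."""
    residuals = [y - p for p, y in zip(predictions, targets)]
    return sum(loss(r) for r in residuals) / len(residuals)


if __name__ == "__main__":
    # Toy data with one gross outlier in the targets.
    preds = [1.0, 2.0, 3.0, 4.0]
    targets = [1.1, 1.9, 3.2, 40.0]
    for name, loss in [
        ("squared", lambda r: r * r),
        ("lad", lad_loss),
        ("huber", huber_loss),
        ("cauchy", cauchy_loss),
        ("tukey", tukey_biweight_loss),
    ]:
        print(name, empirical_risk(loss, preds, targets))
```

Running the script prints the empirical risk of each loss on the toy data, which shows how much less the single outlier contributes under the robust losses than under the squared loss.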
Related papers
- Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head [38.898038672237746]
We introduce a logit-level loss function as a supplement to the widely used probability-level loss function.
We find that the amalgamation of the newly introduced logit-level loss and the previous probability-level loss will lead to performance degeneration.
We propose a novel method called dual-head knowledge distillation, which partitions the linear classifier into two classification heads responsible for different losses.
arXiv Detail & Related papers (2024-11-13T12:33:04Z) - EnsLoss: Stochastic Calibrated Loss Ensembles for Preventing Overfitting in Classification [1.3778851745408134]
We propose a novel ensemble method, namely EnsLoss, to combine loss functions within the Empirical risk minimization framework.
We first transform the CC conditions of losses into loss-derivatives, thereby bypassing the need for explicit loss functions.
We theoretically establish the statistical consistency of our approach and provide insights into its benefits.
arXiv Detail & Related papers (2024-09-02T02:40:42Z) - LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
We present a robust online optimization framework in which an adversary can introduce outliers by corrupting loss functions in an arbitrary number of rounds k, unknown to the learner.
arXiv Detail & Related papers (2024-08-12T17:08:31Z) - Robust Loss Functions for Training Decision Trees with Noisy Labels [4.795403008763752]
We consider training decision trees using noisily labeled data, focusing on loss functions that can lead to robust learning algorithms.
First, we offer novel theoretical insights on the robustness of many existing loss functions in the context of decision tree learning.
Second, we introduce a framework for constructing robust loss functions, called distribution losses.
arXiv Detail & Related papers (2023-12-20T11:27:46Z) - A Generalized Unbiased Risk Estimator for Learning with Augmented Classes [70.20752731393938]
Given unlabeled data, an unbiased risk estimator (URE) can be derived, which can be minimized for LAC with theoretical guarantees.
We propose a generalized URE that can be equipped with arbitrary loss functions while maintaining the theoretical guarantees.
arXiv Detail & Related papers (2023-06-12T06:52:04Z) - Expressive Losses for Verified Robustness via Convex Combinations [67.54357965665676]
We study the relationship between the over-approximation coefficient and performance profiles across different expressive losses.
We show that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.
arXiv Detail & Related papers (2023-05-23T12:20:29Z) - Cross-Entropy Loss Functions: Theoretical Analysis and Applications [27.3569897539488]
We present a theoretical analysis of a broad family of loss functions that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error, and other cross-entropy-like loss functions.
We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds.
This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss.
arXiv Detail & Related papers (2023-04-14T17:58:23Z) - The Fisher-Rao Loss for Learning under Label Noise [9.238700679836855]
We study the Fisher-Rao loss function, which emerges from the Fisher-Rao distance in the statistical manifold of discrete distributions.
We derive an upper bound for the performance degradation in the presence of label noise, and analyse the learning speed of this loss.
arXiv Detail & Related papers (2022-10-28T20:50:10Z) - Leveraged Weighted Loss for Partial Label Learning [64.85763991485652]
Partial label learning deals with data where each instance is assigned a set of candidate labels, only one of which is true.
Despite many methodological studies on learning from partial labels, a theoretical understanding of their risk-consistency properties is still lacking.
We propose a family of loss functions named Leveraged Weighted (LW) loss, which for the first time introduces the leverage parameter $\beta$ to consider the trade-off between losses on partial labels and non-partial ones.
arXiv Detail & Related papers (2021-06-10T13:25:13Z) - Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - Lower-bounded proper losses for weakly supervised classification [73.974163801142]
We discuss the problem of weakly supervised learning of classification, in which instances are given weak labels.
We derive a representation theorem for proper losses in supervised learning, which dualizes the Savage representation.
We experimentally demonstrate the effectiveness of our proposed approach, as compared to improper or unbounded losses.
arXiv Detail & Related papers (2021-03-04T08:47:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.