Being Properly Improper
- URL: http://arxiv.org/abs/2106.09920v1
- Date: Fri, 18 Jun 2021 05:00:15 GMT
- Title: Being Properly Improper
- Authors: Richard Nock, Tyler Sypherd, Lalitha Sankar
- Abstract summary: We analyse class probability-based losses when they are stripped of the mandatory properness.
We show that a natural extension of a half-century-old loss introduced by S. Arimoto is twist-proper.
We then turn to a theory that has provided some of the best off-the-shelf algorithms for proper losses, boosting.
- Score: 36.52509571098292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In today's ML, data can be twisted (changed) in various ways, either for bad
or good intent. Such twisted data challenges the founding theory of properness
for supervised losses which form the basis for many popular losses for class
probability estimation. Unfortunately, at its core, properness ensures that the
optimal models also learn the twist. In this paper, we analyse such class
probability-based losses when they are stripped of the mandatory properness;
we define twist-proper losses as losses formally able to retrieve the optimum
(untwisted) estimate off the twists, and show that a natural extension of a
half-century-old loss introduced by S. Arimoto is twist-proper. We then turn to
a theory that has provided some of the best off-the-shelf algorithms for proper
losses, boosting. Boosting can require access to the derivative of the convex
conjugate of a loss to compute example weights. Such a function can be hard to
get, for computational or mathematical reasons; this turns out to be the case
for Arimoto's loss. We bypass this difficulty by inverting the problem as
follows: suppose a blueprint boosting algorithm is implemented with a general
weight update function. What are the losses for which boosting-compliant
minimisation happens? Our answer comes as a general boosting algorithm which
meets the optimal boosting dependence on the number of calls to the weak
learner; when applied to Arimoto's loss, it leads to a simple optimisation
algorithm whose performance is showcased on several domains and twists.
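The abstract's inverted question (fix a blueprint boosting loop driven by a general weight-update function, then ask which losses such a loop implicitly minimises) can be made concrete with a small sketch. The following is a minimal illustration under stated assumptions, not the authors' algorithm: the names blueprint_boost, stump_learner, exp_update and tempered_update are hypothetical, the leveraging coefficient is the standard AdaBoost-style choice, and the tempered update is only a stand-in for how an Arimoto/alpha-type loss might reweight examples.

```python
# Minimal sketch, assuming {-1,+1} labels and NumPy only; every name below
# (blueprint_boost, stump_learner, exp_update, tempered_update) is hypothetical
# and NOT the paper's API. The loop is loss-agnostic: the only loss-dependent
# piece is the pluggable weight_update function.
import numpy as np

def blueprint_boost(X, y, weak_learner, weight_update, T=50):
    """Generic boosting loop parameterised by an example-weight update rule."""
    n = X.shape[0]
    margins = np.zeros(n)                # accumulated margins y * H_t(x)
    ensemble = []                        # (leveraging coefficient, weak hypothesis)
    for _ in range(T):
        w = weight_update(margins)       # loss-dependent part: margins -> weights
        w = w / w.sum()
        h = weak_learner(X, y, w)
        pred = h(X)
        edge = np.clip(np.sum(w * y * pred), -0.999, 0.999)  # weighted correlation
        alpha = 0.5 * np.log((1.0 + edge) / (1.0 - edge))    # AdaBoost-style choice (assumption)
        ensemble.append((alpha, h))
        margins += alpha * y * pred
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in ensemble))

def stump_learner(X, y, w):
    """Tiny decision-stump weak learner, for illustration only."""
    best_err, best_rule = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best_rule = err, (j, thr, sign)
    j, thr, sign = best_rule
    return lambda Xq: sign * np.where(Xq[:, j] > thr, 1, -1)

def exp_update(margins):
    """Exponential weights: recovers AdaBoost-style reweighting."""
    return np.exp(-margins)

def tempered_update(margins, t=0.8):
    """Tempered-exponential weights exp_t(-margin), t != 1; only a gesture at how
    a tunable (Arimoto / alpha-type) loss could reweight badly-margined examples."""
    return np.maximum(1.0 + (1.0 - t) * (-margins), 0.0) ** (1.0 / (1.0 - t))

# Toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=200) > 0, 1, -1)
H = blueprint_boost(X, y, stump_learner, exp_update, T=20)
print("training accuracy:", np.mean(H(X) == y))
```

Swapping exp_update for tempered_update changes only the weight-update line; that is the sense in which the loop is a "blueprint": the loss enters solely through how margins are mapped to example weights, which is the quantity the paper characterises instead of the derivative of the convex conjugate.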
Related papers
- LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
We present a robust online optimization framework in which an adversary can introduce outliers by corrupting the loss functions in an arbitrary number k of rounds, unknown to the learner.
arXiv Detail & Related papers (2024-08-12T17:08:31Z) - How to Boost Any Loss Function [63.573324901948716]
We show that loss functions can be efficiently optimized with boosting.
We show that boosting can achieve a feat not yet known to be possible in the classical zeroth-order setting.
arXiv Detail & Related papers (2024-07-02T14:08:23Z) - What killed the Convex Booster ? [70.04715330065275]
A landmark negative result of Long and Servedio established a worst-case spectacular failure of a supervised learning trio.
We argue that the source of the negative result lies in the dark side of a pervasive -- and otherwise prized -- aspect of ML: parameterisation.
arXiv Detail & Related papers (2022-05-19T15:42:20Z) - Stochastic smoothing of the top-K calibrated hinge loss for deep imbalanced classification [8.189630642296416]
We introduce a top-K hinge loss inspired by recent developments on top-K losses.
Our proposal is based on the smoothing of the top-K operator building on the flexible "perturbed" framework.
We show that our loss function performs very well in the case of balanced datasets, while benefiting from a significantly lower computational time.
arXiv Detail & Related papers (2022-02-04T15:39:32Z) - Do We Need to Penalize Variance of Losses for Learning with Label Noise? [91.38888889609002]
We find that the variance should be increased for the problem of learning with noisy labels.
By exploiting the label noise transition matrix, regularizers can be easily designed to reduce the variance of losses.
Empirically, the proposed method by increasing the variance of losses significantly improves the generalization ability of baselines on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-01-30T06:19:08Z) - Omnipredictors [19.735769148626588]
Loss minimization is a dominant paradigm in machine learning.
We introduce the notion of an ($\mathcal{L},\mathcal{C}$)-omnipredictor, which could be used to optimize any loss in a family.
We show that such "loss-oblivious" learning is feasible through a connection to multicalibration.
arXiv Detail & Related papers (2021-09-11T23:28:49Z) - All your loss are belong to Bayes [28.393499629583786]
Loss functions are a cornerstone of machine learning and the starting point of most algorithms.
We introduce a trick on squared Gaussian Processes to obtain a random process whose paths are compliant source functions.
Experimental results demonstrate substantial improvements over the state of the art.
arXiv Detail & Related papers (2020-06-08T14:31:21Z) - Supervised Learning: No Loss No Cry [51.07683542418145]
Supervised learning requires the specification of a loss function to minimise.
This paper revisits the SLIsotron algorithm of Kakade et al. (2011) through a novel lens.
We show how it provides a principled procedure for learning the loss.
arXiv Detail & Related papers (2020-02-10T05:30:52Z)