Being Properly Improper
- URL: http://arxiv.org/abs/2106.09920v1
- Date: Fri, 18 Jun 2021 05:00:15 GMT
- Title: Being Properly Improper
- Authors: Richard Nock, Tyler Sypherd, Lalitha Sankar
- Abstract summary: We analyse class probability-based losses when they are stripped of the mandatory properness.
We show that a natural extension of a half-century old loss introduced by S. Arimoto is twist proper.
We then turn to a theory that has provided some of the best off-the-shelf algorithms for proper losses, boosting.
- Score: 36.52509571098292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In today's ML, data can be twisted (changed) in various ways, either for bad
or good intent. Such twisted data challenges the founding theory of properness
for supervised losses which form the basis for many popular losses for class
probability estimation. Unfortunately, at its core, properness ensures that the
optimal models also learn the twist. In this paper, we analyse such class
probability-based losses when they are stripped of the mandatory properness;
we define twist-proper losses as losses formally able to retrieve the optimum
(untwisted) estimate off the twists, and show that a natural extension of a
half-century old loss introduced by S. Arimoto is twist proper. We then turn to
a theory that has provided some of the best off-the-shelf algorithms for proper
losses, boosting. Boosting can require access to the derivative of the convex
conjugate of a loss to compute example weights. Such a function can be hard to
get, for computational or mathematical reasons; this turns out to be the case
for Arimoto's loss. We bypass this difficulty by inverting the problem as
follows: suppose a blueprint boosting algorithm is implemented with a general
weight update function. What are the losses for which boosting-compliant
minimisation happens? Our answer comes as a general boosting algorithm which
meets the optimal boosting dependence on the number of calls to the weak
learner; when applied to Arimoto's loss, it leads to a simple optimisation
algorithm whose performance is showcased on several domains and twists.
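To make the abstract's point about properness concrete, here is a minimal numerical sketch. It is an illustration only, not the paper's construction. It shows that a proper loss (log loss) fitted to a twisted posterior returns the twisted value, while a tunable loss with a well-chosen parameter can return the clean one. Everything specific in it is an assumption: the posterior values 0.8 (clean) and 0.65 (twisted) are hypothetical, the α-loss formula is the form commonly given in the α-loss literature (the abstract does not state it), and picking α as a ratio of logits is a convenience that requires knowing the clean posterior.

```python
import numpy as np

def log_loss(p):
    """Log loss on the probability assigned to the observed class (a proper loss)."""
    return -np.log(p)

def alpha_loss(p, alpha):
    """Assumed alpha-loss form (alpha > 0, alpha != 1); tends to log loss as alpha -> 1."""
    return alpha / (alpha - 1.0) * (1.0 - p ** ((alpha - 1.0) / alpha))

def risk_minimizer(loss, eta, grid):
    """Grid-search the estimate u minimising eta*loss(u) + (1 - eta)*loss(1 - u)."""
    risk = eta * loss(grid) + (1.0 - eta) * loss(1.0 - grid)
    return grid[np.argmin(risk)]

def logit(p):
    return np.log(p / (1.0 - p))

grid = np.linspace(0.001, 0.999, 999)  # candidate estimates, step 0.001
eta_clean = 0.8      # hypothetical clean posterior P(y = 1 | x)
eta_twisted = 0.65   # hypothetical posterior after the twist

# A proper loss is minimised at whatever posterior the data exhibits,
# so it recovers the twisted value.
u_log = risk_minimizer(log_loss, eta_twisted, grid)

# The alpha-loss minimiser is the tilted posterior
# eta^alpha / (eta^alpha + (1 - eta)^alpha); choosing alpha as the ratio of
# logits makes that tilt land on the clean posterior (illustrative choice).
alpha = logit(eta_clean) / logit(eta_twisted)
u_alpha = risk_minimizer(lambda p: alpha_loss(p, alpha), eta_twisted, grid)

print(f"log loss minimiser on twisted data:   {u_log:.3f}")
print(f"alpha-loss minimiser (alpha={alpha:.3f}): {u_alpha:.3f}")
```

In practice the clean posterior is unknown, so α could not be set this way; the sketch only shows why giving up properness can help when the data is twisted.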
Related papers
- LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
We present a robust online optimization framework in which an adversary can introduce outliers by corrupting loss functions in an arbitrary number of rounds k, unknown to the learner.
arXiv Detail & Related papers (2024-08-12T17:08:31Z)
- How to Boost Any Loss Function [63.573324901948716]
We show that any loss function can be optimized with boosting.
We also show that boosting can achieve a feat not yet known to be possible in the classical zeroth-order setting.
arXiv Detail & Related papers (2024-07-02T14:08:23Z)
- What killed the Convex Booster ? [70.04715330065275]
A landmark negative result of Long and Servedio established a worst-case spectacular failure of a supervised learning trio.
We argue that the source of the negative result lies in the dark side of a pervasive -- and otherwise prized -- aspect of ML: parameterisation.
arXiv Detail & Related papers (2022-05-19T15:42:20Z)
- Stochastic smoothing of the top-K calibrated hinge loss for deep imbalanced classification [8.189630642296416]
We introduce a top-K hinge loss inspired by recent developments on top-K losses.
Our proposal is based on the smoothing of the top-K operator building on the flexible "perturbed" framework.
We show that our loss function performs very well in the case of balanced datasets, while benefiting from a significantly lower computational time.
arXiv Detail & Related papers (2022-02-04T15:39:32Z)
- Do We Need to Penalize Variance of Losses for Learning with Label Noise? [91.38888889609002]
We find that the variance should be increased for the problem of learning with noisy labels.
By exploiting the label noise transition matrix, regularizers can be easily designed to reduce the variance of losses.
Empirically, increasing the variance of losses with the proposed method significantly improves the generalization ability of baselines on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-01-30T06:19:08Z)
- Omnipredictors [19.735769148626588]
Loss minimization is a dominant paradigm in machine learning.
We introduce the notion of an $(\mathcal{L},\mathcal{C})$-omnipredictor, which could be used to optimize any loss in a family.
We show that such "loss-oblivious" learning is feasible through a connection to multicalibration.
arXiv Detail & Related papers (2021-09-11T23:28:49Z)
- All your loss are belong to Bayes [28.393499629583786]
Loss functions are a cornerstone of machine learning and the starting point of most algorithms.
We introduce a trick on squared Gaussian Processes to obtain a random process whose paths are compliant source functions.
Experimental results demonstrate substantial improvements over the state of the art.
arXiv Detail & Related papers (2020-06-08T14:31:21Z)
- Supervised Learning: No Loss No Cry [51.07683542418145]
Supervised learning requires the specification of a loss function to minimise.
This paper revisits the SLIsotron algorithm of Kakade et al. (2011) through a novel lens.
We show how it provides a principled procedure for learning the loss.
arXiv Detail & Related papers (2020-02-10T05:30:52Z)