The Geometry and Calculus of Losses
- URL: http://arxiv.org/abs/2209.00238v2
- Date: Thu, 17 Aug 2023 14:57:27 GMT
- Title: The Geometry and Calculus of Losses
- Authors: Robert C. Williamson and Zac Cranko
- Abstract summary: We develop the theory of loss functions for binary and multiclass classification and class probability estimation problems.
The perspective provides three novel opportunities.
First, it enables the development of a fundamental relationship between losses and (anti)-norms that appears not to have been noticed before.
Second, it enables the development of a calculus of losses induced by the calculus of convex sets.
Third, the perspective leads to a natural theory of "polar" loss functions, which are derived from the polar dual of the convex set defining the loss.
- Score: 10.451984251615512
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Statistical decision problems lie at the heart of statistical machine
learning. The simplest problems are binary and multiclass classification and
class probability estimation. Central to their definition is the choice of loss
function, which is the means by which the quality of a solution is evaluated.
In this paper we systematically develop the theory of loss functions for such
problems from a novel perspective whose basic ingredients are convex sets with
a particular structure. The loss function is defined as the subgradient of the
support function of the convex set. It is consequently automatically proper
(calibrated for probability estimation). This perspective provides three novel
opportunities. First, it enables the development of a fundamental relationship
between losses and (anti)-norms that appears not to have been noticed before. Second,
it enables the development of a calculus of losses induced by the calculus of
convex sets, which allows interpolation between different losses and is thus
a potentially useful design tool for tailoring losses to particular problems.
In doing this we build upon, and considerably extend, existing results on
$M$-sums of convex sets. Third, the perspective leads to a natural theory of
``polar'' loss functions, which are derived from the polar dual of the convex
set defining the loss, and which form a natural universal substitution function
for Vovk's aggregating algorithm.
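As a concrete illustration of the properness claim above, the following minimal sketch (not code from the paper, and using the familiar log loss as an assumed example) checks numerically that the loss vector at p is a supergradient of the concave conditional Bayes risk (here the Shannon entropy), and that predicting the true class distribution minimises the expected loss.

```python
# Hedged illustration (not code from the paper): the log loss as a supergradient
# of its concave conditional Bayes risk (the Shannon entropy), plus a numerical
# check of properness on the probability simplex.
import numpy as np

def log_loss(q):
    """Loss vector of the log loss: component y is -log q_y."""
    return -np.log(q)

def expected_loss(p, q):
    """Expected log loss when the true class distribution is p and we predict q."""
    return p @ log_loss(q)

def bayes_risk(p):
    """Conditional Bayes risk of the log loss: the Shannon entropy of p."""
    return -(p * np.log(p)).sum()

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(4))
competitors = rng.dirichlet(np.ones(4), size=1000)

# Properness: no prediction q beats q = p in expectation.
assert all(expected_loss(p, q) >= expected_loss(p, p) - 1e-12 for q in competitors)

# Supergradient inequality for the concave Bayes risk, restricted to the simplex:
# bayes_risk(q) <= bayes_risk(p) + <log_loss(p), q - p> for all q.
gaps = [bayes_risk(p) + log_loss(p) @ (q - p) - bayes_risk(q) for q in competitors]
assert min(gaps) >= -1e-12
print("log loss is proper; its loss vector is a supergradient of the entropy")
```

In the same spirit, a convex combination of two proper losses is again proper, which is arguably the simplest instance of the calculus of losses the abstract refers to (support functions, and hence Bayes risks, add over weighted Minkowski sums of the underlying convex sets).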
Related papers
- Binary Losses for Density Ratio Estimation [2.512309434783062]
Estimating the ratio of two probability densities is a central problem in machine learning and statistics.
We provide a simple recipe for constructing loss functions with certain properties, such as loss functions that prioritize an accurate estimation of large values.
This contrasts with classical loss functions, such as the logistic loss or boosting loss, which prioritize accurate estimation of small values.
arXiv Detail & Related papers (2024-07-01T15:24:34Z)
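As background for the entry above, a standard route from binary class probability estimation to density ratios (assumed here for illustration; it is not claimed to be that paper's recipe) uses the identity p(x)/q(x) = eta(x)/(1 - eta(x)) when samples from p and q are labelled 1 and 0 with equal priors and eta(x) = Pr(Y = 1 | x). A minimal sketch with scikit-learn:

```python
# Hedged sketch (not from the paper above): density ratio estimation via binary
# class probability estimation. With equal class priors, p(x)/q(x) = eta(x)/(1 - eta(x)),
# where eta(x) = Pr(Y = 1 | x) and samples from p / q are labelled 1 / 0.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
x_p = rng.normal(loc=1.0, scale=1.0, size=n)   # samples from p, labelled 1
x_q = rng.normal(loc=0.0, scale=1.0, size=n)   # samples from q, labelled 0

X = np.concatenate([x_p, x_q])[:, None]
y = np.concatenate([np.ones(n), np.zeros(n)])

# Logistic regression estimates eta(x) by minimising the logistic loss, a proper binary loss.
clf = LogisticRegression().fit(X, y)
eta = clf.predict_proba(X)[:, 1]
ratio_hat = eta / (1.0 - eta)

# True ratio of N(1, 1) to N(0, 1) is exp(x - 0.5).
ratio_true = np.exp(X[:, 0] - 0.5)
print("median relative error:", np.median(np.abs(ratio_hat - ratio_true) / ratio_true))
```

Which binary loss is minimised controls where eta, and hence the ratio, is estimated most accurately, which appears to be the design freedom the entry above refers to.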
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- A survey and taxonomy of loss functions in machine learning [60.41650195728953]
Most state-of-the-art machine learning techniques revolve around the optimisation of loss functions.
This survey aims to provide a reference on the most essential loss functions for both beginner and advanced machine learning practitioners.
arXiv Detail & Related papers (2023-01-13T14:38:24Z)
- Xtreme Margin: A Tunable Loss Function for Binary Classification Problems [0.0]
We provide an overview of a novel loss function, the Xtreme Margin loss.
Unlike the binary cross-entropy and hinge loss functions, this loss gives researchers and practitioners flexibility in their training process.
arXiv Detail & Related papers (2022-10-31T22:39:32Z)
- Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
arXiv Detail & Related papers (2022-08-29T16:16:22Z)
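The entry above approximates Gateaux derivatives of statistical functionals by finite differencing. The sketch below is a generic illustration of that idea under simple assumptions (the mean functional; it is not the paper's algorithm): the quotient (T((1 - eps) F_n + eps delta_x) - T(F_n)) / eps recovers the influence function x - mean.

```python
# Hedged sketch (not the paper's algorithm): approximating the Gateaux derivative
# (influence function) of a statistical functional by finite differencing.
import numpy as np

def mean_functional(values, weights):
    """T(F) = E_F[X] for a discrete distribution given by (values, weights)."""
    return np.dot(weights, values)

rng = np.random.default_rng(1)
sample = rng.normal(size=500)          # empirical distribution F_n
n = sample.size
eps = 1e-4
x0 = 2.0                               # point at which to evaluate the derivative

# Perturbed distribution (1 - eps) * F_n + eps * delta_{x0}.
values = np.append(sample, x0)
weights = np.append(np.full(n, (1 - eps) / n), eps)

base = mean_functional(sample, np.full(n, 1 / n))
perturbed = mean_functional(values, weights)
gateaux_fd = (perturbed - base) / eps

# For the mean, the exact Gateaux derivative in direction delta_{x0} is x0 - mean.
print(gateaux_fd, x0 - sample.mean())  # the two values agree up to floating point
```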
- Hybridised Loss Functions for Improved Neural Network Generalisation [0.0]
Loss functions play an important role in the training of artificial neural networks (ANNs).
It has been shown that the cross entropy and sum squared error loss functions result in different training dynamics.
A hybrid of the cross entropy and sum squared error loss functions could combine the advantages of the two functions, while limiting their disadvantages.
arXiv Detail & Related papers (2022-04-26T11:52:11Z)
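One plausible hybridisation of the two losses discussed above, offered only as an illustrative assumption rather than the scheme studied in that paper, is a convex combination of cross entropy and sum squared error on predicted class probabilities:

```python
# Hedged sketch: a convex combination of cross entropy and sum squared error,
# one plausible hybridisation (the exact scheme in the paper may differ).
import numpy as np

def cross_entropy(p_true, p_pred, eps=1e-12):
    """Cross entropy between a one-hot (or soft) target and predicted probabilities."""
    return -(p_true * np.log(p_pred + eps)).sum(axis=-1)

def sum_squared_error(p_true, p_pred):
    """Sum of squared errors between target and predicted probabilities (Brier-style)."""
    return ((p_true - p_pred) ** 2).sum(axis=-1)

def hybrid_loss(p_true, p_pred, alpha=0.5):
    """alpha * cross entropy + (1 - alpha) * sum squared error."""
    return alpha * cross_entropy(p_true, p_pred) + (1 - alpha) * sum_squared_error(p_true, p_pred)

# Example: a batch of two 3-class targets and predictions.
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
preds = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.6, 0.3]])
print(hybrid_loss(targets, preds, alpha=0.5))
```

Both components are proper losses for class probability estimation, so their convex combination is proper as well, which loosely connects this entry to the loss calculus of the main paper.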
- Generalization Bounds via Convex Analysis [12.411844611718958]
We show that it is possible to replace the mutual information by any strongly convex function of the joint input-output distribution.
Examples include bounds stated in terms of $p$-norm divergences and the Wasserstein-2 distance.
arXiv Detail & Related papers (2022-02-10T12:30:45Z)
- Fundamental Limits and Tradeoffs in Invariant Representation Learning [99.2368462915979]
Many machine learning applications involve learning representations that achieve two competing goals.
A minimax game-theoretic formulation represents a fundamental tradeoff between accuracy and invariance.
We provide an information-theoretic analysis of this general and important problem under both classification and regression settings.
arXiv Detail & Related papers (2020-12-19T15:24:04Z)
- All your loss are belong to Bayes [28.393499629583786]
Loss functions are a cornerstone of machine learning and the starting point of most algorithms.
We introduce a trick on squared Gaussian Processes to obtain a random process whose paths are compliant source functions.
Experimental results demonstrate substantial improvements over the state of the art.
arXiv Detail & Related papers (2020-06-08T14:31:21Z)
- Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees [106.91654068632882]
We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.
We show that the proposed objective is strongly convex in a neighborhood around the ground truth, so that a gradient descent-based method achieves a linear convergence rate.
Our estimator is robust to any model misspecification within the exponential family, which is validated in extensive experiments.
arXiv Detail & Related papers (2020-03-02T16:40:36Z)
- Supervised Learning: No Loss No Cry [51.07683542418145]
Supervised learning requires the specification of a loss function to minimise.
This paper revisits the SLIsotron algorithm of Kakade et al. (2011) through a novel lens.
We show how it provides a principled procedure for learning the loss.
arXiv Detail & Related papers (2020-02-10T05:30:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.