LegendreTron: Uprising Proper Multiclass Loss Learning
- URL: http://arxiv.org/abs/2301.11695v3
- Date: Tue, 28 Nov 2023 21:36:53 GMT
- Title: LegendreTron: Uprising Proper Multiclass Loss Learning
- Authors: Kevin Lam, Christian Walder, Spiridon Penev, Richard Nock
- Abstract summary: Loss functions serve as the foundation of supervised learning and are often chosen prior to model development.
Recent works have sought to \emph{learn losses} and models jointly.
We present {\sc LegendreTron} as a novel and practical method that jointly learns \emph{proper canonical losses} and probabilities for multiclass problems.
- Score: 22.567234503869845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Loss functions serve as the foundation of supervised learning and are often
chosen prior to model development. To avoid potentially ad hoc choices of
losses, statistical decision theory describes a desirable property for losses
known as \emph{properness}, which asserts that Bayes' rule is optimal. Recent
works have sought to \emph{learn losses} and models jointly. Existing methods
do this by fitting an inverse canonical link function which monotonically maps
$\mathbb{R}$ to $[0,1]$ to estimate probabilities for binary problems. In this
paper, we extend monotonicity to maps between $\mathbb{R}^{C-1}$ and the
projected probability simplex $\tilde{\Delta}^{C-1}$ by using monotonicity of
gradients of convex functions. We present {\sc LegendreTron} as a novel and
practical method that jointly learns \emph{proper canonical losses} and
probabilities for multiclass problems. Tested on a benchmark of domains with up
to 1,000 classes, our experimental results show that our method consistently
outperforms the natural multiclass baseline under a $t$-test at 99%
significance on all datasets with greater than 10 classes.
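The key construction is that a proper canonical loss is induced by a convex potential whose gradient acts as the inverse canonical link, carrying score vectors in $\mathbb{R}^{C-1}$ onto the projected simplex. A minimal sketch of that idea follows, using a fixed log-sum-exp potential (which recovers the multinomial-logistic link) rather than the learned convex potential that is the paper's actual contribution; the function names are illustrative.

```python
import jax
import jax.numpy as jnp

def potential(u):
    # Convex potential on R^{C-1}; its gradient maps scores onto the
    # projected probability simplex {p >= 0 : sum(p) <= 1}.
    # log(1 + sum_i exp(u_i)) gives the multinomial-logistic link;
    # LegendreTron learns a convex potential instead of fixing one.
    return jnp.log1p(jnp.sum(jnp.exp(u)))

inverse_link = jax.grad(potential)  # monotone map R^{C-1} -> projected simplex

def induced_log_loss(u, y):
    # Negative log-likelihood induced by the potential; the last class
    # receives the leftover probability mass 1 - sum(p_tilde).
    p_tilde = inverse_link(u)
    p = jnp.concatenate([p_tilde, 1.0 - jnp.sum(p_tilde, keepdims=True)])
    return -jnp.log(p[y])

u = jnp.array([0.3, -1.2, 0.8])   # scores for a C = 4 class problem
print(inverse_link(u))            # probabilities of the first C-1 classes
print(induced_log_loss(u, 3))     # loss when the true class is the last one
```

Because the probabilities are obtained as the gradient of a convex function, monotonicity of the link, and hence properness of the induced loss, holds by construction; this is the property the learned potential is designed to preserve.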
Related papers
- IT$^3$: Idempotent Test-Time Training [95.78053599609044]
This paper introduces Idempotent Test-Time Training (IT$^3$), a novel approach to addressing the challenge of distribution shift.
IT$^3$ is based on the universal property of idempotence.
We demonstrate the versatility of our approach across various tasks, including corrupted image classification.
arXiv Detail & Related papers (2024-10-05T15:39:51Z) - Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss [33.18537822803389]
We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, $\mathscr{F}$ is a weakly sub-Gaussian class.
Our result holds whether the problem is realizable or not, and we refer to this as a \emph{near mixing-free} rate, since direct dependence on mixing is relegated to an additive higher-order term.
arXiv Detail & Related papers (2024-02-08T18:57:42Z) - Omnipredictors for Regression and the Approximate Rank of Convex
Functions [7.549720420141907]
An \textit{omnipredictor} for a class $\mathcal{L}$ of loss functions and a class $\mathcal{C}$ of hypotheses is a predictor whose predictions incur less expected loss than the best hypothesis in $\mathcal{C}$ for every loss in $\mathcal{L}$.
arXiv Detail & Related papers (2024-01-26T04:29:53Z) - Agnostically Learning Multi-index Models with Queries [54.290489524576756]
We study the power of query access for the task of agnostic learning under the Gaussian distribution.
We show that query access gives significant runtime improvements over random examples for agnostically learning MIMs.
arXiv Detail & Related papers (2023-12-27T15:50:47Z) - Adversarial Contextual Bandits Go Kernelized [21.007410990554522]
We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space.
We propose a new optimistically biased estimator for the loss functions and establish near-optimal regret guarantees.
arXiv Detail & Related papers (2023-10-02T19:59:39Z) - HappyMap: A Generalized Multi-calibration Method [23.086009024383024]
Multi-calibration is a powerful and evolving concept originating in the field of algorithmic fairness.
In this work, we view the term $(f(x)-y)$ as just one specific mapping, and explore the power of an enriched class of mappings.
We propose \textit{HappyMap}, a generalization of multi-calibration, which yields a wide range of new applications.
arXiv Detail & Related papers (2023-03-08T05:05:01Z) - High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad
Stepsize [55.0090961425708]
We propose a new, simplified high probability analysis of AdaGrad for smooth, non-convex problems.
We present our analysis in a modular way and obtain a complementary $\mathcal{O}(1/T)$ convergence rate in the deterministic setting.
To the best of our knowledge, this is the first high probability result for AdaGrad with a truly adaptive scheme, i.e., completely oblivious to the knowledge of smoothness.
arXiv Detail & Related papers (2022-04-06T13:50:33Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
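As a rough illustration of the perturbation mechanism just described (the precise noise schedule and the general function-approximation setting are in the paper; the linear-features helper below is a hypothetical stand-in), a single randomized least-squares value fit might look like:

```python
import numpy as np

def perturbed_lsvi_fit(Phi, bellman_targets, sigma=1.0, reg=1.0, seed=0):
    # Phi: (n, d) features of visited state-action pairs.
    # bellman_targets: (n,) Bellman backup targets.
    # Adding i.i.d. Gaussian noise to the targets randomizes the fitted
    # value function, which is what drives exploration in RLSVI-style methods.
    rng = np.random.default_rng(seed)
    noisy = bellman_targets + sigma * rng.normal(size=bellman_targets.shape)
    d = Phi.shape[1]
    # Ridge least-squares solution on the perturbed targets
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(d), Phi.T @ noisy)
```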
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Coresets for Classification -- Simplified and Strengthened [19.54307474041768]
We give relative error coresets for training linear classifiers with a broad class of loss functions.
Our construction achieves $\tilde{O}(d \cdot \mu_y(X)^2/\epsilon^2)$ points, where $\mu_y(X)$ is a natural complexity measure of the data matrix $X \in \mathbb{R}^{n \times d}$ and label vector $y \in \{-1,1\}^n$.
arXiv Detail & Related papers (2021-06-08T11:24:18Z) - Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
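To make the fitting idea concrete, here is an illustrative sketch (not the paper's actual estimator or solver) that represents the regressor as a difference of two max-affine functions, $f(x) = \max_i(a_i^\top x + b_i) - \max_j(c_j^\top x + d_j)$, and fits it by plain subgradient descent on the squared loss:

```python
import numpy as np

def dc_fit(X, y, k=5, lr=0.05, n_iters=2000, seed=0):
    # Fit f(x) = max_i(A[i].x + b[i]) - max_j(C[j].x + c[j]) by subgradient descent.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A, b = rng.normal(scale=0.1, size=(k, d)), np.zeros(k)
    C, c = rng.normal(scale=0.1, size=(k, d)), np.zeros(k)
    for _ in range(n_iters):
        g = X @ A.T + b                      # (n, k) affine pieces of the convex part
        h = X @ C.T + c                      # (n, k) affine pieces of the concave part
        i, j = g.argmax(axis=1), h.argmax(axis=1)
        pred = g[np.arange(n), i] - h[np.arange(n), j]
        r = pred - y                         # residuals
        for m in range(k):                   # subgradient step on the active pieces only
            mi, mj = (i == m), (j == m)
            A[m] -= lr * (r[mi] @ X[mi]) / n
            b[m] -= lr * r[mi].sum() / n
            C[m] += lr * (r[mj] @ X[mj]) / n
            c[m] += lr * r[mj].sum() / n
    return A, b, C, c
```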
arXiv Detail & Related papers (2020-07-05T18:58:47Z) - Supervised Learning: No Loss No Cry [51.07683542418145]
Supervised learning requires the specification of a loss function to minimise.
This paper revisits the {\sc SLIsotron} algorithm of Kakade et al. (2011) through a novel lens.
We show how it provides a principled procedure for learning the loss.
arXiv Detail & Related papers (2020-02-10T05:30:52Z)