Supervised Learning: No Loss No Cry
- URL: http://arxiv.org/abs/2002.03555v1
- Date: Mon, 10 Feb 2020 05:30:52 GMT
- Title: Supervised Learning: No Loss No Cry
- Authors: Richard Nock and Aditya Krishna Menon
- Abstract summary: Supervised learning requires the specification of a loss function to minimise.
This paper revisits the SLIsotron algorithm of Kakade et al. (2011) through a novel lens.
We show how it provides a principled procedure for learning the loss.
- Score: 51.07683542418145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised learning requires the specification of a loss function to
minimise. While the theory of admissible losses from both a computational and
statistical perspective is well-developed, these offer a panoply of different
choices. In practice, this choice is typically made in an \emph{ad hoc} manner.
In hopes of making this procedure more principled, the problem of
\emph{learning the loss function} for a downstream task (e.g., classification)
has garnered recent interest. However, works in this area have been generally
empirical in nature.
In this paper, we revisit the {\sc SLIsotron} algorithm of Kakade et al.
(2011) through a novel lens, derive a generalisation based on Bregman
divergences, and show how it provides a principled procedure for learning the
loss. In detail, we cast {\sc SLIsotron} as learning a loss from a family of
composite square losses. By interpreting this through the lens of \emph{proper
losses}, we derive a generalisation of {\sc SLIsotron} based on Bregman
divergences. The resulting {\sc BregmanTron} algorithm jointly learns the loss
along with the classifier. It comes equipped with a simple guarantee of
convergence for the loss it learns, and its set of possible outputs comes with
a guarantee of agnostic approximability of Bayes rule. Experiments indicate
that the {\sc BregmanTron} substantially outperforms the {\sc SLIsotron}, and
that the loss it learns can be minimized by other algorithms for different
tasks, thereby opening the interesting problem of \textit{loss transfer}
between domains.
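As a rough illustration of the alternation described above, the sketch below implements an SLIsotron-style loop: it alternately fits a monotone link function by isotonic regression of the labels on the current margins, then takes an averaged residual step on the weight vector. This is a minimal sketch under simplifying assumptions, not the paper's BregmanTron: the function name slisotron_sketch and the data are illustrative, and scikit-learn's IsotonicRegression does not enforce the Lipschitz constraint used in the original analysis.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def slisotron_sketch(X, y, n_iters=50):
    """Illustrative SLIsotron-style alternation (sketch only).

    Alternates between (i) fitting a non-decreasing link u_t by isotonic
    regression of y on the margins w_t . x_i, and (ii) updating w_t with
    the averaged residuals y_i - u_t(w_t . x_i).
    """
    n, d = X.shape
    w = np.zeros(d)
    link = None
    for _ in range(n_iters):
        z = X @ w                                   # current margins w_t . x_i
        link = IsotonicRegression(y_min=0.0, y_max=1.0,
                                  out_of_bounds="clip").fit(z, y)
        u = link.predict(z)                         # u_t(w_t . x_i)
        w = w + X.T @ (y - u) / n                   # averaged residual step on w
    return w, link
```

In the BregmanTron the abstract describes, this fixed squared-loss residual step is replaced by updates derived from a learned Bregman divergence, so that the loss and the classifier are learned jointly.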
Related papers
- LegendreTron: Uprising Proper Multiclass Loss Learning [22.567234503869845]
Loss functions serve as the foundation of supervised learning and are often chosen prior to model development.
Recent works have sought to learn losses and models jointly.
We present LegendreTron as a novel and practical method that jointly learns proper canonical losses and probabilities for multiclass problems.
arXiv Detail & Related papers (2023-01-27T13:10:45Z) - Regularized ERM on random subspaces [17.927376388967144]
We consider possibly data-dependent subspaces spanned by a random subset of the data, recovering as a special case Nyström approaches for kernel methods (a minimal sketch of this special case appears after this list).
Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded.
arXiv Detail & Related papers (2022-12-04T16:12:11Z) - What killed the Convex Booster ? [70.04715330065275]
A landmark negative result of Long and Servedio established a worst-case spectacular failure of a supervised learning trio.
We argue that the source of the negative result lies in the dark side of a pervasive -- and otherwise prized -- aspect of ML: parameterisation.
arXiv Detail & Related papers (2022-05-19T15:42:20Z) - Do Lessons from Metric Learning Generalize to Image-Caption Retrieval? [67.45267657995748]
The triplet loss with semi-hard negatives has become the de facto choice for image-caption retrieval (ICR) methods that are optimized from scratch.
Recent progress in metric learning has given rise to new loss functions that outperform the triplet loss on tasks such as image retrieval and representation learning.
We ask whether these findings generalize to the setting of ICR by comparing three loss functions on two ICR methods.
arXiv Detail & Related papers (2022-02-14T15:18:00Z) - Omnipredictors [19.735769148626588]
Loss minimization is a dominant paradigm in machine learning.
We introduce the notion of an $(\mathcal{L},\mathcal{C})$-omnipredictor, which could be used to optimize any loss in a family.
We show that such "loss-oblivious" learning is feasible through a connection to multicalibration.
arXiv Detail & Related papers (2021-09-11T23:28:49Z) - Being Properly Improper [36.52509571098292]
We analyse class probability-based losses when they are stripped of the mandatory properness.
We show that a natural extension of a half-century old loss introduced by S. Arimoto is twist proper.
We then turn to a theory that has provided some of the best off-the-shelf algorithms for proper losses: boosting.
arXiv Detail & Related papers (2021-06-18T05:00:15Z) - All your loss are belong to Bayes [28.393499629583786]
Loss functions are a cornerstone of machine learning and the starting point of most algorithms.
We introduce a trick on squared Gaussian Processes to obtain a random process whose paths are compliant source functions.
Experimental results demonstrate substantial improvements over the state of the art.
arXiv Detail & Related papers (2020-06-08T14:31:21Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z) - Learning Near Optimal Policies with Low Inherent Bellman Error [115.16037976819331]
We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning.
We show that exploration is possible using only batch assumptions with an algorithm that achieves the optimal statistical rate for the setting we consider.
arXiv Detail & Related papers (2020-02-29T02:02:40Z) - Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality [74.0084803220897]
Adversarial training is a popular method to give neural nets robustness against adversarial perturbations.
We show convergence to low robust training loss for polynomial width instead of exponential, under natural assumptions and with the ReLU activation.
arXiv Detail & Related papers (2020-02-16T20:13:43Z)
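For the "Regularized ERM on random subspaces" entry above, the following minimal sketch illustrates the special case mentioned there: regularized least squares restricted to the subspace spanned by a random subset of the training points, i.e. a Nyström approximation. The data, kernel parameters, and number of components are hypothetical, and scikit-learn's Nystroem map plus ridge regression is used only as one concrete instantiation.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical regression data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

# Regularized ERM on a random subspace: the Nystroem map builds features
# from 50 randomly selected training points, and ridge regression then
# performs regularized empirical risk minimisation in that subspace.
model = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.5, n_components=50, random_state=0),
    Ridge(alpha=1.0),
)
model.fit(X, y)
print("training MSE:", np.mean((model.predict(X) - y) ** 2))
```

The computational saving comes from working with an n x 50 feature matrix instead of the full n x n kernel matrix; the question studied in that paper is how much learning accuracy, if any, this sacrifices.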
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.