Realizable $H$-Consistent and Bayes-Consistent Loss Functions for Learning to Defer
- URL: http://arxiv.org/abs/2407.13732v1
- Date: Thu, 18 Jul 2024 17:35:03 GMT
- Title: Realizable $H$-Consistent and Bayes-Consistent Loss Functions for Learning to Defer
- Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong
- Abstract summary: We introduce a broad family of surrogate losses, parameterized by a non-increasing function $\Psi$, and establish their realizable $H$-consistency under mild conditions.
For cost functions based on classification error, we show that these losses admit $H$-consistency bounds when the hypothesis set is symmetric and complete.
- Score: 30.389055604165222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a comprehensive study of surrogate loss functions for learning to defer. We introduce a broad family of surrogate losses, parameterized by a non-increasing function $\Psi$, and establish their realizable $H$-consistency under mild conditions. For cost functions based on classification error, we further show that these losses admit $H$-consistency bounds when the hypothesis set is symmetric and complete, a property satisfied by common neural network and linear function hypothesis sets. Our results also resolve an open question raised in previous work (Mozannar et al., 2023) by proving the realizable $H$-consistency and Bayes-consistency of a specific surrogate loss. Furthermore, we identify choices of $\Psi$ that lead to $H$-consistent surrogate losses for any general cost function, thus achieving Bayes-consistency, realizable $H$-consistency, and $H$-consistency bounds simultaneously. We also investigate the relationship between $H$-consistency bounds and realizable $H$-consistency in learning to defer, highlighting key differences from standard classification. Finally, we empirically evaluate our proposed surrogate losses and compare them with existing baselines.
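For intuition, here is a minimal Python sketch of a deferral surrogate of the kind the abstract describes: the predictor outputs scores over $n$ classes plus a deferral option, and a non-increasing function $\Psi$ is applied to the relevant softmax probabilities. The specific form below is an illustrative assumption, not the paper's general definition; with $\Psi(u) = -\log u$ it reduces to the cross-entropy-style deferral loss studied in prior work by Mozannar et al.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def defer_surrogate(scores, y, expert_correct, psi=lambda u: -np.log(u)):
    """Illustrative Psi-parameterized learning-to-defer surrogate (assumed
    form, not the paper's exact definition). `scores` holds n class scores
    followed by one deferral score; `psi` is non-increasing, so low
    probability on the desired outcome is penalized."""
    p = softmax(scores)
    loss = psi(p[y])       # encourage probability mass on the true label
    if expert_correct:     # encourage deferral only when the expert incurs no cost
        loss += psi(p[-1])
    return loss

# Example: 3 classes + a deferral score, true label 1, expert correct here.
print(defer_surrogate(np.array([0.2, 1.5, -0.3, 0.8]), y=1, expert_correct=True))
```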
Related papers
- Convergence Rate Analysis of LION [54.28350823319057]
LION converges at a rate of $\mathcal{O}(\sqrt{d}K^{-1/4})$ measured by gradient Karush-Kuhn-Tucker (KKT) stationarity, where $d$ is the problem dimension and $K$ is the number of iterations.
We show that LION can achieve lower loss and higher performance compared to standard SGD.
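For context, here is the LION update rule itself (from Chen et al., 2023, the optimizer this analysis targets), sketched in Python; the hyperparameter values are illustrative defaults.

```python
import numpy as np

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One LION step: sign of an interpolated momentum, plus decoupled
    weight decay; the momentum buffer is updated with a second rate."""
    c = beta1 * m + (1 - beta1) * g     # interpolation used for the sign update
    w = w - lr * (np.sign(c) + wd * w)  # sign step + decoupled weight decay
    m = beta2 * m + (1 - beta2) * g     # momentum update
    return w, m
```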
arXiv Detail & Related papers (2024-11-12T11:30:53Z)
- Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
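One plausible reading of the margin complement, taken literally from the verbal definition above (the paper's exact formula may differ), is the change $1\{S > t\} - S$ that thresholding induces in the score:

```python
import numpy as np

def margin_complement(score, threshold=0.5):
    """Hypothetical illustration of the margin complement: the change in the
    prediction caused by thresholding the score S at t, i.e. 1[S > t] - S."""
    decision = (score > threshold).astype(float)
    return decision - score

print(margin_complement(np.array([0.2, 0.55, 0.9])))  # [-0.2  0.45  0.1]
```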
arXiv Detail & Related papers (2024-05-24T11:22:19Z)
- Top-$k$ Classification and Cardinality-Aware Prediction [30.389055604165222]
We show that comp-sum and constrained losses are supported by $H$-consistency bounds with respect to the top-$k$ loss.
We introduce cardinality-aware loss functions through instance-dependent cost-sensitive learning.
Minimizing these losses leads to new cardinality-aware algorithms for top-$k$ classification.
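As a concrete reference point, the target top-$k$ loss and one comp-sum surrogate (the logistic/cross-entropy instance; the family studied in the paper is broader) can be written as follows.

```python
import numpy as np

def top_k_loss(scores, y, k):
    """Top-k zero-one loss: 1 if the true label is not among the k
    highest-scoring labels."""
    top_k = np.argsort(scores)[-k:]
    return 0.0 if y in top_k else 1.0

def comp_sum_logistic(scores, y):
    """The logistic instance of a comp-sum loss, log(sum_j exp(s_j - s_y)),
    i.e. standard cross-entropy; one member of the family shown to admit
    H-consistency bounds with respect to the top-k loss."""
    a = scores - scores[y]
    return np.log(np.sum(np.exp(a - a.max()))) + a.max()
```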
arXiv Detail & Related papers (2024-03-28T17:45:03Z)
- $H$-Consistency Guarantees for Regression [30.389055604165222]
We first present new theorems that generalize the tools previously given to establish $H$-consistency bounds.
We then prove a series of novel $H$-consistency bounds for surrogate loss functions of the squared loss.
We further leverage our analysis of $H$-consistency for regression and derive principled surrogate losses for adversarial regression.
arXiv Detail & Related papers (2024-03-28T15:08:51Z)
- Online non-parametric likelihood-ratio estimation by Pearson-divergence functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of i.i.d. observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
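A schematic of the underlying Pearson-divergence objective, for a linear model $r(x) = \theta^\top \phi(x)$: minimizing $\frac{1}{2}\mathbb{E}_q[r^2] - \mathbb{E}_p[r]$ recovers $r^* = p/q$, and each observed pair yields one stochastic gradient step. The paper's OLRE estimator is kernel-based with explicit guarantees, so this is only a sketch of the principle.

```python
import numpy as np

def olre_step(theta, x_p, x_q, phi, lr=0.05):
    """One online update for Pearson-divergence likelihood-ratio estimation
    with a linear model r(x) = theta . phi(x) (schematic, not the paper's
    kernel estimator). Gradient of 0.5 * r(x_q)**2 - r(x_p) w.r.t. theta."""
    grad = (theta @ phi(x_q)) * phi(x_q) - phi(x_p)
    return theta - lr * grad
```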
arXiv Detail & Related papers (2023-11-03T13:20:11Z)
- Cross-Entropy Loss Functions: Theoretical Analysis and Applications [27.3569897539488]
We present a theoretical analysis of a broad family of loss functions that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error, and other cross-entropy-like loss functions.
We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds.
This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss.
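One way to see the family concretely is through the generalized cross-entropy of Zhang and Sabuncu (2018), one of the losses named above, which interpolates between cross-entropy and (up to a constant factor) the mean absolute error with a single parameter $q$; a hedged sketch:

```python
import numpy as np

def generalized_cross_entropy(p_y, q):
    """Generalized cross-entropy on the true-class probability p_y:
    (1 - p_y**q) / q. As q -> 0 this tends to -log(p_y) (cross-entropy);
    at q = 1 it equals 1 - p_y, proportional to the mean absolute error."""
    if q == 0.0:
        return -np.log(p_y)
    return (1.0 - p_y ** q) / q
```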
arXiv Detail & Related papers (2023-04-14T17:58:23Z)
- $\mathscr{H}$-Consistency Estimation Error of Surrogate Loss Minimizers [38.56401704010528]
We present a detailed study of estimation errors in terms of surrogate loss estimation errors.
We refer to such guarantees as $\mathscr{H}$-consistency estimation error bounds.
arXiv Detail & Related papers (2022-05-16T23:13:36Z)
- Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity [55.29408396918968]
We study a family of loss functions named label-distributionally robust (LDR) losses for multi-class classification.
Our contributions address both consistency and robustness, including establishing the top-$k$ consistency of LDR losses for multi-class classification.
We propose a new adaptive LDR loss that automatically adapts an individualized temperature parameter to the noise level of each instance's class label.
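A representative KL-regularized member of the LDR family, written as an assumption-labeled sketch: the worst-case expected margin over label distributions near uniform, softened by a temperature $\tau$ (the paper's adaptive variant learns $\tau$ per instance, and its exact parameterization may differ).

```python
import numpy as np

def ldr_kl_loss(scores, y, tau=1.0):
    """KL-regularized LDR loss (sketch): tau * log(mean_j exp((s_j - s_y)/tau)).
    tau = 1 recovers cross-entropy up to an additive constant; tau -> 0
    approaches the maximum margin violation max_j (s_j - s_y)."""
    a = (scores - scores[y]) / tau
    a_max = a.max()
    return tau * (a_max + np.log(np.mean(np.exp(a - a_max))))
```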
arXiv Detail & Related papers (2021-12-30T00:27:30Z)
- Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
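The recentering idea behind variance-reduced $Q$-learning can be sketched for the tabular case: evaluate the one-sample Bellman estimate at both the current table and a frozen reference table on the same transition, so their noise largely cancels, then recenter with a batch estimate at the reference. Variable names here are assumptions for illustration.

```python
import numpy as np

def vr_q_step(Q, Q_ref, T_ref, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One variance-reduced Q-learning update (schematic). T_ref[s, a] is a
    batch (low-noise) estimate of the Bellman operator at Q_ref; the two
    one-sample estimates below share the same transition."""
    t_hat = r + gamma * Q[s_next].max()          # one-sample Bellman at Q
    t_hat_ref = r + gamma * Q_ref[s_next].max()  # same sample, at Q_ref
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (t_hat - t_hat_ref + T_ref[s, a])
    return Q
```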
arXiv Detail & Related papers (2021-06-28T00:38:54Z)
- Rethinking and Reweighting the Univariate Losses for Multi-Label Ranking: Consistency and Generalization [44.73295800450414]
(Partial) ranking loss is a commonly used evaluation measure for multi-label classification.
There is a gap between existing theory and practice: some pairwise losses can lead to promising performance but lack consistency.
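For reference, the (partial) ranking loss mentioned above counts mis-ordered (relevant, irrelevant) label pairs; the univariate losses the paper reweights replace this pairwise comparison with per-label terms. A minimal sketch:

```python
import numpy as np

def ranking_loss(scores, labels):
    """(Partial) ranking loss for multi-label classification: the fraction
    of (relevant, irrelevant) label pairs mis-ordered by the scores, with
    ties counted as half an error."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        return 0.0
    diffs = pos[:, None] - neg[None, :]
    return float(np.mean((diffs < 0) + 0.5 * (diffs == 0)))
```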
arXiv Detail & Related papers (2021-05-10T09:23:27Z)
- A surrogate loss function for optimization of $F_\beta$ score in binary classification with imbalanced data [0.0]
The gradient paths of the proposed surrogate $F_\beta$ loss function approximate the gradient paths of the large sample limit of the $F_\beta$ score.
It is demonstrated that the proposed surrogate $F_\beta$ loss function is effective for optimizing $F_\beta$ scores under class imbalance.
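A common construction of this kind, shown as a hedged sketch (the paper derives its surrogate from the large-sample limit of the $F_\beta$ score, so its exact form may differ): replace hard counts with expected soft TP/FP/FN computed from predicted probabilities and minimize one minus the resulting soft $F_\beta$.

```python
import numpy as np

def soft_fbeta_loss(p, y, beta=1.0, eps=1e-12):
    """Smooth surrogate for F_beta from probabilities p and binary labels y:
    soft counts give F_beta = (1+b^2)*TP / ((1+b^2)*TP + b^2*FN + FP)."""
    tp = np.sum(p * y)
    fp = np.sum(p * (1 - y))
    fn = np.sum((1 - p) * y)
    b2 = beta ** 2
    return 1.0 - (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp + eps)
```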
arXiv Detail & Related papers (2021-04-03T18:36:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.