What should an AI assessor optimise for?
- URL: http://arxiv.org/abs/2502.00365v1
- Date: Sat, 01 Feb 2025 08:41:57 GMT
- Title: What should an AI assessor optimise for?
- Authors: Daniel Romero-Alvarado, Fernando Martínez-Plumed, José Hernández-Orallo
- Abstract summary: An AI assessor is an external, ideally independent system that predicts an indicator, e.g., a loss value, of another AI system.
Here we address the question: is it always optimal to train the assessor for the target metric?
We experimentally explore this question for regression losses and classification scores with monotonic and non-monotonic mappings.
- Score: 57.96463917842822
- License:
- Abstract: An AI assessor is an external, ideally independent system that predicts an indicator, e.g., a loss value, of another AI system. Assessors can leverage information from the test results of many other AI systems and have the flexibility of being trained on any loss function or scoring rule: from squared error to toxicity metrics. Here we address the question: is it always optimal to train the assessor for the target metric? Or could it be better to train for a different metric and then map predictions back to the target metric? Using twenty regression and classification problems with tabular data, we experimentally explore this question for, respectively, regression losses and classification scores with monotonic and non-monotonic mappings and find that, contrary to intuition, optimising for more informative metrics is not generally better. Surprisingly, some monotonic transformations are promising. For example, the logistic loss is useful for minimising absolute or quadratic errors in regression, and the logarithmic score helps maximise quadratic or spherical scores in classification.
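A minimal sketch of the two strategies the abstract contrasts, assuming a generic scikit-learn regressor as the assessor and a `log1p` squash standing in for the paper's monotonic transformations (names and setup are illustrative, not the paper's experimental pipeline):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def train_assessor_direct(X_meta, target_err):
    """Option A: train the assessor directly on the target metric,
    e.g. the base model's absolute error on each test instance."""
    return GradientBoostingRegressor().fit(X_meta, target_err).predict

def train_assessor_mapped(X_meta, target_err):
    """Option B: train on a monotonically transformed target, then invert
    the mapping so predictions land back on the target metric's scale."""
    model = GradientBoostingRegressor().fit(X_meta, np.log1p(target_err))
    return lambda X: np.expm1(model.predict(X))

# Both return a callable from instance features to a predicted error;
# which option tracks the target metric better is the paper's question.
```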
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
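A minimal sketch of the zeroth-order idea behind such gradient-free methods, assuming the standard two-point finite-difference estimator (the paper's accelerated variant, high-order smoothness conditions, and noise models are not reproduced here):

```python
import numpy as np

def zo_sgd(f, x0, lr=0.05, mu=1e-4, steps=1000, seed=None):
    """Zeroth-order SGD: estimate the gradient of f from two function
    evaluations along a random direction, then take a descent step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        u = rng.standard_normal(x.shape)                     # random direction
        g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u   # 2-point estimate
        x -= lr * g
    return x

# Example: minimise a simple convex quadratic without gradients.
x_star = zo_sgd(lambda x: np.sum((x - 3.0) ** 2), x0=np.zeros(5))
```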
- $F_β$-plot -- a visual tool for evaluating imbalanced data classifiers [0.0]
The paper proposes a simple approach to analyzing the popular parametric metric $F_\beta$.
For a given pool of analyzed classifiers, the plot indicates which model should be preferred depending on user requirements.
arXiv Detail & Related papers (2024-04-11T18:07:57Z)
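A small sketch of the quantity the plot is built from: sweeping $\beta$ in $F_\beta$ and finding where the preferred classifier changes (the precision/recall values below are hypothetical):

```python
import numpy as np

def f_beta(precision, recall, beta):
    """F_beta: weighted harmonic mean of precision and recall;
    beta > 1 weights recall more heavily, beta < 1 favours precision."""
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

betas = np.linspace(0.1, 2.0, 50)
clf_a = f_beta(0.90, 0.60, betas)              # precise, low recall
clf_b = f_beta(0.70, 0.85, betas)              # higher recall, less precise
crossover = betas[np.argmax(clf_b > clf_a)]    # beta where preference flips
```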
- Revisiting Evaluation Metrics for Semantic Segmentation: Optimization and Evaluation of Fine-grained Intersection over Union [113.20223082664681]
We propose the use of fine-grained mIoUs along with corresponding worst-case metrics.
These fine-grained metrics offer less bias towards large objects, richer statistical information, and valuable insights into model and dataset auditing.
Our benchmark study highlights the necessity of not basing evaluations on a single metric and confirms that fine-grained mIoUs reduce the bias towards large objects.
arXiv Detail & Related papers (2023-10-30T03:45:15Z)
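For context, a minimal sketch of the standard per-class IoU that the fine-grained variants refine; the paper's exact fine-grained definitions (e.g. per image or per object size) are not reproduced here:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU over the classes present in prediction or ground truth.
    Fine-grained variants recompute this over subsets of the data."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# Example: errors on small classes barely move the dominant class's IoU.
gt   = np.array([0, 0, 0, 0, 1, 1, 2, 2])
pred = np.array([0, 0, 0, 0, 0, 1, 2, 1])
miou = mean_iou(pred, gt, num_classes=3)   # ~0.54
```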
- Who Should Predict? Exact Algorithms For Learning to Defer to Humans [40.22768241509553]
We show that prior approaches can fail to find a human-AI system with low misclassification error.
We give a mixed-integer-linear-programming (MILP) formulation that can optimally solve the problem in the linear setting.
We provide a novel surrogate loss function that is realizable-consistent and performs well empirically.
arXiv Detail & Related papers (2023-01-15T21:57:36Z)
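The MILP itself is beyond a short sketch, but a toy fixed-cost deferral rule illustrates the routing decision being optimised; this threshold logic is an assumption for illustration, not the paper's algorithm:

```python
import numpy as np

def defer_decisions(model_conf, human_err_rate, defer_cost=0.05):
    """Route an example to the human when the model's expected error
    (1 - confidence) exceeds the human's error rate plus a consultation
    cost; the paper instead learns classifier and rejector jointly."""
    model_err = 1.0 - np.asarray(model_conf)
    return model_err > (human_err_rate + defer_cost)

conf = np.array([0.99, 0.80, 0.55, 0.92])
to_human = defer_decisions(conf, human_err_rate=0.10)   # -> [F, T, T, F]
```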
- Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation [59.500347564280204]
We propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework.
AUR consists of a new uncertainty estimator along with a normal recommender model.
Since the chance of mislabeling reflects the potential of a user-item pair, AUR makes recommendations according to the estimated uncertainty.
arXiv Detail & Related papers (2022-09-22T04:32:51Z)
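A hedged reading of the recommendation rule sketched above: rank unobserved items by the estimated chance that their implicit negative label is wrong. In AUR this estimator is a learned model; here it is just an input array:

```python
import numpy as np

def recommend(candidate_items, mislabel_prob, top_n=2):
    """High estimated mislabeling probability on an unobserved user-item
    pair signals high potential, so rank candidates by it."""
    order = np.argsort(-np.asarray(mislabel_prob))
    return [candidate_items[i] for i in order[:top_n]]

items = ["i1", "i2", "i3", "i4"]
p_mislabel = [0.12, 0.67, 0.40, 0.05]    # hypothetical estimator outputs
top = recommend(items, p_mislabel)       # -> ["i2", "i3"]
```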
- Optimizing Partial Area Under the Top-k Curve: Theory and Practice [151.5072746015253]
We develop a novel metric named partial Area Under the top-k Curve (AUTKC).
AUTKC has a better discrimination ability, and its Bayes optimal score function could give a correct top-K ranking with respect to the conditional probability.
We present an empirical surrogate risk minimization framework to optimize the proposed metric.
arXiv Detail & Related papers (2022-09-03T11:09:13Z)
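A simple proxy for the quantity involved: the top-k accuracy curve and its average up to K. The paper's AUTKC has its own precise definition and surrogate losses; this sketch only shows the curve being summarised:

```python
import numpy as np

def topk_curve(scores, labels, K):
    """Top-k accuracy for k = 1..K from per-class scores."""
    order = np.argsort(-scores, axis=1)                  # classes by score
    ranks = np.argmax(order == labels[:, None], axis=1)  # rank of true class
    return np.array([(ranks < k).mean() for k in range(1, K + 1)])

scores = np.array([[0.5, 0.2, 0.2, 0.1],
                   [0.1, 0.6, 0.2, 0.1],
                   [0.3, 0.3, 0.2, 0.2]])
labels = np.array([0, 2, 1])
curve = topk_curve(scores, labels, K=3)   # top-1..top-3 accuracy
autkc_proxy = curve.mean()                # crude area-under-curve stand-in
```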
- Active Learning Improves Performance on Symbolic Regression Tasks in StackGP [2.7685408681770247]
We introduce an active learning method for symbolic regression using StackGP.
We use the Feynman AI benchmark set of equations to examine the ability of our method to find appropriate models using fewer data points.
arXiv Detail & Related papers (2022-02-09T20:05:22Z)
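A generic committee-based selection step conveys the active-learning idea; StackGP evolves ensembles of symbolic models, whereas this sketch substitutes scikit-learn trees as the committee:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def select_next_point(X_pool, models):
    """Query the pool point where committee predictions disagree most."""
    preds = np.stack([m.predict(X_pool) for m in models])  # (models, points)
    return int(np.argmax(preds.std(axis=0)))

rng = np.random.default_rng(0)
X_lab = rng.uniform(-3, 3, (10, 1))          # small labeled seed set
y_lab = np.sin(X_lab).ravel()
X_pool = rng.uniform(-3, 3, (200, 1))        # unlabeled candidates

committee = []
for _ in range(5):
    idx = rng.integers(0, len(X_lab), len(X_lab))   # bootstrap resample
    committee.append(DecisionTreeRegressor().fit(X_lab[idx], y_lab[idx]))
next_i = select_next_point(X_pool, committee)       # label this point next
```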
- Towards optimally abstaining from prediction [22.937799541125607]
A common challenge across all areas of machine learning is that training data is not distributed like test data.
We consider a model where one may abstain from predicting, at a fixed cost.
Our work builds on a recent abstention algorithm of Goldwasser, Kalais, and Montasser (2020) for transductive binary classification.
arXiv Detail & Related papers (2021-05-28T21:44:48Z)
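For contrast with the transductive algorithm above, the classical fixed-cost abstention baseline (Chow's rule) is easy to state; the paper's method is not this rule:

```python
import numpy as np

def predict_or_abstain(probs, cost):
    """Predicting the top class risks 1 - max(prob), while abstaining
    costs `cost`, so abstain whenever 1 - max(prob) > cost."""
    probs = np.asarray(probs)
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1).astype(float)
    preds[1.0 - conf > cost] = np.nan          # NaN marks an abstention
    return preds

p = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]])
decisions = predict_or_abstain(p, cost=0.3)    # -> [0., nan, 1.]
```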
- Learning by Minimizing the Sum of Ranked Range [58.24935359348289]
We introduce the sum of ranked range (SoRR) as a general approach to form learning objectives.
A ranked range is a consecutive sequence of sorted values of a set of real numbers.
We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification.
arXiv Detail & Related papers (2020-10-05T01:58:32Z)
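The definition above translates directly into code; the choices of k and m below are illustrative:

```python
import numpy as np

def sum_of_ranked_range(values, k, m):
    """Sum of the (k+1)-th through m-th largest values: k = 0 recovers
    average-top-m style losses, while trimming the largest values makes
    the aggregate robust to outliers."""
    v = np.sort(np.asarray(values))[::-1]     # sort descending
    return float(v[k:m].sum())

# Example: drop the single largest individual loss (a likely outlier)
# and aggregate the next three.
losses = np.array([0.1, 2.5, 0.3, 0.05, 0.7, 0.2])
aorr_style = sum_of_ranked_range(losses, k=1, m=4)   # 0.7 + 0.3 + 0.2
```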
- A First Step Towards Distribution Invariant Regression Metrics [1.370633147306388]
In classification, it has been stated repeatedly that performance metrics like the F-Measure and Accuracy are highly dependent on the class distribution.
We show that the same problem exists in regression: the distribution of odometry parameters in robotic applications, for example, can vary largely between recording sessions.
Here, we need regression algorithms that either perform equally well for all function values, or that focus on certain boundary regions like high speed.
arXiv Detail & Related papers (2020-09-10T23:40:46Z)
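One plausible instantiation of the idea, assuming a binned, equally weighted MAE so that densely sampled target regions cannot dominate the score; the paper's exact metric definitions may differ:

```python
import numpy as np

def binned_mae(y_true, y_pred, n_bins=10):
    """MAE computed per bin of the target value, then averaged with equal
    bin weights, making the score less sensitive to how the targets are
    distributed across recording sessions."""
    edges = np.linspace(y_true.min(), y_true.max(), n_bins + 1)
    which = np.clip(np.digitize(y_true, edges) - 1, 0, n_bins - 1)
    errs = np.abs(y_true - y_pred)
    per_bin = [errs[which == b].mean()
               for b in range(n_bins) if np.any(which == b)]
    return float(np.mean(per_bin))

# Example: large errors confined to rare high-speed samples barely move
# plain MAE but dominate one of the two populated bins here.
rng = np.random.default_rng(0)
y = np.concatenate([rng.uniform(0, 1, 900), rng.uniform(9, 10, 100)])
pred = np.where(y < 5, y + 0.01, y + 1.0)
plain, binned = np.abs(y - pred).mean(), binned_mae(y, pred)  # ~0.11 vs ~0.5
```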
This list is automatically generated from the titles and abstracts of the papers on this site.