Using predictive multiplicity to measure individual performance within the AI Act
- URL: http://arxiv.org/abs/2602.11944v1
- Date: Thu, 12 Feb 2026 13:40:00 GMT
- Title: Using predictive multiplicity to measure individual performance within the AI Act
- Authors: Karolin Frohnapfel, Mara Seyfert, Sebastian Bordt, Ulrike von Luxburg, Kristof Meding
- Abstract summary: This paper puts predictive multiplicity in context with the EU AI Act's provisions on accuracy. We suggest that information about predictive multiplicity should be made available to deployers under the AI Act.
- Score: 8.79823465328347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When building AI systems for decision support, one often encounters the phenomenon of predictive multiplicity: a single best model does not exist; instead, one can construct many models with similar overall accuracy that differ in their predictions for individual cases. Especially when decisions have a direct impact on humans, this can be highly unsatisfactory. For a person subject to high disagreement between models, one could just as well have chosen a different model of similar overall accuracy that would have decided the person's case differently. We argue that this arbitrariness conflicts with the EU AI Act, which requires providers of high-risk AI systems to report performance not only at the dataset level but also for specific persons. The goal of this paper is to put predictive multiplicity in context with the EU AI Act's provisions on accuracy and to subsequently derive concrete suggestions on how to evaluate and report predictive multiplicity in practice. Specifically: (1) We argue that incorporating information about predictive multiplicity can serve compliance with the EU AI Act's accuracy provisions for providers. (2) Based on this legal analysis, we suggest individual conflict ratios and $\delta$-ambiguity as tools to quantify the disagreement between models on individual cases and to help detect individuals subject to conflicting predictions. (3) Based on computational insights, we derive easy-to-implement rules on how model providers could evaluate predictive multiplicity in practice. (4) Ultimately, we suggest that information about predictive multiplicity should be made available to deployers under the AI Act, enabling them to judge whether system outputs for specific individuals are reliable enough for their use case.
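To make the proposed quantities concrete, here is a minimal sketch of how a provider might measure per-individual disagreement across a set of similarly accurate models. The definitions of `conflict_ratio` and `delta_ambiguity` below are illustrative assumptions based on the abstract, not the paper's formal definitions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A set of near-equally accurate models (here: identical pipelines, different seeds).
models = [RandomForestClassifier(n_estimators=50, random_state=s).fit(X_tr, y_tr)
          for s in range(20)]
preds = np.array([m.predict(X_te) for m in models])    # (n_models, n_cases)

# Individual conflict ratio (assumed definition): fraction of models that
# disagree with the majority prediction for each test case.
majority = (preds.mean(axis=0) >= 0.5).astype(int)
conflict_ratio = (preds != majority).mean(axis=0)      # (n_cases,)

# delta-ambiguity (assumed definition): share of individuals whose conflict
# ratio exceeds delta, i.e. who are subject to substantial model disagreement.
delta = 0.2
delta_ambiguity = (conflict_ratio > delta).mean()
print(f"delta-ambiguity at delta={delta}: {delta_ambiguity:.3f}")
```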
Related papers
- Efficient Prediction of Pass@k Scaling in Large Language Models [34.59142976295509]
We show how to accurately predict a model's pass@k behavior when scaled to a massive number of attempts. Standard methods for fitting these scaling laws suffer from statistical shortcomings that hinder predictive accuracy. We propose a dynamic sampling strategy that allocates a greater budget to harder problems.
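For context, pass@k for a problem with n sampled attempts and c successes is usually estimated with the standard unbiased estimator of Chen et al. (2021); the sketch below shows that baseline estimator, not this paper's improved fitting or sampling procedure.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k given n samples with c successes
    (Chen et al., 2021): 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k failures: some success is always drawn
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 5 correct out of 200 attempts; probability that at least one
# of k=10 randomly chosen attempts is correct.
print(pass_at_k(n=200, c=5, k=10))  # ~0.23
```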
arXiv Detail & Related papers (2025-10-06T17:42:27Z)
- On Arbitrary Predictions from Equally Valid Models [49.56463611078044]
Model multiplicity refers to multiple machine learning models that admit conflicting predictions for the same patient. We show that even small ensembles can mitigate or eliminate predictive multiplicity in practice.
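A minimal sketch of the mitigation idea (assuming majority voting over bootstrapped models; the paper's exact setup may differ): ambiguity across small ensembles is typically lower than across the individual models they are built from.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

X, y = make_classification(n_samples=1000, random_state=1)

# Equally valid models obtained by bootstrap resampling the training data.
models = [LogisticRegression(max_iter=1000).fit(*resample(X, y, random_state=s))
          for s in range(15)]
preds = np.array([m.predict(X) for m in models])        # (n_models, n_cases)

# Ambiguity: fraction of cases on which the predictors do not all agree.
single_amb = (preds.min(axis=0) != preds.max(axis=0)).mean()

# Three disjoint majority-vote ensembles of five models each.
ens_preds = np.array([(preds[i:i + 5].mean(axis=0) >= 0.5).astype(int)
                      for i in range(0, 15, 5)])
ens_amb = (ens_preds.min(axis=0) != ens_preds.max(axis=0)).mean()
print(f"single-model ambiguity: {single_amb:.3f}, ensemble ambiguity: {ens_amb:.3f}")
```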
arXiv Detail & Related papers (2025-07-25T16:15:59Z)
- SConU: Selective Conformal Uncertainty in Large Language Models [59.25881667640868]
We propose a novel approach termed Selective Conformal Uncertainty (SConU). We develop two conformal p-values that determine whether a given sample deviates from the uncertainty distribution of the calibration set at a specific, manageable risk level. Our approach not only facilitates rigorous management of miscoverage rates across both single-domain and interdisciplinary contexts, but also enhances the efficiency of predictions.
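At the core of such methods is the split-conformal p-value; the sketch below shows that standard construction (the paper's two p-values and its selection rule are more involved).

```python
import numpy as np

def conformal_p_value(calib_scores: np.ndarray, test_score: float) -> float:
    """Split-conformal p-value: rank of the test nonconformity score among
    the n calibration scores, (1 + #{s_i >= s_test}) / (n + 1)."""
    n = len(calib_scores)
    return (1 + np.sum(calib_scores >= test_score)) / (n + 1)

rng = np.random.default_rng(0)
calib = rng.exponential(size=500)   # nonconformity scores on the calibration set
p = conformal_p_value(calib, test_score=2.5)
# Flag the sample as deviating from the calibration distribution at risk alpha.
print(p, p < 0.1)
```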
arXiv Detail & Related papers (2025-04-19T03:01:45Z)
- Perceptions of the Fairness Impacts of Multiplicity in Machine Learning [22.442918897954957]
Multiplicity - the existence of multiple good models - means that some predictions are essentially arbitrary. We conduct a survey to see how multiplicity impacts lay stakeholders' perceptions of machine learning fairness. Our results indicate that model developers should be intentional about dealing with multiplicity in order to maintain fairness.
arXiv Detail & Related papers (2024-09-18T21:57:51Z)
- Reconciling Model Multiplicity for Downstream Decision Making [24.335927243672952]
We show that even when two predictive models approximately agree on their individual predictions almost everywhere, it is still possible for their induced best-response actions to differ on a substantial portion of the population.
We propose a framework that calibrates the predictive models with regard to both the downstream decision-making problem and the individual probability prediction.
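The first observation is easy to reproduce in a toy setting: two probability predictors that agree within a small epsilon everywhere still induce different threshold actions wherever the decision boundary falls between their estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
p1 = rng.uniform(size=100_000)                                    # predictor 1
p2 = np.clip(p1 + rng.uniform(-0.02, 0.02, size=p1.size), 0, 1)  # within 0.02

threshold = 0.5                           # best response: act iff p > threshold
actions1, actions2 = p1 > threshold, p2 > threshold

print(np.abs(p1 - p2).max())              # predictions agree within ~0.02
print((actions1 != actions2).mean())      # yet actions differ on ~1% of cases
```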
arXiv Detail & Related papers (2024-05-30T03:36:46Z)
- Underspecification in Language Modeling Tasks: A Causality-Informed Study of Gendered Pronoun Resolution [0.0]
We introduce a simple causal mechanism to describe the role underspecification plays in the generation of spurious correlations.
Despite its simplicity, our causal model directly informs the development of two lightweight black-box evaluation methods.
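As a rough illustration of a black-box probe for this kind of underspecification (not the paper's concrete methods), one can check whether a model's prediction flips when only the pronoun's gender is swapped in otherwise identical templates; the `predict_antecedent` stand-in below is a hypothetical placeholder.

```python
# Black-box underspecification probe: swap only the pronoun's gender and
# count prediction flips across otherwise identical templates.

TEMPLATE = "The {occupation} talked to the customer because {pronoun} wanted to help."
OCCUPATIONS = ["doctor", "nurse", "engineer", "teacher"]

def predict_antecedent(sentence: str) -> str:
    """Stand-in for a real coreference model (hypothetical placeholder):
    a crude heuristic so the sketch runs end to end."""
    return "occupation" if "she" not in sentence else "customer"

flips = 0
for occ in OCCUPATIONS:
    masc = predict_antecedent(TEMPLATE.format(occupation=occ, pronoun="he"))
    fem = predict_antecedent(TEMPLATE.format(occupation=occ, pronoun="she"))
    flips += (masc != fem)

print(f"gender-swap prediction flips: {flips}/{len(OCCUPATIONS)}")
```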
arXiv Detail & Related papers (2022-09-30T23:10:11Z)
- Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization [73.89239820192894]
We argue that automated counterfactual generation should account for several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
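The trade-off can be sketched with a scalarized toy objective (a simplification: the paper uses genuine multi-objective optimization rather than a weighted sum), pushing an instance across a classifier's boundary while penalizing change intensity.

```python
import numpy as np

# Toy linear classifier p(y=1|x) = sigmoid(w.x + b)
w, b = np.array([2.0, -1.0]), -0.5
sigmoid = lambda z: 1 / (1 + np.exp(-z))

x0 = np.array([0.0, 0.5])   # original instance, classified negative
lam = 0.5                   # weight on change intensity (proximity to x0)

# Gradient descent on -log p(y=1|x) + lam * ||x - x0||^2
x = x0.copy()
for _ in range(200):
    z = w @ x + b
    grad = -(1 - sigmoid(z)) * w + 2 * lam * (x - x0)
    x -= 0.05 * grad

print(x0, sigmoid(w @ x0 + b))   # original: p ~ 0.27
print(x, sigmoid(w @ x + b))     # counterfactual: p > 0.5, still near x0
```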
arXiv Detail & Related papers (2022-05-20T15:02:53Z)
- Cross-model Fairness: Empirical Study of Fairness and Ethics Under Model Multiplicity [10.144058870887061]
We argue that individuals can be harmed when one predictor is chosen ad hoc from a group of equally well performing models.
Our findings suggest that such unfairness can be readily found in real life and it may be difficult to mitigate by technical means alone.
arXiv Detail & Related papers (2022-03-14T14:33:39Z)
- Test-time Collective Prediction [73.74982509510961]
Multiple parties, each with their own data and model, want to jointly make predictions on future test points.
Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters.
We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
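A minimal sketch of the flavor of such a mechanism, under an assumed aggregation rule (not the paper's actual protocol): each agent shares only a predicted label and a confidence for the test point, and the collective answer is a confidence-weighted vote.

```python
import numpy as np

# Each agent exposes only (predicted_class, confidence) for a test point;
# no data or model parameters leave the agent. Assumed aggregation rule.
agent_outputs = [
    (1, 0.90),  # agent A
    (0, 0.55),  # agent B
    (1, 0.70),  # agent C
]

votes = np.zeros(2)
for label, confidence in agent_outputs:
    votes[label] += confidence          # confidence-weighted voting

collective = int(votes.argmax())
print(collective, votes)                # -> 1, [0.55 1.6]
```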
arXiv Detail & Related papers (2021-06-22T18:29:58Z)
- Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
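The core computation can be approximated by brute force (the paper instead gives tractable algorithms that search the model set directly): sample many near-optimal models and report the range of group-level disparity they attain.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

X, y = make_classification(n_samples=2000, random_state=0)
group = X[:, 0] > 0                      # synthetic group attribute

base_acc = LogisticRegression(max_iter=1000).fit(X, y).score(X, y)
disparities = []
for seed in range(50):
    Xb, yb = resample(X, y, random_state=seed)
    m = LogisticRegression(max_iter=1000).fit(Xb, yb)
    if m.score(X, y) >= base_acc - 0.01:  # "good" model: within 1% of best
        pred = m.predict(X)
        disparities.append(pred[group].mean() - pred[~group].mean())

print(f"attainable selection-rate disparity range: "
      f"[{min(disparities):.3f}, {max(disparities):.3f}]")
```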
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
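MHP-style models rest on a winner-takes-all ("oracle") loss: the network outputs M hypotheses and only the one closest to the ground truth is penalized, so the heads can specialize on different plausible futures. A minimal sketch of that loss (the sequential extension and the new metric are this paper's contributions):

```python
import numpy as np

def wta_loss(hypotheses: np.ndarray, target: np.ndarray) -> float:
    """Winner-takes-all loss for Multiple Hypothesis Prediction: only the
    hypothesis closest to the target is penalized."""
    errors = np.sum((hypotheses - target) ** 2, axis=-1)  # (M,)
    return float(errors.min())

# Three hypotheses for an ambiguous 2-D future; the target matches one mode.
hyps = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
target = np.array([0.9, 0.1])
print(wta_loss(hyps, target))   # 0.02: the second hypothesis wins
```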
arXiv Detail & Related papers (2020-03-10T09:15:42Z)