On Arbitrary Predictions from Equally Valid Models
- URL: http://arxiv.org/abs/2507.19408v1
- Date: Fri, 25 Jul 2025 16:15:59 GMT
- Title: On Arbitrary Predictions from Equally Valid Models
- Authors: Sarah Lockfisch, Kristian Schwethelm, Martin Menten, Rickmer Braren, Daniel Rueckert, Alexander Ziller, Georgios Kaissis
- Abstract summary: Model multiplicity refers to multiple machine learning models that admit conflicting predictions for the same patient. We show that even small ensembles can mitigate, and often eliminate, predictive multiplicity in practice.
- Score: 49.56463611078044
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model multiplicity refers to the existence of multiple machine learning models that describe the data equally well but may produce different predictions on individual samples. In medicine, these models can admit conflicting predictions for the same patient -- a risk that is poorly understood and insufficiently addressed. In this study, we empirically analyze the extent, drivers, and ramifications of predictive multiplicity across diverse medical tasks and model architectures, and show that even small ensembles can mitigate, and often eliminate, predictive multiplicity in practice. Our analysis reveals that (1) standard validation metrics fail to identify a uniquely optimal model and (2) a substantial share of predictions hinges on arbitrary choices made during model development. Using multiple models instead of a single model reveals instances where predictions differ across equally plausible models -- highlighting patients who would receive arbitrary diagnoses if any single model were used. In contrast, (3) a small ensemble paired with an abstention strategy can effectively mitigate measurable predictive multiplicity in practice; predictions with high inter-model consensus may thus be amenable to automated classification. While accuracy is not a principled antidote to predictive multiplicity, we find that (4) higher accuracy achieved through increased model capacity reduces predictive multiplicity. Our findings underscore the clinical importance of accounting for model multiplicity and advocate for ensemble-based strategies to improve diagnostic reliability. In cases where models fail to reach sufficient consensus, we recommend deferring decisions to expert review.
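The ensemble-with-abstention strategy the abstract describes can be sketched in a few lines. The following is a minimal illustration, assuming scikit-learn-style models exposing a `predict` method; the function name, consensus threshold, and majority-vote scheme are illustrative choices, not the authors' exact protocol.

```python
import numpy as np

def predict_with_abstention(models, X, consensus_threshold=0.8):
    """Classify each sample only when enough models agree; abstain otherwise."""
    # Hard predictions from each of the equally valid models.
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    n_models = votes.shape[0]
    predictions, abstained = [], []
    for sample_votes in votes.T:
        labels, counts = np.unique(sample_votes, return_counts=True)
        consensus = counts.max() / n_models
        if consensus >= consensus_threshold:
            predictions.append(labels[counts.argmax()])  # high consensus: automate
            abstained.append(False)
        else:
            predictions.append(None)  # conflicting models: defer to expert review
            abstained.append(True)
    return predictions, abstained
```

Samples on which the ensemble abstains are exactly those exhibiting measurable predictive multiplicity; following the abstract's recommendation, these would be deferred to expert review, while high-consensus predictions may be amenable to automated classification.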
Related papers
- Predictive Multiplicity in Survival Models: A Method for Quantifying Model Uncertainty in Predictive Maintenance Applications [0.0]
We frame predictive multiplicity as a critical concern in survival-based models. We introduce formal measures -- ambiguity, discrepancy, and obscurity -- to quantify it. This is particularly relevant for downstream tasks such as maintenance scheduling.
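A rough sketch of how two of these measures are commonly computed over a set of near-optimal ("Rashomon") models is given below; the exact survival-analysis variants in the paper may differ, and all names here are illustrative.

```python
import numpy as np

def ambiguity(baseline_preds, competing_preds):
    """Fraction of samples on which at least one competing model
    disagrees with the baseline model (illustrative definition)."""
    disagree = competing_preds != baseline_preds[None, :]  # (n_models, n_samples)
    return disagree.any(axis=0).mean()

def discrepancy(baseline_preds, competing_preds):
    """Largest single-model disagreement rate with the baseline."""
    disagree = competing_preds != baseline_preds[None, :]
    return disagree.mean(axis=1).max()
```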
arXiv Detail & Related papers (2025-04-16T15:04:00Z)
- Multi-View Conformal Learning for Heterogeneous Sensor Fusion [0.12086712057375555]
We build and test multi-view and single-view conformal models for heterogeneous sensor fusion.
Our models provide theoretical marginal confidence guarantees since they are based on the conformal prediction framework.
Our results also showed that multi-view models generate prediction sets with less uncertainty compared to single-view models.
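The marginal guarantee comes from the conformal prediction framework; a minimal split-conformal sketch (single-view, assuming a classifier exposing `predict_proba` and integer labels) is shown below. The paper's multi-view fusion machinery is omitted.

```python
import numpy as np

def conformal_prediction_sets(model, X_cal, y_cal, X_test, alpha=0.1):
    """Prediction sets with marginal coverage >= 1 - alpha."""
    # Nonconformity score: one minus the probability of the true class.
    cal_probs = model.predict_proba(X_cal)
    scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]
    # Finite-sample-corrected quantile of the calibration scores.
    n = len(scores)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q_hat = np.quantile(scores, q_level)
    # Include every label whose score falls below the threshold.
    test_probs = model.predict_proba(X_test)
    return [np.where(1.0 - p <= q_hat)[0] for p in test_probs]
```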
arXiv Detail & Related papers (2024-02-19T17:30:09Z)
- Predictive Churn with the Set of Good Models [61.00058053669447]
This paper explores connections between two seemingly unrelated concepts of predictive inconsistency. The first, known as predictive multiplicity, occurs when models that perform similarly produce conflicting predictions for individual samples. The second concept, predictive churn, examines the differences in individual predictions before and after model updates.
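Predictive churn itself reduces to a simple disagreement rate between the pre-update and post-update models; a one-function sketch (all names are assumptions for illustration):

```python
import numpy as np

def predictive_churn(old_model, new_model, X):
    """Share of samples whose predicted label flips after a model update."""
    return np.mean(old_model.predict(X) != new_model.predict(X))
```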
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
- Estimating Epistemic and Aleatoric Uncertainty with a Single Model [5.871583927216653]
We introduce a new approach to ensembling, hyper-diffusion models (HyperDM).
HyperDM offers prediction accuracy on par with, and in some cases superior to, multi-model ensembles.
We validate our method on two distinct real-world tasks: x-ray computed tomography reconstruction and weather temperature forecasting.
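For context, the standard ensemble-based split of predictive uncertainty that HyperDM approximates with a single model can be sketched as below; this is the generic mean/variance decomposition for regression, not the paper's architecture.

```python
import numpy as np

def decompose_uncertainty(member_means, member_vars):
    """member_means, member_vars: (n_members, n_samples) arrays of each
    ensemble member's predictive mean and variance for a regression target."""
    aleatoric = member_vars.mean(axis=0)   # average within-member noise
    epistemic = member_means.var(axis=0)   # disagreement between members
    return aleatoric, epistemic
```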
arXiv Detail & Related papers (2024-02-05T19:39:52Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important for forecasting nonstationary processes or data drawn from a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple-hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Rashomon Capacity: A Metric for Predictive Multiplicity in Probabilistic Classification [4.492630871726495]
Predictive multiplicity occurs when classification models assign conflicting predictions to individual samples.
We introduce a new measure of predictive multiplicity in probabilistic classification called Rashomon Capacity.
We show that Rashomon Capacity yields principled strategies for disclosing conflicting models to stakeholders.
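Rashomon Capacity treats the competing models' predicted class distributions for a single sample as rows of a channel matrix and reports that channel's capacity, which the classic Blahut-Arimoto algorithm computes. A hedged sketch follows, using common conventions (capacity in bits) rather than necessarily the paper's exact presentation.

```python
import numpy as np

def rashomon_capacity(P, n_iters=200):
    """P: (n_models, n_classes) predicted class probabilities of each
    competing model for a single sample. Returns channel capacity in bits."""
    m = P.shape[0]
    r = np.full(m, 1.0 / m)  # distribution over models (channel inputs)
    for _ in range(n_iters):
        q = r @ P            # induced output (class) distribution
        # Per-model KL divergence from the induced output distribution.
        kl = np.sum(P * np.log2((P + 1e-12) / (q[None, :] + 1e-12)), axis=1)
        r *= np.exp2(kl)     # Blahut-Arimoto update
        r /= r.sum()
    q = r @ P
    kl = np.sum(P * np.log2((P + 1e-12) / (q[None, :] + 1e-12)), axis=1)
    return float(r @ kl)     # higher capacity = more conflicting models
```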
arXiv Detail & Related papers (2022-06-02T20:44:19Z)
- Predictive Multiplicity in Probabilistic Classification [25.111463701666864]
We present a framework for measuring predictive multiplicity in probabilistic classification.
We demonstrate the incidence and prevalence of predictive multiplicity in real-world tasks.
Our results emphasize the need to report predictive multiplicity more widely.
arXiv Detail & Related papers (2022-06-02T16:25:29Z)
- Test-time Collective Prediction [73.74982509510961]
Multiple parties, each holding a machine learning model, want to jointly make predictions on future test points.
Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters.
We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
arXiv Detail & Related papers (2021-06-22T18:29:58Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
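As a rough illustration of the quantity being characterized, the range of a demographic-parity gap across an explicit list of similarly performing models could be computed by brute force as below; the paper's contribution is tractable algorithms over the full set of good models, which this sketch does not reproduce, and all names are illustrative.

```python
import numpy as np

def disparity_range(models, X, group):
    """group: boolean mask marking protected-group membership.
    Returns the (min, max) demographic-parity gap over the model list."""
    gaps = []
    for m in models:
        preds = m.predict(X)  # assumed binary 0/1 predictions
        gaps.append(abs(preds[group].mean() - preds[~group].mean()))
    return min(gaps), max(gaps)
```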
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- A General Framework for Survival Analysis and Multi-State Modelling [70.31153478610229]
We use neural ordinary differential equations as a flexible and general method for estimating multi-state survival models.
We show that our model exhibits state-of-the-art performance on popular survival data sets and demonstrate its efficacy in a multi-state setting.
arXiv Detail & Related papers (2020-06-08T19:24:54Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
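Multiple Hypothesis Prediction models are typically trained with a (relaxed) winner-takes-all loss, in which only the hypothesis closest to the target receives the bulk of the gradient. A generic PyTorch sketch of that loss, standing in for (not reproducing) the paper's sequential extension:

```python
import torch

def relaxed_wta_loss(hypotheses, target, eps=0.05):
    """hypotheses: (n_hyp, batch, dim); target: (batch, dim)."""
    # Squared error of every hypothesis against the single ground truth.
    errors = ((hypotheses - target.unsqueeze(0)) ** 2).mean(dim=-1)  # (n_hyp, batch)
    n_hyp, batch = errors.shape
    # Soft assignment: the best hypothesis gets most of the weight,
    # the remaining hypotheses share a small epsilon.
    weights = torch.full_like(errors, eps / max(n_hyp - 1, 1))
    winners = errors.argmin(dim=0)  # (batch,)
    weights[winners, torch.arange(batch)] = 1.0 - eps
    return (weights * errors).sum(dim=0).mean()
```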
arXiv Detail & Related papers (2020-03-10T09:15:42Z)