Conformal Prediction in Multi-User Settings: An Evaluation
- URL: http://arxiv.org/abs/2312.05195v1
- Date: Fri, 8 Dec 2023 17:33:23 GMT
- Title: Conformal Prediction in Multi-User Settings: An Evaluation
- Authors: Enrique Garcia-Ceja, Luciano Garcia-Banuelos, Nicolas Jourdan
- Abstract summary: Machine learning models are trained and evaluated without making any distinction between users.
This produces inaccurate estimates of performance metrics in multi-user settings.
In this work we evaluated the conformal prediction framework in several multi-user settings.
- Score: 0.10231119246773925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Typically, machine learning models are trained and evaluated without making
any distinction between users (e.g., using traditional hold-out and
cross-validation). However, this produces inaccurate estimates of performance
metrics in multi-user settings, that is, situations where the data were
collected by multiple users with different characteristics (e.g., age, gender,
height), which is very common in user-computer interaction and medical
applications. For these types of scenarios, model evaluation strategies that
provide better performance estimates have been proposed, such as mixed,
user-independent, user-dependent, and user-adaptive models. Although those
strategies are better suited for multi-user systems, they are typically
assessed with respect to performance metrics that capture the overall behavior
of the models and do not provide any performance guarantees for individual
predictions, nor do they provide any feedback about the predictions'
uncertainty. To overcome those limitations, in this work we evaluated the
conformal prediction framework in several multi-user settings. Conformal
prediction is a model-agnostic method that provides confidence guarantees on
the predictions, thus increasing the trustworthiness and robustness of the
models. We conducted
extensive experiments using different evaluation strategies and found
significant differences in terms of conformal performance measures. We also
proposed several visualizations based on matrices, graphs, and charts that
capture different aspects of the resulting prediction sets.
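For illustration, the sketch below shows split conformal prediction for classification under a user-independent split, where the calibration and test users never appear in training. The synthetic data, the RandomForestClassifier base model, the user split boundaries, and alpha = 0.1 are placeholder assumptions for this example, not details taken from the paper.

```python
# Minimal sketch: split conformal prediction with a user-independent split.
# All data, model choices, and thresholds here are assumptions for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical multi-user data: features, a class label, and a user id per row.
n, d, n_users, n_classes = 600, 8, 10, 3
X = rng.normal(size=(n, d))
y = rng.integers(0, n_classes, size=n)
users = rng.integers(0, n_users, size=n)

# User-independent split: calibration/test users are disjoint from training users.
train_mask = users < 6
calib_mask = (users >= 6) & (users < 8)
test_mask = users >= 8

model = RandomForestClassifier(random_state=0).fit(X[train_mask], y[train_mask])

# Nonconformity score: 1 minus the predicted probability of the true class.
calib_proba = model.predict_proba(X[calib_mask])
calib_scores = 1.0 - calib_proba[np.arange(calib_mask.sum()), y[calib_mask]]

# Conformal quantile for miscoverage level alpha (target coverage 1 - alpha).
alpha = 0.1
n_calib = calib_scores.size
q_level = np.ceil((n_calib + 1) * (1 - alpha)) / n_calib
q_hat = np.quantile(calib_scores, min(q_level, 1.0), method="higher")

# Prediction sets: include every class whose score is below the threshold.
test_proba = model.predict_proba(X[test_mask])
prediction_sets = test_proba >= 1.0 - q_hat  # boolean matrix, one row per test point

coverage = prediction_sets[np.arange(test_mask.sum()), y[test_mask]].mean()
avg_set_size = prediction_sets.sum(axis=1).mean()
print(f"empirical coverage: {coverage:.2f}, average set size: {avg_set_size:.2f}")
```

Under a mixed evaluation strategy, the same users would instead appear in the training, calibration, and test partitions; comparing coverage and prediction-set sizes across such splits is, roughly, the kind of comparison the paper's conformal performance measures are meant to capture.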
Related papers
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
- Quantification of Predictive Uncertainty via Inference-Time Sampling [57.749601811982096]
We propose a post-hoc sampling strategy for estimating predictive uncertainty accounting for data ambiguity.
The method can generate different plausible outputs for a given input and does not assume parametric forms of predictive distributions.
arXiv Detail & Related papers (2023-08-03T12:43:21Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- Post-Selection Confidence Bounds for Prediction Performance [2.28438857884398]
In machine learning, the selection of a promising model from a potentially large number of competing models and the assessment of its generalization performance are critical tasks.
We propose an algorithm for computing valid lower confidence bounds for multiple models that have been selected based on their prediction performance on the evaluation set.
arXiv Detail & Related papers (2022-10-24T13:28:43Z)
- Triplet Losses-based Matrix Factorization for Robust Recommendations [0.76146285961466]
We propose using multiple triplet loss terms to extract meaningful representations of users and items.
We empirically evaluate the soundness of such representations through several "bias-aware" evaluation metrics.
arXiv Detail & Related papers (2022-10-21T16:44:59Z)
- Predictive Multiplicity in Probabilistic Classification [25.111463701666864]
We present a framework for measuring predictive multiplicity in probabilistic classification.
We demonstrate the incidence and prevalence of predictive multiplicity in real-world tasks.
Our results emphasize the need to report predictive multiplicity more widely.
arXiv Detail & Related papers (2022-06-02T16:25:29Z)
- Post-hoc Models for Performance Estimation of Machine Learning Inference [22.977047604404884]
Estimating how well a machine learning model performs during inference is critical in a variety of scenarios.
We systematically generalize performance estimation to a diverse set of metrics and scenarios.
We find that proposed post-hoc models consistently outperform the standard confidence baselines.
arXiv Detail & Related papers (2021-10-06T02:20:37Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Learning Prediction Intervals for Model Performance [1.433758865948252]
We propose a method to compute prediction intervals for model performance.
We evaluate our approach across a wide range of drift conditions and show substantial improvement over competitive baselines.
arXiv Detail & Related papers (2020-12-15T21:32:03Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)