Observational Multiplicity
- URL: http://arxiv.org/abs/2507.23136v1
- Date: Wed, 30 Jul 2025 22:30:56 GMT
- Title: Observational Multiplicity
- Authors: Erin George, Deanna Needell, Berk Ustun
- Abstract summary: We study how arbitrariness can arise in probabilistic classification tasks as a result of an effect that we call observational multiplicity. We introduce a measure of regret for probabilistic classification tasks, which measures how the predictions of a model could change under different training labels. We use our measure to show that regret is higher for certain groups in the dataset and discuss potential applications of regret.
- Score: 13.001275742432053
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many prediction tasks can admit multiple models that perform almost equally well. This phenomenon can undermine interpretability and safety when competing models assign conflicting predictions to individuals. In this work, we study how arbitrariness can arise in probabilistic classification tasks as a result of an effect that we call \emph{observational multiplicity}. We discuss how this effect arises in a broad class of practical applications where we learn a classifier to predict probabilities $p_i \in [0,1]$ but are given a dataset of observations $y_i \in \{0,1\}$. We propose to evaluate the arbitrariness of individual probability predictions through the lens of \emph{regret}. We introduce a measure of regret for probabilistic classification tasks, which measures how the predictions of a model could change under different realizations of the training labels. We present a general-purpose method to estimate the regret in a probabilistic classification task. We use our measure to show that regret is higher for certain groups in the dataset and discuss potential applications of regret. We demonstrate how estimating regret can promote safety in real-world applications through abstention and data collection.
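A minimal sketch of the idea behind a regret-style estimate (not the paper's exact estimator): retrain a classifier on alternative label draws $y'_i \sim \mathrm{Bernoulli}(\hat{p}_i)$ that are consistent with a baseline model's predicted probabilities, and record how much each individual's predicted probability can move. The function name `estimate_regret` and the resampling scheme are illustrative assumptions.

```python
# Illustrative sketch only: per-example sensitivity of predictions to
# alternative training-label realizations, assuming a logistic regression
# classifier and Bernoulli label resampling (not the paper's method).
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_regret(X, y, n_resamples=50, seed=0):
    rng = np.random.default_rng(seed)

    # Baseline model trained on the observed labels y_i in {0, 1}.
    base = LogisticRegression(max_iter=1000).fit(X, y)
    p_base = base.predict_proba(X)[:, 1]

    # Track the largest change in each individual's predicted probability
    # across models trained on alternative label realizations.
    regret = np.zeros(len(y))
    for _ in range(n_resamples):
        y_alt = rng.binomial(1, p_base)  # labels consistent with the baseline probabilities
        if y_alt.min() == y_alt.max():   # skip degenerate single-class draws
            continue
        alt = LogisticRegression(max_iter=1000).fit(X, y_alt)
        p_alt = alt.predict_proba(X)[:, 1]
        regret = np.maximum(regret, np.abs(p_alt - p_base))
    return regret

# Usage: individuals with high regret could be flagged for abstention
# or targeted data collection.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
print(estimate_regret(X, y)[:10])
```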
Related papers
- Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians [60.22542847840578]
Despite advances in adversarial machine learning, inference for Gaussian models in the presence of an adversary is notably understudied.
We consider a self-interested attacker who wishes to disrupt a decision-maker's conditional inference and subsequent actions by corrupting a set of evidentiary variables.
To avoid detection, the attacker also wants the attack to appear plausible, where plausibility is determined by the density of the corrupted evidence.
arXiv Detail & Related papers (2024-11-21T17:46:55Z) - Learning and Predicting Multimodal Vehicle Action Distributions in a Unified Probabilistic Model Without Labels [26.303522885475406]
We present a unified probabilistic model that learns a representative set of discrete vehicle actions and predicts the probability of each action given a particular scenario.
Our model also enables us to estimate the distribution over continuous trajectories conditioned on a scenario, representing what each discrete action would look like if executed in that scenario.
arXiv Detail & Related papers (2022-12-14T04:01:19Z) - Explainability as statistical inference [29.74336283497203]
We propose a general deep probabilistic model designed to produce interpretable predictions.
The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture.
We show experimentally that using multiple imputation provides more reasonable interpretations.
arXiv Detail & Related papers (2022-12-06T16:55:10Z) - Rashomon Capacity: A Metric for Predictive Multiplicity in Probabilistic Classification [4.492630871726495]
Predictive multiplicity occurs when classification models assign conflicting predictions to individual samples.
We introduce a new measure of predictive multiplicity in probabilistic classification called Rashomon Capacity.
We show that Rashomon Capacity yields principled strategies for disclosing conflicting models to stakeholders.
arXiv Detail & Related papers (2022-06-02T20:44:19Z) - Predictive Multiplicity in Probabilistic Classification [25.111463701666864]
We present a framework for measuring predictive multiplicity in probabilistic classification.
We demonstrate the incidence and prevalence of predictive multiplicity in real-world tasks.
Our results emphasize the need to report predictive multiplicity more widely.
arXiv Detail & Related papers (2022-06-02T16:25:29Z) - Estimating Structural Disparities for Face Models [54.062512989859265]
In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model, across different sub-populations.
We explore performing such analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation.
arXiv Detail & Related papers (2022-04-13T05:30:53Z) - Masked prediction tasks: a parameter identifiability view [49.533046139235466]
We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not.
arXiv Detail & Related papers (2022-02-18T17:09:32Z) - CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
arXiv Detail & Related papers (2021-09-22T12:46:04Z) - Thought Flow Nets: From Single Predictions to Trains of Model Thought [39.619001911390804]
When humans solve complex problems, they rarely come up with a decision right away.
Instead, they start with an intuitive decision, reflect upon it, spot mistakes, resolve contradictions, and jump between different hypotheses.
arXiv Detail & Related papers (2021-07-26T13:56:37Z) - Learning from Aggregate Observations [82.44304647051243]
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances.
We present a general probabilistic framework that accommodates a variety of aggregate observations.
Simple maximum likelihood solutions can be applied to various differentiable models.
arXiv Detail & Related papers (2020-04-14T06:18:50Z) - Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)