Measuring IIA Violations in Similarity Choices with Bayesian Models
- URL: http://arxiv.org/abs/2508.14615v1
- Date: Wed, 20 Aug 2025 11:02:26 GMT
- Title: Measuring IIA Violations in Similarity Choices with Bayesian Models
- Authors: Hugo Sales Corrêa, Suryanarayana Sankagiri, Daniel Ratton Figueiredo, Matthias Grossglauser
- Abstract summary: Similarity choice data occur when humans make choices based on similarity to a target, e.g., in the context of information retrieval and in embedding learning settings. While IIA violations have been detected in many discrete choice settings, the similarity choice setting has received scant attention. We propose two statistical methods to test for IIA: a classical goodness-of-fit test and a Bayesian counterpart based on the framework of Posterior Predictive Checks.
- Score: 4.592329937682345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Similarity choice data occur when humans make choices among alternatives based on their similarity to a target, e.g., in the context of information retrieval and in embedding learning settings. Classical metric-based models of similarity choice assume independence of irrelevant alternatives (IIA), a property that allows for a simpler formulation. While IIA violations have been detected in many discrete choice settings, the similarity choice setting has received scant attention. This is because the target-dependent nature of the choice complicates IIA testing. We propose two statistical methods to test for IIA: a classical goodness-of-fit test and a Bayesian counterpart based on the framework of Posterior Predictive Checks (PPC). This Bayesian approach, our main technical contribution, quantifies the degree of IIA violation beyond its mere significance. We curate two datasets: one with choice sets designed to elicit IIA violations, and another with randomly generated choice sets from the same item universe. Our tests confirm significant IIA violations on both datasets, and notably, we find a comparable degree of violation between them. Further, we devise a new PPC test for population homogeneity. Results show that the population is indeed homogeneous, suggesting that the IIA violations are driven by context effects, specifically interactions within the choice sets. These results highlight the need for new similarity choice models that account for such context effects.
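To make the IIA property mentioned in the abstract concrete, here is a minimal sketch of a metric-based similarity choice model. It is an illustration only, not the authors' model or test: the softmax-over-negative-distance form, the function name `choice_probs`, the `scale` parameter, and the toy 2-D embedding are all assumptions introduced here. Under such a model, the ratio of choice probabilities between two items stays the same when a third item is added to the choice set, which is exactly what IIA requires.

```python
import math

def choice_probs(target, choice_set, embedding, scale=1.0):
    """Hypothetical Luce/MNL-style model: each item is chosen with
    probability proportional to exp(-scale * distance to the target)."""
    weights = {
        item: math.exp(-scale * math.dist(embedding[target], embedding[item]))
        for item in choice_set
    }
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}

# Toy 2-D embedding (made-up coordinates, for illustration only).
embedding = {
    "t": (0.0, 0.0),  # target
    "a": (1.0, 0.0),  # distance 1 from the target
    "b": (0.0, 2.0),  # distance 2 from the target
    "c": (3.0, 1.0),  # the "irrelevant" third alternative
}

p_small = choice_probs("t", ["a", "b"], embedding)
p_large = choice_probs("t", ["a", "b", "c"], embedding)

# Under IIA, the odds of a over b do not depend on whether c is offered.
ratio_small = p_small["a"] / p_small["b"]
ratio_large = p_large["a"] / p_large["b"]
print(ratio_small, ratio_large)
```

In observed data, a violation would show up as these two empirical ratios differing by more than sampling noise allows. Deciding whether and by how much they differ is the job of the goodness-of-fit and PPC tests the abstract describes.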
Related papers
- Differentially private testing for relevant dependencies in high dimensions [1.809722301908016]
We investigate the problem of detecting dependencies between the components of a high-dimensional vector. Instead of testing whether the coordinates are pairwise independent, we are interested in determining whether certain pairwise associations do not exceed a given threshold in absolute value. We propose a novel bootstrap based methodology that is especially powerful in sparse settings.
arXiv Detail & Related papers (2025-11-21T11:38:40Z)
- Detecting and Identifying Selection Structure in Sequential Data [53.24493902162797]
We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences.
We show that selection structure is identifiable without any parametric assumptions or interventional experiments.
We also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies.
arXiv Detail & Related papers (2024-06-29T20:56:34Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Model-based causal feature selection for general response types [8.228587135343071]
Invariant causal prediction (ICP) is a method for causal feature selection which requires data from heterogeneous settings.
We develop transformation-model (TRAM) based ICP, allowing for continuous, categorical, count-type, and uninformatively censored responses.
We provide an open-source R package 'tramicp' and evaluate our approach on simulated data and in a case study investigating causal features of survival in critically ill patients.
arXiv Detail & Related papers (2023-09-22T12:42:48Z)
- Bounding Counterfactuals under Selection Bias [60.55840896782637]
We propose a first algorithm to address both identifiable and unidentifiable queries.
We prove that, in spite of the missingness induced by the selection bias, the likelihood of the available data is unimodal.
arXiv Detail & Related papers (2022-07-26T10:33:10Z)
- Selective Ensembles for Consistent Predictions [19.154189897847804]
Inconsistency is undesirable in high-stakes contexts.
We show that this inconsistency extends beyond predictions to feature attributions.
We prove that selective ensembles achieve consistent predictions and feature attributions while maintaining low abstention rates.
arXiv Detail & Related papers (2021-11-16T05:03:56Z)
- Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z)
- Selecting Treatment Effects Models for Domain Adaptation Using Causal Knowledge [82.5462771088607]
We propose a novel model selection metric specifically designed for ITE methods under the unsupervised domain adaptation setting.
In particular, we propose selecting models whose predictions of interventions' effects satisfy known causal structures in the target domain.
arXiv Detail & Related papers (2021-02-11T21:03:14Z)
- Double machine learning for sample selection models [0.12891210250935145]
This paper considers the evaluation of discretely distributed treatments when outcomes are only observed for a subpopulation due to sample selection or outcome attrition.
We make use of (a) Neyman-orthogonal, doubly robust, and efficient score functions, which imply the robustness of treatment effect estimation to moderate regularization biases in the machine learning-based estimation of the outcome, treatment, or sample selection models and (b) sample splitting (or cross-fitting) to prevent overfitting bias.
arXiv Detail & Related papers (2020-11-30T19:40:21Z)
- Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)
- Fundamental Limits of Testing the Independence of Irrelevant Alternatives in Discrete Choice [9.13127392774573]
The Multinomial Logit (MNL) model and the Independence of Irrelevant Alternatives (IIA) are the most widely used tools of discrete choice.
We show that any general test for IIA with low worst-case error would require a number of samples exponential in the number of alternatives of the choice problem.
Our lower bounds are structure-dependent, and as a potential cause for optimism, we find that if one restricts the test of IIA to violations that can occur in a specific collection of choice sets, one obtains structure-dependent lower bounds that are much less pessimistic.
arXiv Detail & Related papers (2020-01-20T10:15:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.