Related papers: Perceptions of the Fairness Impacts of Multiplicity in Machine Learning

URL: http://arxiv.org/abs/2409.12332v2
Date: Thu, 23 Jan 2025 17:16:11 GMT
Title: Perceptions of the Fairness Impacts of Multiplicity in Machine Learning
Authors: Anna P. Meyer, Yea-Seul Kim, Aws Albarghouthi, Loris D'Antoni
Abstract summary: Multiplicity - the existence of multiple good models - means that some predictions are essentially arbitrary. We conduct a survey to see how multiplicity impacts lay stakeholders' perceptions of machine learning fairness. Our results indicate that model developers should be intentional about dealing with multiplicity in order to maintain fairness.
Score: 22.442918897954957
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Machine learning (ML) is increasingly used in high-stakes settings, yet multiplicity - the existence of multiple good models - means that some predictions are essentially arbitrary. ML researchers and philosophers posit that multiplicity poses a fairness risk, but no studies have investigated whether stakeholders agree. In this work, we conduct a survey to see how multiplicity impacts lay stakeholders' - i.e., decision subjects' - perceptions of ML fairness, and which approaches to address multiplicity they prefer. We investigate how these perceptions are modulated by task characteristics (e.g., stakes and uncertainty). Survey respondents think that multiplicity threatens the fairness of model outcomes, but not the appropriateness of using the model, even though existing work suggests the opposite. Participants are strongly against resolving multiplicity by using a single model (effectively ignoring multiplicity) or by randomizing the outcomes. Our results indicate that model developers should be intentional about dealing with multiplicity in order to maintain fairness.

Related papers

On Arbitrary Predictions from Equally Valid Models [49.56463611078044]
Model multiplicity refers to multiple machine learning models that admit conflicting predictions for the same patient. We show that even small ensembles can mitigate/eliminate predictive multiplicity in practice.
arXiv Detail & Related papers (2025-07-25T16:15:59Z)
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models [26.17300490736624]
Multimodal Large Language Models (MLLMs) are predominantly trained and tested on consistent visual-textual inputs. We propose the Multimodal Inconsistency Reasoning benchmark to assess MLLMs' ability to detect and reason about semantic mismatches. We evaluate six state-of-the-art MLLMs, showing that models with dedicated multimodal reasoning capabilities, such as o1, substantially outperform their counterparts.
arXiv Detail & Related papers (2025-02-22T01:52:37Z)
Diverging Preferences: When do Annotators Disagree and do Models Know? [92.24651142187989]
We develop a taxonomy of disagreement sources spanning 10 categories across four high-level classes. We find that the majority of disagreements are in opposition with standard reward modeling approaches. We develop methods for identifying diverging preferences to mitigate their influence on evaluation and training.
arXiv Detail & Related papers (2024-10-18T17:32:22Z)
Seemingly Plausible Distractors in Multi-Hop Reasoning: Are Large Language Models Attentive Readers? [6.525065859315515]
We investigate whether Large Language Models (LLMs) are prone to exploiting simplifying cues in multi-hop reasoning benchmarks. Motivated by this finding, we propose a challenging multi-hop reasoning benchmark, by generating seemingly plausible multi-hop reasoning chains. We find that their ability to perform multi-hop reasoning is affected, as indicated by a relative decrease of up to 45% in F1 score when presented with such seemingly plausible alternatives.
arXiv Detail & Related papers (2024-09-08T19:22:58Z)
Predictive Churn with the Set of Good Models [61.00058053669447]
This paper explores connections between two seemingly unrelated concepts of predictive inconsistency. The first, known as predictive multiplicity, occurs when models that perform similarly produce conflicting predictions for individual samples. The second concept, predictive churn, examines the differences in individual predictions before and after model updates.
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
Revealing Multimodal Contrastive Representation Learning through Latent Partial Causal Models [85.67870425656368]
We introduce a unified causal model specifically designed for multimodal data. We show that multimodal contrastive representation learning excels at identifying latent coupled variables. Experiments demonstrate the robustness of our findings, even when the assumptions are violated.
arXiv Detail & Related papers (2024-02-09T07:18:06Z)
Recourse under Model Multiplicity via Argumentative Ensembling (Technical Report) [17.429631079094186]
We introduce a problem, which we name recourse-aware ensembling, and identify several desirable properties which methods for solving it should satisfy. We show theoretically and experimentally that argumentative ensembling satisfies properties which the existing methods lack, and that the trade-offs are minimal with respect to accuracy.
arXiv Detail & Related papers (2023-12-22T22:33:39Z)
An Empirical Investigation into Benchmarking Model Multiplicity for Trustworthy Machine Learning: A Case Study on Image Classification [0.8702432681310401]
This paper offers a one-stop empirical benchmark of multiplicity across various dimensions of model design. We also develop a framework, which we call multiplicity sheets, to benchmark multiplicity in various scenarios. We show that multiplicity persists in deep learning models even after enforcing additional specifications during model selection.
arXiv Detail & Related papers (2023-11-24T22:30:38Z)
Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs). We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z)
Fair Few-shot Learning with Auxiliary Sets [53.30014767684218]
In many machine learning (ML) tasks, only very few labeled data samples can be collected, which can lead to inferior fairness performance. In this paper, we define the fairness-aware learning task with limited training samples as the fair few-shot learning problem. We devise a novel framework that accumulates fairness-aware knowledge across different meta-training tasks and then generalizes the learned knowledge to meta-test tasks.
arXiv Detail & Related papers (2023-08-28T06:31:37Z)
Multi-Target Multiplicity: Flexibility and Fairness in Target Specification under Resource Constraints [76.84999501420938]
We introduce a conceptual and computational framework for assessing how the choice of target affects individuals' outcomes. We show that the level of multiplicity that stems from target variable choice can be greater than that stemming from nearly-optimal models of a single target.
arXiv Detail & Related papers (2023-06-23T18:57:14Z)
Arbitrariness Lies Beyond the Fairness-Accuracy Frontier [3.383670923637875]
We show that state-of-the-art fairness interventions can mask high predictive multiplicity behind favorable group fairness and accuracy metrics. We propose an ensemble algorithm applicable to any fairness intervention that provably ensures more consistent predictions.
arXiv Detail & Related papers (2023-06-15T18:15:46Z)
Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm. We use a simple but key insight: the divergence of trends between different populations, and, consecutively, between a learned model and minority populations, is analogous to data drift. We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
Fairness Increases Adversarial Vulnerability [50.90773979394264]
This paper shows the existence of a dichotomy between fairness and robustness, and analyzes when achieving fairness decreases the model robustness to adversarial samples. Experiments on non-linear models and different architectures validate the theoretical findings in multiple vision domains. The paper proposes a simple, yet effective, solution to construct models achieving good tradeoffs between fairness and robustness.
arXiv Detail & Related papers (2022-11-21T19:55:35Z)
Cross-model Fairness: Empirical Study of Fairness and Ethics Under Model Multiplicity [10.144058870887061]
We argue that individuals can be harmed when one predictor is chosen ad hoc from a group of equally well performing models. Our findings suggest that such unfairness can be readily found in real life and it may be difficult to mitigate by technical means alone.
arXiv Detail & Related papers (2022-03-14T14:33:39Z)
MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning [14.06682547001011]
State-of-the-art methods typically focus on learning a single reward model. We propose Multi-Objective Reinforced Active Learning (MORAL), a novel method for combining diverse demonstrations of social norms. Our approach is able to interactively tune a deep RL agent towards a variety of preferences, while eliminating the need for computing multiple policies.
arXiv Detail & Related papers (2021-12-30T19:21:03Z)
Multi-Stage Decentralized Matching Markets: Uncertain Preferences and Strategic Behaviors [91.3755431537592]
This article develops a framework for learning optimal strategies in real-world matching markets. We show that there exists a welfare-versus-fairness trade-off that is characterized by the uncertainty level of acceptance. We prove that participants can be better off with multi-stage matching compared to single-stage matching.
arXiv Detail & Related papers (2021-02-13T19:25:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.