Designing Disaggregated Evaluations of AI Systems: Choices,
Considerations, and Tradeoffs
- URL: http://arxiv.org/abs/2103.06076v1
- Date: Wed, 10 Mar 2021 14:26:14 GMT
- Title: Designing Disaggregated Evaluations of AI Systems: Choices,
Considerations, and Tradeoffs
- Authors: Solon Barocas, Anhong Guo, Ece Kamar, Jacquelyn Krones, Meredith
Ringel Morris, Jennifer Wortman Vaughan, Duncan Wadsworth, Hanna Wallach
- Abstract summary: We argue that a deeper understanding of the choices, considerations, and tradeoffs involved in designing disaggregated evaluations will better enable researchers, practitioners, and the public to understand the ways in which AI systems may be underperforming for particular groups of people.
- Score: 42.401239658653914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several pieces of work have uncovered performance disparities by conducting
"disaggregated evaluations" of AI systems. We build on these efforts by
focusing on the choices that must be made when designing a disaggregated
evaluation, as well as some of the key considerations that underlie these
design choices and the tradeoffs between these considerations. We argue that a
deeper understanding of the choices, considerations, and tradeoffs involved in
designing disaggregated evaluations will better enable researchers,
practitioners, and the public to understand the ways in which AI systems may be
underperforming for particular groups of people.
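To make the core idea concrete, here is a minimal sketch of a disaggregated evaluation: the same performance metric is computed overall and separately for each demographic group so that performance gaps become visible. The data, group names, and metric below are hypothetical placeholders; a real evaluation hinges on exactly the design choices the paper analyzes (which groups, which metrics, which data).

```python
# Minimal sketch of a disaggregated evaluation (illustrative data only):
# compute a metric overall, then per group, and report each group's gap.
from collections import defaultdict

def accuracy(pairs):
    """Fraction of (prediction, label) pairs that agree."""
    return sum(pred == label for pred, label in pairs) / len(pairs)

# Hypothetical (prediction, label, group) triples from an AI system.
results = [
    (1, 1, "group_a"), (0, 0, "group_a"), (1, 0, "group_a"),
    (1, 1, "group_b"), (0, 1, "group_b"), (0, 1, "group_b"),
]

by_group = defaultdict(list)
for pred, label, group in results:
    by_group[group].append((pred, label))

overall = accuracy([(pred, label) for pred, label, _ in results])
print(f"overall accuracy: {overall:.2f}")
for group, pairs in sorted(by_group.items()):
    gap = accuracy(pairs) - overall
    print(f"{group}: accuracy {accuracy(pairs):.2f} (gap {gap:+.2f})")
```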
Related papers
- A Novel Mathematical Framework for Objective Characterization of Ideas through Vector Embeddings in LLM [0.0]
This study introduces a comprehensive mathematical framework for automated analysis that objectively evaluates the plethora of ideas generated by CAI systems and/or humans.
By converting the ideas into higher dimensional vectors and quantitatively measuring the diversity between them using tools such as UMAP, DBSCAN and PCA, the proposed method provides a reliable and objective way of selecting the most promising ideas.
arXiv Detail & Related papers (2024-09-11T19:10:29Z)
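As a rough illustration of the pipeline the entry above describes, the sketch below treats random vectors as stand-ins for LLM idea embeddings, scores diversity by mean pairwise cosine distance, and clusters with PCA followed by DBSCAN (UMAP is omitted). The parameters and the diversity score are assumptions for illustration, not the paper's choices.

```python
# Sketch of the idea-diversity pipeline: embed, measure spread, cluster.
# Random vectors stand in for LLM embeddings of generated ideas.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_distances

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 384))  # placeholder idea embeddings

# Mean pairwise cosine distance as a crude diversity score:
# higher means the ideas are more spread out in embedding space.
print("mean pairwise cosine distance:", cosine_distances(embeddings).mean())

# Reduce dimensionality, then cluster; large clusters suggest redundant
# ideas, while many noise points (-1) suggest distinct ones.
reduced = PCA(n_components=10).fit_transform(embeddings)
labels = DBSCAN(eps=3.0, min_samples=3).fit_predict(reduced)
print("clusters:", len(set(labels) - {-1}), "| noise points:", int((labels == -1).sum()))
```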
- Negotiating the Shared Agency between Humans & AI in the Recommender System [1.4249472316161877]
Concerns about user agency have arisen due to algorithms' inherent opacity (information asymmetry) and the one-way nature of their output (power asymmetry).
We seek to understand how types of agency impact user perception and experience, and bring empirical evidence to refine the guidelines and designs for human-AI interactive systems.
arXiv Detail & Related papers (2024-03-23T19:23:08Z)
- Beyond Recommender: An Exploratory Study of the Effects of Different AI Roles in AI-Assisted Decision Making [48.179458030691286]
We examine three AI roles: Recommender, Analyzer, and Devil's Advocate.
Our results show each role's distinct strengths and limitations in task performance, reliance appropriateness, and user experience.
These insights offer valuable implications for designing AI assistants with adaptive functional roles according to different situations.
arXiv Detail & Related papers (2024-03-04T07:32:28Z)
- Evaluative Item-Contrastive Explanations in Rankings [47.24529321119513]
This paper advocates a specific form of Explainable AI -- namely, contrastive explanations -- as well-suited to ranking problems.
The present work introduces Evaluative Item-Contrastive Explanations tailored for ranking systems and illustrates its application and characteristics through an experiment conducted on publicly available data.
arXiv Detail & Related papers (2023-12-14T15:40:51Z)
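To convey the flavor of an item-contrastive explanation for rankings, the sketch below assumes a hypothetical linear scoring model and answers "why is item A ranked above item B?" by attributing the score gap to individual features. It illustrates the general technique, not the paper's specific method.

```python
# Item-contrastive explanation under an assumed linear scorer: attribute
# the score gap between two ranked items to per-feature contributions.
weights = {"relevance": 0.6, "freshness": 0.3, "popularity": 0.1}
item_a = {"relevance": 0.9, "freshness": 0.2, "popularity": 0.5}
item_b = {"relevance": 0.4, "freshness": 0.8, "popularity": 0.6}

def score(item):
    return sum(weights[f] * item[f] for f in weights)

print(f"score(A) = {score(item_a):.2f}, score(B) = {score(item_b):.2f}")

# Positive deltas explain why A outranks B; negative deltas show
# where B actually beats A despite the overall ordering.
for feature in weights:
    delta = weights[feature] * (item_a[feature] - item_b[feature])
    print(f"{feature:>10}: {delta:+.3f}")
```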
- Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
It has been claimed that large language models (LLMs) can assist with relevance judgments.
It is not clear whether automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z)
- Video Surveillance System Incorporating Expert Decision-making Process: A Case Study on Detecting Calving Signs in Cattle [5.80793470875286]
In this study, we examine the framework of a video surveillance AI system that presents the reasoning behind predictions by incorporating experts' decision-making processes with rich domain knowledge of the notification target.
In our case study, we designed a system for detecting signs of calving in cattle based on the proposed framework and evaluated the system through a user study with people involved in livestock farming.
arXiv Detail & Related papers (2023-01-10T12:06:49Z)
- Doubting AI Predictions: Influence-Driven Second Opinion Recommendation [92.30805227803688]
We propose a way to augment human-AI collaboration by building on a common organizational practice: identifying experts who are likely to provide complementary opinions.
The proposed approach aims to leverage productive disagreement by identifying whether some experts are likely to disagree with an algorithmic assessment.
arXiv Detail & Related papers (2022-04-29T20:35:07Z)
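A toy version of this mechanic, under hypothetical data and a deliberately simple estimator (the paper's influence-driven approach is more involved): estimate each expert's historical disagreement rate with the model, and recommend the expert most likely to push back.

```python
# Toy second-opinion recommendation: surface the expert most likely to
# disagree with the algorithmic assessment, based on hypothetical history.
from collections import Counter

# (expert, model_prediction, expert_opinion) from past cases.
history = [
    ("ana", 1, 1), ("ana", 0, 0), ("ana", 1, 1), ("ana", 0, 0),
    ("ben", 1, 0), ("ben", 0, 1), ("ben", 1, 1), ("ben", 0, 1),
]

seen, disagreed = Counter(), Counter()
for expert, model_pred, opinion in history:
    seen[expert] += 1
    disagreed[expert] += int(opinion != model_pred)

rates = {expert: disagreed[expert] / seen[expert] for expert in seen}
print(rates)                                   # {'ana': 0.0, 'ben': 0.75}
print("second opinion from:", max(rates, key=rates.get))
```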
- AI for human assessment: What do professional assessors need? [33.88509725285237]
This case study aims to help professional assessors make decisions in human assessment, in which they conduct interviews with assessees and evaluate their suitability for certain job roles.
A computational system that can extract nonverbal cues of assessees would benefit assessors by supporting their decision making.
We developed such a system based on an unsupervised anomaly detection algorithm using multimodal behavioral features such as facial keypoints, pose, head pose, and gaze.
arXiv Detail & Related papers (2022-04-18T03:35:37Z)
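As a hedged sketch of the system described above, the code below runs an off-the-shelf unsupervised anomaly detector over synthetic frame-level feature vectors standing in for the multimodal behavioral cues (facial keypoints, pose, head pose, gaze). IsolationForest is an assumption; the summary does not name the specific algorithm used.

```python
# Unsupervised anomaly detection over per-frame behavioral feature vectors.
# Synthetic features stand in for facial keypoints, pose, head pose, gaze.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
typical = rng.normal(0.0, 1.0, size=(500, 16))   # ordinary interview frames
unusual = rng.normal(4.0, 1.0, size=(10, 16))    # frames with atypical cues
frames = np.vstack([typical, unusual])

detector = IsolationForest(contamination=0.02, random_state=0).fit(frames)
flags = detector.predict(frames)                 # -1 = anomalous, 1 = typical
print("flagged frame indices:", np.where(flags == -1)[0])
```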
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By modeling the decision-making processes underlying a set of observed trajectories as online learning, we cast policy inference as the inverse of that online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
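To give a simplified sense of the "inverse" framing, the toy sketch below assumes an agent that updates a perceived effect with a constant step size and accepts with probability given by a sigmoid of that belief; the step size is then recovered from the observed decisions by grid-search maximum likelihood. This illustrates the inversion idea only, not the paper's algorithm.

```python
# Toy inverse online learning: recover an agent's belief-update step size
# from its observed accept/reject decisions (illustration only).
import numpy as np

rng = np.random.default_rng(2)

def simulate(step, signals):
    """Agent accepts with prob sigmoid(belief); belief tracks signals online."""
    belief, decisions = 0.0, []
    for s in signals:
        decisions.append(rng.random() < 1 / (1 + np.exp(-belief)))
        belief += step * (s - belief)   # online update of the perceived effect
    return decisions

def log_likelihood(step, signals, decisions):
    """Log-likelihood of the decisions under a candidate step size."""
    belief, ll = 0.0, 0.0
    for s, accepted in zip(signals, decisions):
        p = 1 / (1 + np.exp(-belief))
        ll += np.log(p if accepted else 1 - p)
        belief += step * (s - belief)
    return ll

signals = rng.normal(1.0, 1.0, size=2000)   # what the agent observes
decisions = simulate(0.3, signals)          # behavior generated with step 0.3
grid = np.linspace(0.05, 0.95, 19)
best = max(grid, key=lambda step: log_likelihood(step, signals, decisions))
print("recovered step size:", round(float(best), 2))  # should land near 0.3
```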
- Assessing the Fairness of AI Systems: AI Practitioners' Processes, Challenges, and Needs for Support [18.148737010217953]
We conduct interviews and workshops with AI practitioners to identify practitioners' processes, challenges, and needs for support.
We find that practitioners face challenges when choosing performance metrics and when identifying the most relevant direct stakeholders and demographic groups.
We identify impacts on fairness work stemming from a lack of engagement with direct stakeholders, business imperatives that prioritize customers over marginalized groups, and the drive to deploy AI systems at scale.
arXiv Detail & Related papers (2021-12-10T17:14:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.