A Computational Method for Measuring "Open Codes" in Qualitative Analysis
- URL: http://arxiv.org/abs/2411.12142v2
- Date: Tue, 26 Nov 2024 04:31:07 GMT
- Title: A Computational Method for Measuring "Open Codes" in Qualitative Analysis
- Authors: John Chen, Alexandros Lotsos, Lexie Zhao, Caiyi Wang, Jessica Hullman, Bruce Sherin, Uri Wilensky, Michael Horn,
- Abstract summary: Open coding is an inductive qualitative process that identifies and interprets "open codes" from datasets.
We present a computational method to measure and identify potential biases from "open codes" systematically.
- Score: 47.358809793796624
- License:
- Abstract: Qualitative analysis is critical to understanding human datasets in many social science disciplines. Open coding is an inductive qualitative process that identifies and interprets "open codes" from datasets. Yet, meeting methodological expectations (such as "as exhaustive as possible") can be challenging. While many machine learning (ML)/generative AI (GAI) studies have attempted to support open coding, few have systematically measured or evaluated GAI outcomes, increasing potential bias risks. Building on Grounded Theory and Thematic Analysis theories, we present a computational method to measure and identify potential biases from "open codes" systematically. Instead of operationalizing human expert results as the "ground truth," our method is built upon a team-based approach between human and machine coders. We experiment with two HCI datasets to establish this method's reliability by 1) comparing it with human analysis, and 2) analyzing its output stability. We present evidence-based suggestions and example workflows for ML/GAI to support open coding.
Related papers
- Prompts Matter: Comparing ML/GAI Approaches for Generating Inductive Qualitative Coding Results [39.96179530555875]
generative AI (GAI) tools rely on instructions to work, and how to instruct it may matter.
This study applied two known and two theory-informed novel approaches to an online community dataset and evaluated the resulting coding results.
Our findings show significant discrepancies between ML/GAI approaches and demonstrate the advantage of our approaches.
arXiv Detail & Related papers (2024-11-10T00:23:55Z) - Whither Bias Goes, I Will Go: An Integrative, Systematic Review of Algorithmic Bias Mitigation [1.0470286407954037]
Concerns have been raised that machine learning (ML) models may be biased and perpetuate or exacerbate inequality.
We present a four-stage model of developing ML assessments and applying bias mitigation methods.
arXiv Detail & Related papers (2024-10-21T02:32:14Z) - DISCOVER: A Data-driven Interactive System for Comprehensive Observation, Visualization, and ExploRation of Human Behaviour [6.716560115378451]
We introduce a modular, flexible, yet user-friendly software framework specifically developed to streamline computational-driven data exploration for human behavior analysis.
Our primary objective is to democratize access to advanced computational methodologies, thereby enabling researchers across disciplines to engage in detailed behavioral analysis without the need for extensive technical proficiency.
arXiv Detail & Related papers (2024-07-18T11:28:52Z) - DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Our DACO-RL algorithm is evaluated by human annotators to produce more helpful answers than SFT model in 57.72% cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z) - FiFAR: A Fraud Detection Dataset for Learning to Defer [9.187694794359498]
We introduce the Financial Fraud Alert Review dataset (FiFAR), a synthetic bank account fraud detection dataset.
FiFAR contains the predictions of a team of 50 highly complex and varied synthetic fraud analysts, with varied bias and feature dependence.
We use our dataset to develop a capacity-aware L2D method and rejection learning approach under realistic data availability conditions.
arXiv Detail & Related papers (2023-12-20T17:36:36Z) - Multi-Dimensional Ability Diagnosis for Machine Learning Algorithms [88.93372675846123]
We propose a task-agnostic evaluation framework Camilla for evaluating machine learning algorithms.
We use cognitive diagnosis assumptions and neural networks to learn the complex interactions among algorithms, samples and the skills of each sample.
In our experiments, Camilla outperforms state-of-the-art baselines on the metric reliability, rank consistency and rank stability.
arXiv Detail & Related papers (2023-07-14T03:15:56Z) - Human-Centric Multimodal Machine Learning: Recent Advances and Testbed
on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z) - Human-Algorithm Collaboration: Achieving Complementarity and Avoiding
Unfairness [92.26039686430204]
We show that even in carefully-designed systems, complementary performance can be elusive.
First, we provide a theoretical framework for modeling simple human-algorithm systems.
Next, we use this model to prove conditions where complementarity is impossible.
arXiv Detail & Related papers (2022-02-17T18:44:41Z) - A Comparative Approach to Explainable Artificial Intelligence Methods in
Application to High-Dimensional Electronic Health Records: Examining the
Usability of XAI [0.0]
XAI aims to produce a demonstrative factor of trust, which for human subjects is achieved through communicative means.
The ideology behind trusting a machine to tend towards the livelihood of a human poses an ethical conundrum.
XAI methods produce visualization of the feature contribution towards a given models output on both a local and global level.
arXiv Detail & Related papers (2021-03-08T18:15:52Z) - Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.