Processes Matter: How ML/GAI Approaches Could Support Open Qualitative Coding of Online Discourse Datasets
- URL: http://arxiv.org/abs/2504.02887v1
- Date: Wed, 02 Apr 2025 13:43:54 GMT
- Title: Processes Matter: How ML/GAI Approaches Could Support Open Qualitative Coding of Online Discourse Datasets
- Authors: John Chen, Alexandros Lotsos, Grace Wang, Lexie Zhao, Bruce Sherin, Uri Wilensky, Michael Horn
- Abstract summary: We compare open coding results by five recently published ML/GAI approaches and four human coders. Line-by-line AI approaches effectively identify content-based codes, while humans excel in interpreting conversational dynamics. Instead of replacing humans in open coding, researchers should integrate AI with and according to their analytical processes.
- Score: 39.96179530555875
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Open coding, a key inductive step in qualitative research, discovers and constructs concepts from human datasets. However, capturing extensive and nuanced aspects, or "coding moments," can be challenging, especially with large discourse datasets. While some studies explore machine learning (ML)/Generative AI (GAI)'s potential for open coding, few evaluation studies exist. We compare open coding results by five recently published ML/GAI approaches and four human coders, using a dataset of online chat messages about a mobile learning software application. Our systematic analysis reveals ML/GAI approaches' strengths and weaknesses, uncovering the complementary potential between humans and AI. Line-by-line AI approaches effectively identify content-based codes, while humans excel in interpreting conversational dynamics. We discuss how embedded analytical processes could shape the results of ML/GAI approaches. Instead of replacing humans in open coding, researchers should integrate AI with and according to their analytical processes, e.g., as parallel co-coders.
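The abstract's "line-by-line" approach and "parallel co-coder" framing can be pictured concretely. Below is a minimal sketch, assuming an OpenAI-style chat-completions API; the model name, prompt wording, and per-message loop are illustrative assumptions, not the paper's published protocol.

```python
# Hypothetical line-by-line open-coding pass with an LLM acting as a
# parallel co-coder. All prompt text and parameters are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "You are assisting with inductive open coding of chat messages. "
    "Propose 1-3 short, descriptive open codes for the message below. "
    "Return only a comma-separated list of codes.\n\nMessage: {message}"
)

def code_line_by_line(messages: list[str], model: str = "gpt-4o-mini") -> list[list[str]]:
    """Ask the model to propose open codes for each message independently."""
    all_codes = []
    for msg in messages:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT.format(message=msg)}],
            temperature=0,  # reduce run-to-run variation for comparison
        )
        codes = [c.strip() for c in resp.choices[0].message.content.split(",")]
        all_codes.append(codes)
    return all_codes

if __name__ == "__main__":
    chat = ["How do I make the turtle move forward?",
            "lol mine just spins in circles"]
    for line, codes in zip(chat, code_line_by_line(chat)):
        print(line, "->", codes)
```

In this framing, the AI's per-line codes are kept alongside, not instead of, human codes, so researchers can inspect where the two coders diverge.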
Related papers
- Human and Machine: How Software Engineers Perceive and Engage with AI-Assisted Code Reviews Compared to Their Peers [4.734450431444635]
We investigate how software engineers perceive and engage with Large Language Model (LLM)-assisted code reviews.
We found that engagement in code review is multi-dimensional, spanning cognitive, emotional, and behavioral dimensions.
Our findings contribute to a deeper understanding of how AI tools are impacting SE socio-technical processes.
arXiv Detail & Related papers (2025-01-03T20:42:51Z)
- A Computational Method for Measuring "Open Codes" in Qualitative Analysis [47.358809793796624]
Open coding is an inductive qualitative process that identifies and interprets "open codes" from datasets.
We present a computational method to measure and identify potential biases in "open codes" systematically (a toy agreement sketch in this spirit appears after this list).
arXiv Detail & Related papers (2024-11-19T00:44:56Z)
- Prompts Matter: Comparing ML/GAI Approaches for Generating Inductive Qualitative Coding Results [39.96179530555875]
Generative AI (GAI) tools rely on instructions to work, and how they are instructed may matter.
This study applied two known and two theory-informed novel approaches to an online community dataset and evaluated the resulting coding results.
Our findings show significant discrepancies between ML/GAI approaches and demonstrate the advantage of our approaches.
arXiv Detail & Related papers (2024-11-10T00:23:55Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- How Far Have We Gone in Binary Code Understanding Using Large Language Models [51.527805834378974]
We propose a benchmark to evaluate the effectiveness of Large Language Models (LLMs) in binary code understanding.
Our evaluations reveal that existing LLMs can understand binary code to a certain extent, thereby improving the efficiency of binary code analysis.
arXiv Detail & Related papers (2024-04-15T14:44:08Z)
- Can AI Serve as a Substitute for Human Subjects in Software Engineering Research? [24.39463126056733]
This vision paper proposes a novel approach to qualitative data collection in software engineering research by harnessing the capabilities of artificial intelligence (AI).
We explore the potential of AI-generated synthetic text as an alternative source of qualitative data.
We discuss the prospective development of new foundation models aimed at emulating human behavior in observational studies and user evaluations.
arXiv Detail & Related papers (2023-11-18T14:05:52Z)
- Towards Coding Social Science Datasets with Language Models [4.280286557747323]
Researchers often rely on humans to code (label, annotate, etc.) large sets of texts.
Recent advances in a specific kind of artificial intelligence tool - language models (LMs) - provide a solution.
We find that GPT-3 can match the performance of typical human coders and offers benefits over other machine learning methods of coding text.
arXiv Detail & Related papers (2023-06-03T19:11:34Z)
- Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
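As referenced in the "Computational Method for Measuring 'Open Codes'" entry above, systematically comparing coders' outputs is a natural step once AI acts as a parallel co-coder. The sketch below computes per-line Jaccard agreement between a human coder and an AI coder; the metric choice and data layout are my illustrative assumptions, not that paper's method.

```python
# Toy comparison of two coders' open codes (e.g., human vs. AI co-coder).
# Jaccard overlap is an illustrative metric choice, not a published protocol.
def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap between two coders' codes for one line (0..1)."""
    if not a and not b:
        return 1.0  # both coders assigned nothing: count as agreement
    return len(a & b) / len(a | b)

def mean_agreement(human: list[set[str]], ai: list[set[str]]) -> float:
    """Average per-line Jaccard agreement over an aligned dataset."""
    assert len(human) == len(ai), "coders must cover the same lines"
    return sum(jaccard(h, m) for h, m in zip(human, ai)) / len(human)

# Example with toy codes: each inner set holds one coder's codes for a line.
human = [{"help-seeking", "debugging"}, {"humor"}]
ai    = [{"help-seeking"},              {"humor", "frustration"}]
print(f"mean agreement: {mean_agreement(human, ai):.2f}")  # prints 0.50
```

Low per-line agreement does not automatically mean the AI is wrong; per the main abstract, humans and line-by-line AI tend to surface different kinds of codes.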
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.