Thematic Analysis with Open-Source Generative AI and Machine Learning: A New Method for Inductive Qualitative Codebook Development
- URL: http://arxiv.org/abs/2410.03721v1
- Date: Sat, 28 Sep 2024 18:52:16 GMT
- Title: Thematic Analysis with Open-Source Generative AI and Machine Learning: A New Method for Inductive Qualitative Codebook Development
- Authors: Andrew Katz, Gabriella Coloyan Fleming, Joyce Main
- Abstract summary: We present the Generative AI-enabled Theme Organization and Structuring (GATOS) workflow.
It uses open-source machine learning techniques, natural language processing tools, and generative text models to facilitate thematic analysis.
We show that the GATOS workflow is able to identify themes in the text that were used to generate the original synthetic datasets.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper aims to answer one central question: to what extent can open-source generative text models be used in a workflow to approximate thematic analysis in social science research? To answer this question, we present the Generative AI-enabled Theme Organization and Structuring (GATOS) workflow, which uses open-source machine learning techniques, natural language processing tools, and generative text models to facilitate thematic analysis. To establish validity of the method, we present three case studies applying the GATOS workflow, leveraging these models and techniques to inductively create codebooks similar to traditional procedures using thematic analysis. Specifically, we investigate the extent to which a workflow comprising open-source models and tools can inductively produce codebooks that approach the known space of themes and sub-themes. To address the challenge of gleaning insights from these texts, we combine open-source generative text models, retrieval-augmented generation, and prompt engineering to identify codes and themes in large volumes of text, i.e., generate a qualitative codebook. The process mimics an inductive coding process that researchers might use in traditional thematic analysis by reading text one unit of analysis at a time, considering existing codes already in the codebook, and then deciding whether or not to generate a new code based on whether the extant codebook provides adequate thematic coverage. We demonstrate this workflow using three synthetic datasets from hypothetical organizational research settings: a study of teammate feedback in teamwork settings, a study of organizational cultures of ethical behavior, and a study of employee perspectives about returning to their offices after the pandemic. We show that the GATOS workflow is able to identify themes in the text that were used to generate the original synthetic datasets.
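The inductive loop the abstract describes — read one unit of analysis at a time, compare it against codes already in the codebook, and add a new code only when existing codes lack adequate thematic coverage — can be sketched as follows. This is a hypothetical illustration, not the authors' code: the actual GATOS workflow uses open-source generative models and retrieval-augmented generation, whereas this sketch stands in keyword-overlap similarity for embedding retrieval and a fixed coverage threshold for the model's decision.

```python
# Hypothetical sketch of a GATOS-style inductive coding loop.
# The similarity function and threshold are illustrative stand-ins for
# the embedding retrieval and LLM judgment used in the real workflow.

def tokenize(text):
    return set(text.lower().split())

def similarity(a, b):
    """Jaccard word overlap as a stand-in for embedding similarity."""
    ta, tb = tokenize(a), tokenize(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def build_codebook(units, coverage_threshold=0.3):
    """Read one unit of analysis at a time; add a new code only when no
    existing code provides adequate thematic coverage."""
    codebook = []
    for unit in units:
        best = max((similarity(unit, code) for code in codebook), default=0.0)
        if best < coverage_threshold:
            # In the actual workflow a generative model would draft the
            # code label; here the unit itself serves as a placeholder.
            codebook.append(unit)
    return codebook

units = [
    "teammate gave helpful feedback on my report",
    "teammate gave helpful feedback on my draft",
    "manager pressured staff to cut ethical corners",
]
print(build_codebook(units))
```

The first and second units overlap heavily, so the second is treated as covered by the existing code; the third unit shares no vocabulary and triggers a new code, leaving two codes in the codebook.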
Related papers
- Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges.
We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow.
We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
- Systematic Task Exploration with LLMs: A Study in Citation Text Generation [63.50597360948099]
Large language models (LLMs) bring unprecedented flexibility in defining and executing complex, creative natural language generation (NLG) tasks.
We propose a three-component research framework that consists of systematic input manipulation, reference data, and output measurement.
We use this framework to explore citation text generation -- a popular scholarly NLP task that lacks consensus on the task definition and evaluation metric.
arXiv Detail & Related papers (2024-07-04T16:41:08Z)
- Generative retrieval-augmented ontologic graph and multi-agent strategies for interpretive large language model-based materials design [0.0]
Transformer neural networks show promising capabilities, in particular for uses in materials analysis, design and manufacturing.
Here we explore the use of large language models (LLMs) as a tool that can support engineering analysis of materials.
arXiv Detail & Related papers (2023-10-30T20:31:50Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Towards Understanding Machine Learning Testing in Practise [23.535630175567146]
We propose to study visualisations of Machine Learning pipelines by mining Jupyter notebooks.
First, we gather general insights and trends through a qualitative study of a smaller sample of notebooks.
We then use the knowledge gained from the qualitative study to design an empirical study with a larger sample of notebooks.
arXiv Detail & Related papers (2023-05-08T18:52:26Z)
- Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding [45.5690960017762]
This study explores the use of large language models (LLMs) in supporting deductive coding.
Instead of training task-specific models, a pre-trained LLM can be applied directly to various tasks through prompt learning, without fine-tuning.
Using a curiosity-driven question coding task as a case study, we found that combining GPT-3 with expert-drafted codebooks achieved fair to substantial agreement with expert-coded results.
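The prompt-learning approach this paper describes — embedding an expert-drafted codebook directly in the prompt so a pre-trained model can assign codes without fine-tuning — can be illustrated with a small sketch. The codebook entries, code names, and prompt wording below are hypothetical, not the paper's actual materials.

```python
# Hypothetical sketch of codebook-based deductive coding via prompting.
# Codes, definitions, and prompt wording are illustrative only.

CODEBOOK = {
    "causal": "asks why or how something happens",
    "factual": "asks for a specific fact or definition",
}

def deductive_coding_prompt(text):
    """Embed the expert-drafted codebook in the prompt so a pre-trained
    LLM can assign a code without task-specific fine-tuning."""
    lines = [f"- {code}: {definition}" for code, definition in CODEBOOK.items()]
    return (
        "Assign one code from the codebook to the question below.\n"
        "Codebook:\n" + "\n".join(lines) +
        f"\nQuestion: {text}\nCode:"
    )

print(deductive_coding_prompt("Why do leaves change color?"))
```

The resulting prompt would be sent to a model such as GPT-3; the model's completion after "Code:" is the assigned code, which can then be compared against expert coding for agreement.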
arXiv Detail & Related papers (2023-04-17T04:52:43Z)
- Informative Text Generation from Knowledge Triples [56.939571343797304]
We propose a novel memory augmented generator that employs a memory network to memorize the useful knowledge learned during the training.
We derive a dataset from WebNLG for our new setting and conduct extensive experiments to investigate the effectiveness of our model.
arXiv Detail & Related papers (2022-09-26T14:35:57Z)
- Scholastic: Graphical Human-AI Collaboration for Inductive and Interpretive Text Analysis [20.008165537258254]
Interpretive scholars generate knowledge from text corpora by manually sampling documents, applying codes, and refining and collating codes into categories until meaningful themes emerge.
Given a large corpus, machine learning could help scale this data sampling and analysis, but prior research shows that experts are generally concerned about algorithms potentially disrupting or driving interpretive scholarship.
We take a human-centered design approach to address concerns around a machine-in-the-loop clustering algorithm that scaffolds interpretive text analysis.
arXiv Detail & Related papers (2022-08-12T06:41:45Z)
- A Framework for Neural Topic Modeling of Text Corpora [6.340447411058068]
We introduce FAME, an open-source framework enabling an efficient mechanism of extracting and incorporating textual features.
To demonstrate the effectiveness of this library, we conducted experiments on the well-known News-Group dataset.
arXiv Detail & Related papers (2021-08-19T23:32:38Z)
- Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches relaying task impacts across various generation tasks such as storytelling, summarization, translation etc.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, decoding and the key challenges outstanding in the field in each of them.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.