Supporting Qualitative Analysis with Large Language Models: Combining
Codebook with GPT-3 for Deductive Coding
- URL: http://arxiv.org/abs/2304.10548v1
- Date: Mon, 17 Apr 2023 04:52:43 GMT
- Title: Supporting Qualitative Analysis with Large Language Models: Combining
Codebook with GPT-3 for Deductive Coding
- Authors: Ziang Xiao, Xingdi Yuan, Q. Vera Liao, Rania Abdelghani, Pierre-Yves
Oudeyer
- Abstract summary: This study explores the use of large language models (LLMs) in supporting deductive coding.
Instead of training task-specific models, a pre-trained LLM can be used directly for various tasks through prompt learning, without fine-tuning.
Using a curiosity-driven question coding task as a case study, we found that combining GPT-3 with expert-drafted codebooks achieved fair to substantial agreement with expert-coded results.
- Score: 45.5690960017762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Qualitative analysis of textual contents unpacks rich and valuable
information by assigning labels to the data. However, this process is often
labor-intensive, particularly when working with large datasets. While recent
AI-based tools demonstrate utility, researchers may not have readily available
AI resources or expertise, and may further be challenged by the limited
generalizability of those task-specific models. In this study, we explored the
use of large language models (LLMs) in supporting deductive coding, a major
category of qualitative analysis where researchers use pre-determined codebooks
to label the data into a fixed set of codes. Instead of training task-specific
models, a pre-trained LLM can be used directly for various tasks through prompt
learning, without fine-tuning. Using a curiosity-driven question coding task as
a case study, we found that, by combining GPT-3 with expert-drafted codebooks,
our proposed approach achieved fair to substantial agreement with expert-coded
results. We lay out challenges and opportunities in using LLMs to
support qualitative coding and beyond.
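
The approach described above, rendering an expert-drafted codebook into a prompt and asking a pre-trained LLM to return one of the codebook's codes, can be sketched as follows. This is a minimal illustration, not the authors' released code: the codebook entries, prompt wording, and model name are assumptions made for the example, and the call assumes the openai Python client (v1+).

# Minimal sketch of codebook-guided deductive coding with an LLM (not the paper's code).
# The codebook entries, prompt wording, and model name below are illustrative assumptions.
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set in the environment

CODEBOOK = {
    "definition": "The question asks what something is or what a term means.",
    "explanation": "The question asks why or how something happens.",
    "factual": "The question asks for a specific, verifiable fact.",
}

def build_prompt(text: str) -> str:
    """Render the codebook (code name plus description) followed by the item to label."""
    lines = [
        "You are coding questions using the codebook below.",
        "Reply with exactly one code name.",
        "",
    ]
    for code, description in CODEBOOK.items():
        lines.append(f"- {code}: {description}")
    lines += ["", f"Question: {text}", "Code:"]
    return "\n".join(lines)

def code_item(text: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the LLM to assign one codebook code to a single data item."""
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic labels make agreement checks easier to reproduce
        messages=[{"role": "user", "content": build_prompt(text)}],
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(code_item("Why do cats purr when they are petted?"))

In practice, labels produced this way would be compared against expert-coded labels on a sample (e.g., via Cohen's kappa) before applying the model to the remaining data, which is the kind of agreement the abstract reports.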
Related papers
- Thematic Analysis with Open-Source Generative AI and Machine Learning: A New Method for Inductive Qualitative Codebook Development [0.0]
We present the Generative AI-enabled Theme Organization and Structuring (GATOS) workflow.
It uses open-source machine learning techniques, natural language processing tools, and generative text models to facilitate thematic analysis.
We show that the GATOS workflow can identify the themes that were used to generate the original synthetic datasets.
arXiv Detail & Related papers (2024-09-28T18:52:16Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect code that includes three categories and 12 sub-categories, and analyze the root causes of common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever [48.5585921817745]
Large Language Models (LLMs) are used to automate the knowledge tagging task.
We show strong zero- and few-shot performance on math question knowledge tagging tasks.
By proposing a reinforcement learning-based demonstration retriever, we successfully exploit the great potential of different-sized LLMs.
arXiv Detail & Related papers (2024-06-19T23:30:01Z)
- AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data [64.69872638349922]
We present AlchemistCoder, a series of Code LLMs with enhanced code generation and generalization capabilities fine-tuned on multi-source data.
We propose incorporating the data construction process into the fine-tuning data as code comprehension tasks, including instruction evolution, data filtering, and code review.
arXiv Detail & Related papers (2024-05-29T16:57:33Z)
- DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries [0.0]
We evaluate OpenAI's GPT-3.5 as a "Language Data Scientist" (LDS).
The model was tested on a diverse set of benchmark datasets to evaluate its performance across multiple standards.
arXiv Detail & Related papers (2024-03-29T22:59:34Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose DOKE, a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance in practical applications.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
- Exploring the Potential of Large Language Models in Generating Code-Tracing Questions for Introductory Programming Courses [6.43363776610849]
Large language models (LLMs) can be used to generate code-tracing questions in programming courses.
We present a dataset of human and LLM-generated tracing questions, serving as a valuable resource for both the education and NLP research communities.
arXiv Detail & Related papers (2023-10-23T19:35:01Z)
- LLM-in-the-loop: Leveraging Large Language Model for Thematic Analysis [18.775126929754833]
Thematic analysis (TA) has been widely used for analyzing qualitative data in many disciplines and fields.
Human coders develop and deepen their data interpretation and coding over multiple iterations, making TA labor-intensive and time-consuming.
We propose a human-LLM collaboration framework (i.e., LLM-in-the-loop) to conduct TA with in-context learning (ICL).
arXiv Detail & Related papers (2023-10-23T17:05:59Z)
- LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding [0.3149883354098941]
Large language models (LLMs) are AI tools that can perform a range of natural language processing and reasoning tasks.
In this study, we explore the use of LLMs to reduce the time it takes for deductive coding while retaining the flexibility of a traditional content analysis.
We find that GPT-3.5 can often perform deductive coding at levels of agreement comparable to human coders.
arXiv Detail & Related papers (2023-06-23T20:57:32Z)