Supporting Qualitative Analysis with Large Language Models: Combining
Codebook with GPT-3 for Deductive Coding
- URL: http://arxiv.org/abs/2304.10548v1
- Date: Mon, 17 Apr 2023 04:52:43 GMT
- Title: Supporting Qualitative Analysis with Large Language Models: Combining
Codebook with GPT-3 for Deductive Coding
- Authors: Ziang Xiao, Xingdi Yuan, Q. Vera Liao, Rania Abdelghani, Pierre-Yves
Oudeyer
- Abstract summary: This study explores the use of large language models (LLMs) in supporting deductive coding.
Instead of training task-specific models, a pre-trained LLM can be used directly for various tasks through prompt learning, without fine-tuning.
Using a curiosity-driven question coding task as a case study, we found that combining GPT-3 with expert-drafted codebooks achieved fair to substantial agreement with expert-coded results.
- Score: 45.5690960017762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Qualitative analysis of textual contents unpacks rich and valuable
information by assigning labels to the data. However, this process is often
labor-intensive, particularly when working with large datasets. While recent
AI-based tools demonstrate utility, researchers may not have readily available
AI resources or expertise, and may further be challenged by the limited
generalizability of those task-specific models. In this study, we explored the
use of large language models (LLMs) in supporting deductive coding, a major
category of qualitative analysis where researchers use pre-determined codebooks
to label the data into a fixed set of codes. Instead of training task-specific
models, a pre-trained LLM can be used directly for various tasks through prompt
learning, without fine-tuning. Using a curiosity-driven question coding task as
a case study, we found that, by combining GPT-3 with expert-drafted codebooks,
our proposed approach achieved fair to substantial agreement with expert-coded
results. We lay out challenges and opportunities in using LLMs to
support qualitative coding and beyond.
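
The approach described above, rendering an expert-drafted codebook into a prompt and asking a pre-trained LLM to return one of the codebook's codes, can be sketched as follows. This is a minimal illustration, not the authors' released code: the codebook entries, prompt wording, and model name are assumptions made for the example, and the call assumes the openai Python client (v1+).

# Minimal sketch of codebook-guided deductive coding with an LLM (not the paper's code).
# The codebook entries, prompt wording, and model name below are illustrative assumptions.
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set in the environment

CODEBOOK = {
    "definition": "The question asks what something is or what a term means.",
    "explanation": "The question asks why or how something happens.",
    "factual": "The question asks for a specific, verifiable fact.",
}

def build_prompt(text: str) -> str:
    """Render the codebook (code name plus description) followed by the item to label."""
    lines = [
        "You are coding questions using the codebook below.",
        "Reply with exactly one code name.",
        "",
    ]
    for code, description in CODEBOOK.items():
        lines.append(f"- {code}: {description}")
    lines += ["", f"Question: {text}", "Code:"]
    return "\n".join(lines)

def code_item(text: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the LLM to assign one codebook code to a single data item."""
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic labels make agreement checks easier to reproduce
        messages=[{"role": "user", "content": build_prompt(text)}],
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(code_item("Why do cats purr when they are petted?"))

In practice, labels produced this way would be compared against expert-coded labels on a sample (e.g., via Cohen's kappa) before applying the model to the remaining data, which is the kind of agreement the abstract reports.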
Related papers
- Thematic Analysis with Open-Source Generative AI and Machine Learning: A New Method for Inductive Qualitative Codebook Development [0.0]
We present the Generative AI-enabled Theme Organization and Structuring (GATOS) workflow.
It uses open-source machine learning techniques, natural language processing tools, and generative text models to facilitate thematic analysis.
We show that the GATOS workflow can identify the themes that were used to generate the original synthetic datasets.
arXiv Detail & Related papers (2024-09-28T18:52:16Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect code that includes three categories and 12 sub-categories, and analyze the root causes of common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever [48.5585921817745]
Large Language Models (LLMs) are used to automate the knowledge tagging task.
We show strong zero- and few-shot performance on math question knowledge tagging tasks.
By proposing a reinforcement learning-based demonstration retriever, we successfully exploit the great potential of different-sized LLMs.
arXiv Detail & Related papers (2024-06-19T23:30:01Z)
- AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data [64.69872638349922]
We present AlchemistCoder, a series of Code LLMs with enhanced code generation and generalization capabilities fine-tuned on multi-source data.
We propose incorporating the data construction process into the fine-tuning data as code comprehension tasks, including instruction evolution, data filtering, and code review.
arXiv Detail & Related papers (2024-05-29T16:57:33Z)
- DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries [0.0]
We evaluate OpenAI's GPT-3.5 as a "Language Data Scientist" (LDS).
The model was tested on a diverse set of benchmark datasets to evaluate its performance across multiple standards.
arXiv Detail & Related papers (2024-03-29T22:59:34Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose DOKE, a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance in practical applications.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
- Exploring the Potential of Large Language Models in Generating Code-Tracing Questions for Introductory Programming Courses [6.43363776610849]
Large language models (LLMs) can be used to generate code-tracing questions in programming courses.
We present a dataset of human and LLM-generated tracing questions, serving as a valuable resource for both the education and NLP research communities.
arXiv Detail & Related papers (2023-10-23T19:35:01Z)
- LLM-in-the-loop: Leveraging Large Language Model for Thematic Analysis [18.775126929754833]
Thematic analysis (TA) has been widely used for analyzing qualitative data in many disciplines and fields.
Human coders develop and deepen their data interpretation and coding over multiple iterations, making TA labor-intensive and time-consuming.
We propose a human-LLM collaboration framework (i.e., LLM-in-the-loop) to conduct TA with in-context learning (ICL).
arXiv Detail & Related papers (2023-10-23T17:05:59Z)
- LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding [0.3149883354098941]
Large language models (LLMs) are AI tools that can perform a range of natural language processing and reasoning tasks.
In this study, we explore the use of LLMs to reduce the time it takes for deductive coding while retaining the flexibility of a traditional content analysis.
We find that GPT-3.5 can often perform deductive coding at levels of agreement comparable to human coders.
arXiv Detail & Related papers (2023-06-23T20:57:32Z)