Scalable Qualitative Coding with LLMs: Chain-of-Thought Reasoning
Matches Human Performance in Some Hermeneutic Tasks
- URL: http://arxiv.org/abs/2401.15170v2
- Date: Mon, 12 Feb 2024 23:04:10 GMT
- Authors: Zackary Okun Dunivin
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Qualitative coding, or content analysis, extracts meaning from text to
discern quantitative patterns across a corpus of texts. Recently, advances in
the interpretive abilities of large language models (LLMs) offer potential for
automating the coding process (applying category labels to texts), thereby
enabling human researchers to concentrate on more creative research aspects,
while delegating these interpretive tasks to AI. Our case study comprises a set
of socio-historical codes on dense, paragraph-long passages representative of a
humanistic study. We show that GPT-4 is capable of human-equivalent
interpretations, whereas GPT-3.5 is not. Compared to our human-derived gold
standard, GPT-4 delivers excellent intercoder reliability (Cohen's $\kappa \geq
0.79$) for 3 of 9 codes, and substantial reliability ($\kappa \geq 0.6$) for 8
of 9 codes. In contrast, GPT-3.5 greatly underperforms for all codes
($\mathrm{mean}(\kappa) = 0.34$; $\max(\kappa) = 0.55$). Importantly, we find that coding
fidelity improves considerably when the LLM is prompted to give a rationale
justifying its coding decisions (chain-of-thought reasoning). We present these
and other findings along with a set of best practices for adapting traditional
codebooks for LLMs. Our results indicate that for certain codebooks,
state-of-the-art LLMs are already adept at large-scale content analysis.
Furthermore, they suggest the next generation of models will likely render AI
coding a viable option for a majority of codebooks.
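For reference, the reliability figures above use Cohen's $\kappa = (p_o - p_e)/(1 - p_e)$, which compares observed agreement $p_o$ against the agreement $p_e$ expected by chance from each coder's label frequencies. A minimal sketch of the computation (a generic implementation with invented human/LLM labels, not the paper's code or data):

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels over the same items."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of items both coders labeled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement expected from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Toy example: ten passages coded present (1) / absent (0) by a human and an LLM.
human = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
llm   = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(human, llm), 2))  # prints 0.6
```

Eight of ten raw agreements would naively look strong, but with balanced marginals half that agreement is expected by chance, so $\kappa$ lands at 0.6, right at the "substantial reliability" threshold cited above.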
Related papers
- Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models [28.295926947968574]
Large language models (LLMs) have brought a paradigm shift to the field of code generation.
We empirically analyze the differences in coding style between the code generated by Code LLMs and the code written by human developers.
  (arXiv 2024-06-29)
- Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting [78.48355455324688]
We propose a novel zero-shot synthetic code detector based on the similarity between the code and its rewritten variants.
Our results demonstrate a notable enhancement over existing synthetic content detectors designed for general texts.
  (arXiv 2024-05-25)
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting yields a large performance boost for multiple LLMs.
Our analysis of GPT-3.5 reveals that the code formatting of the input problem is essential for the performance improvement.
  (arXiv 2024-01-18)
- Towards Human-Level Text Coding with LLMs: The Case of Fatherhood Roles in Public Policy Documents [19.65846717628022]
Large language models (LLMs) promise automation with better results and less programming.
In this study, we evaluate LLMs on three original coding tasks involving typical complexities encountered in political science settings.
We find that the best prompting strategy consists of providing the LLMs with a detailed codebook, similar to the one provided to human coders.
  (arXiv 2023-11-20)
- Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation.
We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
  (arXiv 2023-10-15)
- LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding [0.3149883354098941]
Large language models (LLMs) are AI tools that can perform a range of natural language processing and reasoning tasks.
In this study, we explore the use of LLMs to reduce the time it takes for deductive coding while retaining the flexibility of a traditional content analysis.
We find that GPT-3.5 can often perform deductive coding at levels of agreement comparable to human coders.
  (arXiv 2023-06-23)
- Towards Coding Social Science Datasets with Language Models [4.280286557747323]
Researchers often rely on humans to code (label, annotate, etc.) large sets of texts.
Recent advances in a specific kind of artificial intelligence tool, language models (LMs), provide a solution.
We find that GPT-3 can match the performance of typical human coders and offers benefits over other machine learning methods of coding text.
  (arXiv 2023-06-03)
- Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding [45.5690960017762]
This study explores the use of large language models (LLMs) in supporting deductive coding.
Instead of training task-specific models, a pre-trained LLM can be used directly for various tasks via prompt learning, without fine-tuning.
Using a curiosity-driven questions coding task as a case study, we found that, by combining GPT-3 with expert-drafted codebooks, our proposed approach achieved fair to substantial agreement with expert-coded results.
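The codebook-plus-prompt pattern described here can be sketched as a simple prompt builder. Everything below (the codebook entries, the passage, and the instruction wording) is hypothetical illustration, not the study's actual prompt:

```python
# Hypothetical sketch: assemble a deductive-coding prompt from an
# expert-drafted codebook. Codebook entries and passage are invented.
def build_coding_prompt(codebook: dict, passage: str) -> str:
    lines = [
        "You are a qualitative coder. Apply each code below to the passage.",
        "",
        "Codebook:",
    ]
    for name, definition in codebook.items():
        lines.append(f"- {name}: {definition}")
    lines += [
        "",
        f"Passage: {passage}",
        "",
        "For each code, answer present/absent and briefly justify your decision.",
    ]
    return "\n".join(lines)

codebook = {
    "curiosity": "The speaker asks an information-seeking question.",
    "evaluation": "The speaker judges something as good or bad.",
}
prompt = build_coding_prompt(codebook, "Why does the moon change shape?")
```

The resulting string would then be sent to the model; asking for a justification alongside each label is the same chain-of-thought device the main paper reports as improving coding fidelity.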
  (arXiv 2023-04-17)
- Stealing the Decoding Algorithms of Language Models [56.369946232765656]
A key component of generating text from modern language models (LM) is the selection and tuning of decoding algorithms.
In this work, we show, for the first time, that an adversary with typical API access to an LM can steal the type and hyperparameters of its decoding algorithms.
Our attack is effective against popular LMs used in text generation APIs, including GPT-2, GPT-3 and GPT-Neo.
  (arXiv 2023-03-08)
- Contrastive Decoding: Open-ended Text Generation as Optimization [153.35961722855686]
We propose contrastive decoding (CD), a reliable decoding approach.
It is inspired by the observation that the failure modes of larger LMs are even more prevalent in smaller LMs.
CD requires zero additional training, and produces higher quality text than decoding from the larger LM alone.
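A minimal sketch of one contrastive-decoding step as summarized above (score tokens by the gap between a large "expert" LM and a small "amateur" LM, restricted to tokens the expert finds plausible); the token set, probabilities, and plausibility cutoff `alpha` are illustrative assumptions, not the authors' implementation:

```python
import math

def contrastive_decoding_step(logp_expert, logp_amateur, alpha=0.1):
    """Pick the next token by expert-minus-amateur log-probability,
    restricted to tokens the expert assigns at least alpha times its
    top probability (the plausibility set)."""
    top = max(math.exp(lp) for lp in logp_expert.values())
    plausible = [t for t, lp in logp_expert.items() if math.exp(lp) >= alpha * top]
    return max(plausible, key=lambda t: logp_expert[t] - logp_amateur[t])

# Toy next-token distributions (log-probabilities) from the two LMs.
expert = {"the": math.log(0.50), "a": math.log(0.40), "zz": math.log(0.10)}
amateur = {"the": math.log(0.60), "a": math.log(0.20), "zz": math.log(0.20)}
print(contrastive_decoding_step(expert, amateur))  # prints a
```

Here "a" wins: the expert rates it nearly as likely as "the", while the amateur rates it much lower, so the expert-minus-amateur gap is largest; the plausibility cutoff keeps genuinely unlikely tokens like "zz" from winning on the gap alone.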
  (arXiv 2022-10-27)
- Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
  (arXiv 2022-08-11)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences arising from its use.