Automatic deductive coding in discourse analysis: an application of large language models in learning analytics
- URL: http://arxiv.org/abs/2410.01240v1
- Date: Wed, 2 Oct 2024 05:04:06 GMT
- Title: Automatic deductive coding in discourse analysis: an application of large language models in learning analytics
- Authors: Lishan Zhang, Han Wu, Xiaoshan Huang, Tengfei Duan, Hanxiang Du,
- Abstract summary: The emergence of large language models such as GPT has opened a new avenue for automatic deductive coding.
We employed three classification methods driven by different artificial intelligence technologies.
We found that GPT with prompt engineering outperformed the other two methods on both datasets with a limited number of training samples.
- Score: 5.606202114848633
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deductive coding is a common discourse analysis method widely used by learning science and learning analytics researchers to understand teaching and learning interactions. It often requires researchers to manually label all discourses to be analyzed according to a theoretically guided coding scheme, which is time-consuming and labor-intensive. The emergence of large language models such as GPT has opened a new avenue for automatic deductive coding that overcomes the limitations of traditional deductive coding. To evaluate the usefulness of large language models in automatic deductive coding, we employed three classification methods driven by different artificial intelligence technologies: a traditional text classification method with text feature engineering, a BERT-like pretrained language model, and a GPT-like pretrained large language model (LLM). We applied these methods to two different datasets and explored the potential of GPT and prompt engineering in automatic deductive coding. By analyzing and comparing the accuracy and Kappa values of the three classification methods, we found that GPT with prompt engineering outperformed the other two methods on both datasets with a limited number of training samples. By providing detailed prompt structures, the reported work demonstrates how large language models can be used to implement automatic deductive coding.
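As a hedged illustration of the prompt-engineering approach the abstract describes, the sketch below labels discourse units against a coding scheme via the OpenAI chat API and scores the output with accuracy and Cohen's Kappa. The scheme, labels, model name, and prompt wording are illustrative assumptions, not the authors' actual materials; their detailed prompt structures are in the full paper.

```python
# Minimal sketch of LLM-driven deductive coding (illustrative only).
# The coding scheme, labels, and prompt wording here are hypothetical.
from openai import OpenAI
from sklearn.metrics import accuracy_score, cohen_kappa_score

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CODING_SCHEME = """You are coding classroom discourse. Assign exactly one code:
- QUESTION: the speaker asks for information or clarification.
- EXPLANATION: the speaker elaborates on a concept or reasoning.
- OFF_TASK: the utterance is unrelated to the learning task.
Answer with the code only."""

def code_utterance(utterance: str) -> str:
    """Ask the LLM to label one discourse unit against the scheme."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # any chat-capable model
        temperature=0,         # deterministic labels for coding
        messages=[
            {"role": "system", "content": CODING_SCHEME},
            {"role": "user", "content": utterance},
        ],
    )
    return response.choices[0].message.content.strip()

# Evaluate against human-coded ground truth, as in the paper's comparison.
utterances = ["Why does the ball slow down?",
              "Friction converts kinetic energy into heat."]
human_codes = ["QUESTION", "EXPLANATION"]
llm_codes = [code_utterance(u) for u in utterances]

print("Accuracy:", accuracy_score(human_codes, llm_codes))
print("Kappa:   ", cohen_kappa_score(human_codes, llm_codes))
```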
Related papers
- Thematic Analysis with Open-Source Generative AI and Machine Learning: A New Method for Inductive Qualitative Codebook Development [0.0]
We present the Generative AI-enabled Theme Organization and Structuring (GATOS) workflow.
It uses open-source machine learning techniques, natural language processing tools, and generative text models to facilitate thematic analysis.
We show that the GATOS workflow is able to recover the themes that were used to generate the original synthetic datasets.
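The GATOS workflow itself is detailed in the paper; the following is only a rough sketch of the embed-and-cluster step that open-source thematic analysis pipelines of this kind build on, with the model choice and data as illustrative assumptions.

```python
# Rough sketch of embedding and clustering text for inductive theme discovery
# (illustrative; not the authors' code).
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

texts = ["I felt supported by my teammates.", "Deadlines were unclear.",
         "My mentor checked in weekly.", "Nobody told us the schedule."]

model = SentenceTransformer("all-MiniLM-L6-v2")  # open-source embedder
embeddings = model.encode(texts, normalize_embeddings=True)

kmeans = KMeans(n_clusters=2, n_init="auto", random_state=0).fit(embeddings)

# Pull the text closest to each centroid as a candidate theme exemplar,
# which a generative model (or analyst) can summarize into a codebook entry.
for k, center in enumerate(kmeans.cluster_centers_):
    idx = int(np.argmin(np.linalg.norm(embeddings - center, axis=1)))
    print(f"Cluster {k} exemplar: {texts[idx]}")
```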
arXiv Detail & Related papers (2024-09-28T18:52:16Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Automatic Generation of Behavioral Test Cases For Natural Language Processing Using Clustering and Prompting [6.938766764201549]
This paper introduces an automated approach to develop test cases by exploiting the power of large language models and statistical techniques.
We analyze the behavioral test profiles across four different classification algorithms and discuss the limitations and strengths of those models.
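A minimal sketch of the cluster-then-prompt idea, assuming TF-IDF features, k-means, and a hypothetical invariance-test prompt; the paper's actual pipeline and prompt wording are in the full text.

```python
# Illustrative "cluster, then prompt" behavioral test generation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

corpus = ["The flight was delayed again.", "Great legroom and friendly crew.",
          "Lost my luggage twice.", "Boarding was quick and smooth."]

X = TfidfVectorizer().fit_transform(corpus)
labels = KMeans(n_clusters=2, n_init="auto", random_state=0).fit_predict(X)

# One exemplar per cluster seeds an LLM prompt asking for minimal
# perturbations that should NOT change the predicted label.
for k in set(labels):
    exemplar = next(t for t, l in zip(corpus, labels) if l == k)
    prompt = ("Rewrite the sentence as a harmless paraphrase that keeps its "
              f"sentiment unchanged, to test model invariance:\n{exemplar}")
    print(prompt)  # send to an LLM of choice to obtain behavioral test cases
```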
arXiv Detail & Related papers (2024-07-31T21:12:21Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate the limited interpretability of similarity models by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Large Language Models as Analogical Reasoners [155.9617224350088]
Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks.
We introduce a new prompting approach, analogical prompting, designed to automatically guide the reasoning process of large language models.
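A minimal template in the spirit of analogical prompting, where the model is asked to self-generate relevant exemplars before solving; the wording below is paraphrased, not the paper's exact prompt.

```python
# Sketch of an analogical prompt: the model recalls related problems first,
# so the self-generated exemplars replace hand-written chain-of-thought demos.
problem = ("What is the area of the square with vertices "
           "(-2, 2), (2, -2), (-2, -6), and (-6, -2)?")

prompt = f"""Problem: {problem}

Instructions:
1. Recall three relevant and distinct problems. For each, describe the
   problem and explain its solution.
2. Then solve the initial problem step by step."""

# `prompt` is then sent to a chat model of choice.
print(prompt)
```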
arXiv Detail & Related papers (2023-10-03T00:57:26Z)
- LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding [0.3149883354098941]
Large language models (LLMs) are AI tools that can perform a range of natural language processing and reasoning tasks.
In this study, we explore the use of LLMs to reduce the time it takes for deductive coding while retaining the flexibility of a traditional content analysis.
We find that GPT-3.5 can often perform deductive coding at levels of agreement comparable to human coders.
arXiv Detail & Related papers (2023-06-23T20:57:32Z)
- Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding [45.5690960017762]
This study explores the use of large language models (LLMs) in supporting deductive coding.
Instead of training task-specific models, a pre-trained LLM can be used directly for various tasks through prompt learning, without fine-tuning.
Using a curiosity-driven question coding task as a case study, we found that, by combining GPT-3 with an expert-drafted codebook, our proposed approach achieved fair to substantial agreement with expert-coded results.
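The "fair to substantial" phrasing matches the widely used Landis and Koch (1977) benchmarks for Cohen's Kappa; the small helper below makes that mapping explicit, assuming the paper follows this convention.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a Cohen's kappa value to the Landis & Koch (1977) agreement bands."""
    bands = [(0.00, "slight"), (0.21, "fair"), (0.41, "moderate"),
             (0.61, "substantial"), (0.81, "almost perfect")]
    if kappa < 0:
        return "poor"
    label = "slight"
    for lower, name in bands:
        if kappa >= lower:
            label = name
    return label

print(landis_koch_label(0.35))  # fair
print(landis_koch_label(0.72))  # substantial
```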
arXiv Detail & Related papers (2023-04-17T04:52:43Z)
- Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
This work proposes what is, to our knowledge, the first systematic study of formality detection methods based on statistical, neural, and Transformer-based machine learning approaches.
We conducted three types of experiments -- monolingual, multilingual, and cross-lingual.
The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
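For reference, a generic character-level BiLSTM classifier of the kind compared in the study might be wired as follows; the architecture and hyperparameters are illustrative assumptions, not the authors' exact setup.

```python
# Generic character-level BiLSTM text classifier (illustrative).
import torch
import torch.nn as nn

class CharBiLSTM(nn.Module):
    def __init__(self, vocab_size=256, embed_dim=64, hidden=128, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)  # 2x for both directions

    def forward(self, char_ids):             # char_ids: (batch, seq_len)
        x = self.embed(char_ids)
        _, (h, _) = self.lstm(x)              # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=1)    # concat final states of each direction
        return self.fc(h)                     # (batch, n_classes) logits

# Encode text as raw byte IDs (a simple character vocabulary).
text = "hey wanna grab lunch?"
ids = torch.tensor([list(text.encode("utf-8"))])
logits = CharBiLSTM()(ids)  # untrained; shows shapes and wiring only
print(logits.shape)         # torch.Size([1, 2])
```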
arXiv Detail & Related papers (2022-04-19T16:23:07Z)
- Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models.
We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z)
- Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.