Codebook Reduction and Saturation: Novel observations on Inductive Thematic Saturation for Large Language Models and initial coding in Thematic Analysis
- URL: http://arxiv.org/abs/2503.04859v1
- Date: Thu, 06 Mar 2025 08:52:03 GMT
- Title: Codebook Reduction and Saturation: Novel observations on Inductive Thematic Saturation for Large Language Models and initial coding in Thematic Analysis
- Authors: Stefano De Paoli, Walter Stan Mathis
- Abstract summary: This paper reflects on the process of performing Thematic Analysis with Large Language Models (LLMs). We propose a novel technique to measure Inductive Thematic Saturation (ITS).
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper reflects on the process of performing Thematic Analysis with Large Language Models (LLMs). Specifically, the paper deals with the problem of analytical saturation of the initial codes produced by LLMs. Thematic Analysis is a well-established qualitative analysis method composed of interlinked phases. A key phase is the initial coding, where the analysts assign labels to discrete components of a dataset. Saturation is a way to measure the validity of a qualitative analysis and relates to the recurrence and repetition of initial codes. In the paper we reflect on how well LLMs achieve analytical saturation and also propose a novel technique to measure Inductive Thematic Saturation (ITS). This technique leverages a programming framework called DSPy and allows a precise measurement of ITS.
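As a rough sketch of what an ITS measurement pipeline built on DSPy could look like, the snippet below defines a DSPy signature for initial coding and a helper that extracts the codes assigned to one data segment; the signature fields, model choice, and output format are illustrative assumptions, not the authors' actual implementation.

```python
# A minimal sketch of LLM-based initial coding with DSPy. The signature
# fields, model name, and prompt wording are assumptions for illustration,
# not the pipeline described in the paper.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # assumed model choice

class InitialCoding(dspy.Signature):
    """Assign short qualitative codes to one segment of the dataset."""
    segment: str = dspy.InputField(desc="a discrete component of the dataset")
    codes: str = dspy.OutputField(desc="comma-separated initial codes")

coder = dspy.Predict(InitialCoding)

def code_segment(segment: str) -> list[str]:
    """Return the initial codes the LLM assigns to a single segment."""
    prediction = coder(segment=segment)
    return [c.strip().lower() for c in prediction.codes.split(",") if c.strip()]
```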
Related papers
- Flowco: Rethinking Data Analysis in the Age of LLMs [2.1874189959020427]
Large language models (LLMs) are now capable of generating code for simple, routine data analyses.
LLMs promise to democratize data science by enabling those with limited programming expertise to conduct data analyses.
Analysts in many real-world settings must often exercise fine-grained control over specific analysis steps.
This paper introduces Flowco, a new mixed-initiative system to address these challenges.
arXiv Detail & Related papers (2025-04-18T19:01:27Z) - Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework [61.38174427966444]
Large Language Models (LLMs) are being used increasingly for automated evaluation in various scenarios.
Previous studies have attempted to fine-tune open-source LLMs to replicate the evaluation explanations and judgments of powerful proprietary models.
We propose a novel evaluation framework, ARJudge, that adaptively formulates evaluation criteria and synthesizes both text-based and code-driven analyses.
arXiv Detail & Related papers (2025-02-26T06:31:45Z) - Mathematical Derivation Graphs: A Task for Summarizing Equation Dependencies in STEM Manuscripts [1.1961645395911131]
We take the initial steps toward understanding the dependency relationships between mathematical expressions in STEM articles.
Our dataset, sourced from a random sampling of the arXiv corpus, contains an analysis of 107 published STEM manuscripts.
We exhaustively evaluate analytical and NLP-based models to assess their capability to identify and extract the derivation relationships for each article.
arXiv Detail & Related papers (2024-10-26T16:52:22Z) - Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach to label-name-supervised topic modeling.
EdTM casts topic modeling as an assignment problem while leveraging LM/LLM-based document-topic affinities.
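To make the assignment-problem framing concrete, here is a minimal sketch of entropic optimal transport (Sinkhorn iterations) over a document-topic affinity matrix; it is a generic illustration, and EdTM's actual objective, marginals, and solver may differ.

```python
# Generic entropic optimal-transport (Sinkhorn) sketch that turns a
# document-topic affinity matrix into a soft assignment with uniform
# marginals. Not EdTM's actual formulation.
import numpy as np

def sinkhorn_assign(affinity: np.ndarray, eps: float = 0.1, n_iters: int = 200) -> np.ndarray:
    """Soft doc-topic assignment: rows are documents, columns are topics."""
    n_docs, n_topics = affinity.shape
    K = np.exp(affinity / eps)             # Gibbs kernel from affinities
    r = np.full(n_docs, 1.0 / n_docs)      # uniform mass over documents
    c = np.full(n_topics, 1.0 / n_topics)  # uniform mass over topics
    u, v = np.ones(n_docs), np.ones(n_topics)
    for _ in range(n_iters):               # alternate scaling to match marginals
        u = r / (K @ v)
        v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]     # transport plan = soft assignment

# Example: affinities (e.g. cosine similarities) for four documents, two topics.
aff = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]])
print(sinkhorn_assign(aff).argmax(axis=1))  # hard labels: [0 1 0 1]
```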
arXiv Detail & Related papers (2024-06-28T13:57:27Z) - A Thorough Examination of Decoding Methods in the Era of LLMs [72.65956436513241]
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers.
This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of large language models.
Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization.
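For reference, the sketch below contrasts two of the decoding families such an analysis compares, greedy decoding and temperature/top-p sampling, using the Hugging Face transformers API; the model and hyperparameters are arbitrary illustrative choices, not the paper's experimental setup.

```python
# Greedy decoding vs. temperature/top-p sampling with Hugging Face
# transformers; model and hyperparameters are illustrative choices only.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Decoding methods convert language models into", return_tensors="pt")

# Deterministic: always pick the highest-probability next token.
greedy = model.generate(**inputs, max_new_tokens=30, do_sample=False)

# Stochastic: sample from the temperature-scaled, nucleus-truncated distribution.
sampled = model.generate(**inputs, max_new_tokens=30, do_sample=True,
                         temperature=0.8, top_p=0.95)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```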
arXiv Detail & Related papers (2024-02-10T11:14:53Z) - Expanding Horizons in HCI Research Through LLM-Driven Qualitative Analysis [3.5253513747455303]
We introduce a new approach to qualitative analysis in HCI using Large Language Models (LLMs).
Our findings indicate that LLMs not only match the efficacy of traditional analysis methods but also offer unique insights.
arXiv Detail & Related papers (2024-01-07T12:39:31Z) - Reflections on Inductive Thematic Saturation as a potential metric for measuring the validity of an inductive Thematic Analysis with LLMs [0.0]
The paper suggests that Inductive Thematic Saturation (ITS) could be used as a metric to assess part of the transactional validity of a Thematic Analysis (TA) performed with Large Language Models (LLMs).
The paper presents the initial coding of two datasets of different sizes and reflects on how the LLM reaches some form of analytical saturation during the coding.
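One simple, hypothetical way to operationalize this kind of saturation observation is to track how the cumulative codebook grows as more segments are coded; the ratio below is an illustrative stand-in, not the metric defined in either paper.

```python
# A hypothetical saturation measure: as segments are coded, track
# |cumulative unique codes| / |all codes assigned so far|. A curve that
# flattens toward few new codes suggests saturation.
def saturation_curve(codes_per_segment: list[list[str]]) -> list[float]:
    seen: set[str] = set()
    total = 0
    curve = []
    for codes in codes_per_segment:
        seen.update(codes)
        total += len(codes)
        curve.append(len(seen) / total if total else 0.0)
    return curve

# The ratio drops as later segments mostly repeat earlier codes.
print(saturation_curve([["trust", "cost"], ["cost"], ["trust", "cost"]]))
# [1.0, 0.666..., 0.4]
```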
arXiv Detail & Related papers (2024-01-06T15:34:38Z) - Large-scale gradient-based training of Mixtures of Factor Analyzers [67.21722742907981]
This article contributes both a theoretical analysis and a new method for efficient high-dimensional training by gradient descent.
We prove that MFA training and inference/sampling can be performed based on precision matrices, which does not require matrix inversions after training is completed.
Beyond the theoretical analysis, we apply MFA to typical image datasets such as SVHN and MNIST, and demonstrate the ability to perform sample generation and outlier detection.
arXiv Detail & Related papers (2023-08-26T06:12:33Z) - Can Large Language Models emulate an inductive Thematic Analysis of semi-structured interviews? An exploration and provocation on the limits of the approach and the model [0.0]
The paper presents results and reflections from an experiment using the model GPT-3.5-Turbo to emulate some aspects of an inductive Thematic Analysis.
The objective is not to replace human analysts in qualitative analysis but to learn whether some elements of LLM data manipulation can, to an extent, support qualitative research.
arXiv Detail & Related papers (2023-05-22T13:16:07Z) - Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing [74.2952487120137]
It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in machine learning models.
This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem.
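For readers unfamiliar with the setting, the sketch below runs plain gradient descent on a factorized matrix sensing objective from a small random initialization, the regime such implicit-bias analyses study; all dimensions and hyperparameters are illustrative choices unrelated to the paper's theory.

```python
# Matrix sensing: recover a low-rank X* from linear measurements
# y_k = <A_k, X*> via gradient descent on f(U) = sum_k (<A_k, U U^T> - y_k)^2.
# Dimensions and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, r, m = 10, 2, 200
U_star = rng.normal(size=(d, r))
X_star = U_star @ U_star.T                 # ground-truth rank-r matrix
A = rng.normal(size=(m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2         # symmetric sensing matrices
y = np.einsum("kij,ij->k", A, X_star)      # noiseless measurements

U = 0.01 * rng.normal(size=(d, r))         # small init, as in implicit-bias analyses
lr = 1e-5
for step in range(5001):
    resid = np.einsum("kij,ij->k", A, U @ U.T) - y
    grad = 4 * np.einsum("k,kij->ij", resid, A) @ U  # gradient for symmetric A_k
    U -= lr * grad
    if step % 1000 == 0:
        err = np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star)
        print(f"step {step:5d}  relative error {err:.3e}")
```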
arXiv Detail & Related papers (2023-01-27T02:30:51Z) - Quantum Algorithms for Data Representation and Analysis [68.754953879193]
We provide quantum procedures that speed up the solution of eigenproblems for data representation in machine learning.
The power and practical use of these subroutines are shown through new quantum algorithms, sublinear in the input matrix's size, for principal component analysis, correspondence analysis, and latent semantic analysis.
Results show that the run-time parameters that do not depend on the input's size are reasonable and that the error on the computed model is small, allowing for competitive classification performance.
arXiv Detail & Related papers (2021-04-19T00:41:43Z)