Reflections on Inductive Thematic Saturation as a potential metric for
measuring the validity of an inductive Thematic Analysis with LLMs
- URL: http://arxiv.org/abs/2401.03239v1
- Date: Sat, 6 Jan 2024 15:34:38 GMT
- Authors: Stefano De Paoli and Walter Stan Mathis
- Abstract summary: The paper suggests that initial thematic saturation (ITS) could be used as a metric to assess part of the transactional validity of Thematic Analysis (TA) with Large Language Models (LLMs).
The paper presents the initial coding of two datasets of different sizes, and it reflects on how the LLM reaches some form of analytical saturation during the coding.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper presents a set of reflections on saturation and the use of Large
Language Models (LLMs) for performing Thematic Analysis (TA). The paper
suggests that initial thematic saturation (ITS) could be used as a metric to
assess part of the transactional validity of TA with LLM, focusing on the
initial coding. The paper presents the initial coding of two datasets of
different sizes, and it reflects on how the LLM reaches some form of analytical
saturation during the coding. The procedure proposed in this work leads to the
creation of two codebooks, one comprising the total cumulative initial codes
and the other the total unique codes. The paper proposes a metric to
synthetically measure ITS using a simple mathematical calculation employing the
ratio between slopes of cumulative codes and unique codes. The paper
contributes to the initial body of work exploring how to perform qualitative
analysis with LLMs.
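The proposed calculation can be sketched in a few lines (a minimal illustration, not the authors' code: the function name, the least-squares fit, and the unique-over-cumulative orientation are assumptions; the paper only states that ITS is derived from the ratio between the two slopes):

```python
import numpy as np

def its_slope_ratio(cumulative_codes, unique_codes):
    """Sketch of the ITS metric: ratio between the slopes of the
    unique-code and cumulative-code curves.

    Each list holds the running count of initial codes after coding
    one more excerpt; slopes come from a least-squares linear fit.
    """
    x = np.arange(len(cumulative_codes))
    cumulative_slope = np.polyfit(x, cumulative_codes, 1)[0]
    unique_slope = np.polyfit(x, unique_codes, 1)[0]
    # When new unique codes stop appearing (saturation), the unique
    # curve flattens and the ratio approaches 0.
    return unique_slope / cumulative_slope

# Toy illustration: cumulative codes grow steadily while unique codes plateau.
cumulative = [10, 20, 30, 40, 50]
unique = [10, 15, 18, 19, 19]
print(round(its_slope_ratio(cumulative, unique), 3))  # → 0.22
```

Under this orientation, a value near 1 means the coding is still producing mostly new codes, while a value near 0 indicates analytical saturation.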
Related papers
- Codebook Reduction and Saturation: Novel observations on Inductive Thematic Saturation for Large Language Models and initial coding in Thematic Analysis [0.0]
This paper reflects on the process of performing Thematic Analysis with Large Language Models (LLMs).
We propose a novel technique to measure Inductive Thematic Saturation (ITS).
arXiv Detail & Related papers (2025-03-06T08:52:03Z)
- Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning [27.562284768743694]
Large language models (LLMs) can prove mathematical theorems formally by generating proof steps within a proof system.
We introduce a neuro-symbolic tactic generator that synergizes the mathematical intuition learned by LLMs with domain-specific insights encoded by symbolic methods.
We evaluate our framework on 161 challenging inequalities from multiple mathematics competitions.
arXiv Detail & Related papers (2025-02-19T15:54:21Z)
- AlphaIntegrator: Transformer Action Search for Symbolic Integration Proofs [3.618534280726541]
We present the first correct-by-construction learning-based system for step-by-step mathematical integration.
The key idea is to learn a policy, represented by a GPT transformer model, which guides the search for the right mathematical integration rule.
We introduce a symbolic engine with axiomatically correct actions on mathematical expressions, as well as the first dataset for step-by-step integration.
arXiv Detail & Related papers (2024-10-03T16:50:30Z)
- On the Design and Analysis of LLM-Based Algorithms [74.7126776018275]
Large language models (LLMs) are increasingly used as sub-routines in larger algorithms and have achieved remarkable empirical success.
Our proposed framework holds promise for advancing the design and analysis of LLM-based algorithms.
arXiv Detail & Related papers (2024-07-20T07:39:07Z)
- Case2Code: Learning Inductive Reasoning with Synthetic Data [105.89741089673575]
We propose a Case2Code task by exploiting the expressiveness and correctness of programs.
We first evaluate representative LLMs on the synthesized Case2Code task and demonstrate that the Case-to-code induction is challenging for LLMs.
Experimental results show that such induction training not only improves in-distribution Case2Code performance but also enhances various coding abilities of the trained LLMs.
arXiv Detail & Related papers (2024-07-17T11:35:00Z)
- Source Code Summarization in the Era of Large Language Models [23.715005053430957]
Large language models (LLMs) have led to a great boost in the performance of code-related tasks.
In this paper, we undertake a systematic and comprehensive study on code summarization in the era of LLMs.
arXiv Detail & Related papers (2024-07-09T05:48:42Z)
- Interpretable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z)
- Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting [78.48355455324688]
We propose a novel zero-shot synthetic code detector based on the similarity between the code and its rewritten variants.
Our results demonstrate a notable enhancement over existing synthetic content detectors designed for general texts.
arXiv Detail & Related papers (2024-05-25T08:57:28Z)
- RepEval: Effective Text Evaluation with LLM Representation [55.26340302485898]
RepEval is a metric that leverages the projection of Large Language Model (LLM) representations for evaluation.
Our work underscores the richness of information regarding text quality embedded within LLM representations, offering insights for the development of new metrics.
arXiv Detail & Related papers (2024-04-30T13:50:55Z)
- A Thorough Examination of Decoding Methods in the Era of LLMs [72.65956436513241]
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers.
This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of large language models.
Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization.
arXiv Detail & Related papers (2024-02-10T11:14:53Z)
- LLM-in-the-loop: Leveraging Large Language Model for Thematic Analysis [18.775126929754833]
Thematic analysis (TA) has been widely used for analyzing qualitative data in many disciplines and fields.
Human coders develop and deepen their data interpretation and coding over multiple iterations, making TA labor-intensive and time-consuming.
We propose a human-LLM collaboration framework (i.e., LLM-in-the-loop) to conduct TA with in-context learning (ICL).
arXiv Detail & Related papers (2023-10-23T17:05:59Z)
- Large Language Models for Code Analysis: Do LLMs Really Do Their Job? [13.48555476110316]
Large language models (LLMs) have demonstrated significant potential in the realm of natural language understanding and programming code processing tasks.
This paper offers a comprehensive evaluation of LLMs' capabilities in performing code analysis tasks.
arXiv Detail & Related papers (2023-10-18T22:02:43Z)
- Exploiting Code Symmetries for Learning Program Semantics [21.145969030761716]
We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations.
We develop a novel variant of self-attention that is provably equivariant to code symmetries from the permutation group defined over the program dependence graph.
Our results suggest that code LLMs that encode the code structural prior via the code symmetry group generalize better and faster.
arXiv Detail & Related papers (2023-08-07T05:40:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.