Word Clouds as Common Voices: LLM-Assisted Visualization of Participant-Weighted Themes in Qualitative Interviews
- URL: http://arxiv.org/abs/2508.07517v1
- Date: Mon, 11 Aug 2025 00:27:52 GMT
- Title: Word Clouds as Common Voices: LLM-Assisted Visualization of Participant-Weighted Themes in Qualitative Interviews
- Authors: Joseph T. Colonel, Baihan Lin
- Abstract summary: We introduce ThemeClouds, an open-source visualization tool that generates thematic, participant-weighted word clouds from dialogue transcripts. The system prompts an LLM to identify concept-level themes across a corpus and then counts how many unique participants mention each topic. Using interviews from a user study comparing five recording-device configurations, our approach surfaces more actionable device concerns than frequency clouds and topic-modeling baselines.
- Score: 13.971616443394474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word clouds are a common way to summarize qualitative interviews, yet traditional frequency-based methods often fail in conversational contexts: they surface filler words, ignore paraphrase, and fragment semantically related ideas. This limits their usefulness in early-stage analysis, when researchers need fast, interpretable overviews of what participants actually said. We introduce ThemeClouds, an open-source visualization tool that uses large language models (LLMs) to generate thematic, participant-weighted word clouds from dialogue transcripts. The system prompts an LLM to identify concept-level themes across a corpus and then counts how many unique participants mention each topic, yielding a visualization grounded in breadth of mention rather than raw term frequency. Researchers can customize prompts and visualization parameters, providing transparency and control. Using interviews from a user study comparing five recording-device configurations (31 participants; 155 transcripts, Whisper ASR), our approach surfaces more actionable device concerns than frequency clouds and topic-modeling baselines (e.g., LDA, BERTopic). We discuss design trade-offs for integrating LLM assistance into qualitative workflows, implications for interpretability and researcher agency, and opportunities for interactive analyses such as per-condition contrasts ("diff clouds").
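The participant-weighted counting described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' actual code: the theme-extraction step is stubbed with a keyword lookup where a real system would prompt an LLM, and all function and variable names are assumptions.

```python
from collections import Counter

def llm_extract_themes(transcript: str) -> set[str]:
    """Stub for the LLM step: map a transcript to the concept-level
    themes it mentions. A real system would prompt an LLM here."""
    theme_keywords = {
        "battery life": ["battery", "charge"],
        "microphone quality": ["mic", "muffled", "audio"],
    }
    text = transcript.lower()
    return {theme for theme, kws in theme_keywords.items()
            if any(kw in text for kw in kws)}

def participant_weighted_counts(transcripts_by_participant: dict[str, list[str]]) -> Counter:
    """Count how many unique participants mention each theme, so a
    theme's weight reflects breadth of mention, not raw term frequency."""
    counts = Counter()
    for participant, transcripts in transcripts_by_participant.items():
        themes = set()
        for t in transcripts:
            themes |= llm_extract_themes(t)
        # Each participant contributes at most once per theme,
        # no matter how often they repeat it.
        counts.update(themes)
    return counts

data = {
    "P1": ["The battery drains fast.", "The battery died again."],
    "P2": ["Audio sounded muffled through the mic."],
    "P3": ["Battery was fine, but the mic picked up noise."],
}
print(participant_weighted_counts(data))
# "battery life" scores 2, not 3: P1's repeated mentions count once.
```

The resulting counts would then drive word sizes in the cloud, so a theme raised once each by many participants outranks a theme repeated often by one person.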
Related papers
- Leveraging Large Language Models to Identify Conversation Threads in Collaborative Learning [14.255468784386977]
Threading is the way discourse naturally organizes into interwoven topical strands that evolve over time. This paper contributes a systematic guidebook for identifying threads in synchronous multi-party transcripts. We then test how threading influences performance on downstream coding of conversational analysis frameworks.
arXiv Detail & Related papers (2025-10-26T21:25:23Z) - Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models [3.8673630752805446]
The aim is to investigate the extent to which the linguistic context alone can inform the detection of mentions. We adapt a pretrained large language model (LLM) to perform a relatively coarse-grained annotation of mention spans in unfolding conversations. Our findings indicate that even when using a moderately sized LLM, relatively small datasets, and parameter-efficient fine-tuning, a text-only approach can be effective.
arXiv Detail & Related papers (2025-06-26T14:14:20Z) - Multimodal Conversation Structure Understanding [12.29827265137757]
Large language models' ability to understand fine-grained conversational structure remains underexplored. We present a human-annotated dataset of 4,398 annotations for speakers and reply-to relationships, 5,755 addressees, and 3,142 side-participants. We evaluate popular audio-visual LLMs and vision-language models on our dataset, and our experimental results suggest that multimodal conversational structure understanding remains challenging.
arXiv Detail & Related papers (2025-05-23T06:41:54Z) - A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations [112.81207927088117]
PersonaConvBench is a benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements.
arXiv Detail & Related papers (2025-05-20T09:13:22Z) - Vision-Speech Models: Teaching Speech Models to Converse about Images [67.62394024470528]
We introduce MoshiVis, augmenting a recent dialogue speech LLM, Moshi, with visual inputs through lightweight adaptation modules. An additional dynamic gating mechanism enables the model to more easily switch between the visual inputs and unrelated conversation topics. We evaluate the model on downstream visual understanding tasks with both audio and text prompts, and report qualitative samples of interactions with MoshiVis.
arXiv Detail & Related papers (2025-03-19T18:40:45Z) - Generative Context-aware Fine-tuning of Self-supervised Speech Models [54.389711404209415]
We study the use of context information generated by large language models (LLMs).
We propose an approach to distill the generated information during fine-tuning of self-supervised speech models.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
arXiv Detail & Related papers (2023-12-15T15:46:02Z) - Contextual Object Detection with Multimodal Large Language Models [66.15566719178327]
We introduce a novel research problem of contextual object detection.
Three representative scenarios are investigated, including the language cloze test, visual captioning, and question answering.
We present ContextDET, a unified multimodal model that is capable of end-to-end differentiable modeling of visual-language contexts.
arXiv Detail & Related papers (2023-05-29T17:50:33Z) - Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue
Questions with LLMs [59.74002011562726]
We propose a novel linguistic cue-based chain-of-thoughts (Cue-CoT) approach to provide a more personalized and engaging response.
We build a benchmark with in-depth dialogue questions, consisting of 6 datasets in both Chinese and English.
Empirical results demonstrate our proposed Cue-CoT method outperforms standard prompting methods in terms of both helpfulness and acceptability on all datasets.
arXiv Detail & Related papers (2023-05-19T16:27:43Z) - Joint Modelling of Spoken Language Understanding Tasks with Integrated
Dialog History [30.20353302347147]
We propose a novel model architecture that learns dialog context to jointly predict the intent, dialog act, speaker role, and emotion for the spoken utterance.
Our experiments show that our joint model achieves similar results to task-specific classifiers.
arXiv Detail & Related papers (2023-05-01T16:26:18Z) - Visually-augmented pretrained language models for NLP tasks without
images [77.74849855049523]
Existing solutions often rely on explicit images for visual knowledge augmentation.
We propose a novel Visually-Augmented fine-tuning approach.
Our approach can consistently improve the performance of BERT, RoBERTa, BART, and T5 at different scales.
arXiv Detail & Related papers (2022-12-15T16:13:25Z) - Findings on Conversation Disentanglement [28.874162427052905]
We build a model that performs utterance-to-utterance and utterance-to-thread classification.
Experiments on the Ubuntu IRC dataset show that this approach has the potential to outperform the conventional greedy approach.
arXiv Detail & Related papers (2021-12-10T05:54:48Z) - Multi-View Sequence-to-Sequence Models with Conversational Structure for
Abstractive Dialogue Summarization [72.54873655114844]
Text summarization is one of the most challenging and interesting problems in NLP.
This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations.
Experiments on a large-scale dialogue summarization corpus demonstrated that our methods significantly outperformed previous state-of-the-art models via both automatic evaluations and human judgment.
arXiv Detail & Related papers (2020-10-04T20:12:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.