A Reproducible Framework for Neural Topic Modeling in Focus Group Analysis
- URL: http://arxiv.org/abs/2511.18843v2
- Date: Tue, 02 Dec 2025 16:48:14 GMT
- Title: A Reproducible Framework for Neural Topic Modeling in Focus Group Analysis
- Authors: Heger Arfaoui, Mohammed Iheb Hergli, Beya Benzina, Slimane BenMiled
- Abstract summary: We present a systematic framework for applying BERTopic to focus group transcripts, using data from ten focus groups in Tunisia. Each of 27 configurations is evaluated through bootstrap stability analysis, performance metrics, and comparison with an LDA baseline. Findings demonstrate that transformer-based topic modeling can extract interpretable themes from small focus group transcript corpora.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Focus group discussions generate rich qualitative data, but their analysis traditionally relies on labor-intensive manual coding that limits scalability and reproducibility. We present a systematic framework for applying BERTopic to focus group transcripts, using data from ten focus groups exploring HPV vaccine perceptions in Tunisia (1,075 utterances). We conducted a comprehensive hyperparameter exploration across 27 configurations, evaluating each through bootstrap stability analysis, performance metrics, and comparison with an LDA baseline. Bootstrap analysis revealed that the stability metrics (NMI and ARI) exhibited strong disagreement (r = -0.691) and showed divergent relationships with coherence, demonstrating that stability is multifaceted rather than monolithic. Our multi-criteria selection framework yielded a 7-topic model achieving 18% higher coherence than optimized LDA (0.573 vs. 0.486), with interpretable topics validated through independent human evaluation (ICC = 0.700, weighted Cohen's kappa = 0.678). These findings demonstrate that transformer-based topic modeling can extract interpretable themes from small focus group transcript corpora when systematically configured and validated, while revealing that quality metrics capture distinct, sometimes conflicting constructs that require multi-criteria evaluation. We provide complete documentation and code to support reproducibility.
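For orientation, the sweep-plus-stability loop the abstract describes can be sketched as below. The grid values, the `utterances` variable, and the resampling scheme are illustrative assumptions, not the authors' exact configuration; their released code is the authoritative reference.

```python
# Minimal sketch: BERTopic hyperparameter sweep with bootstrap stability scoring
# (NMI/ARI). Assumes `utterances` is the list of transcript strings; grid values
# are illustrative, not the paper's 27 configurations.
from itertools import product

import numpy as np
from bertopic import BERTopic
from hdbscan import HDBSCAN
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from umap import UMAP

def fit_topics(docs, n_neighbors, min_cluster_size, seed=42):
    """Fit BERTopic with a fixed random seed and return per-document topic ids."""
    umap_model = UMAP(n_neighbors=n_neighbors, n_components=5, min_dist=0.0,
                      metric="cosine", random_state=seed)
    hdbscan_model = HDBSCAN(min_cluster_size=min_cluster_size, metric="euclidean")
    topics, _ = BERTopic(umap_model=umap_model,
                         hdbscan_model=hdbscan_model).fit_transform(docs)
    return np.array(topics)

def bootstrap_stability(docs, n_neighbors, min_cluster_size, n_boot=10):
    """Refit on bootstrap resamples; compare assignments on the resampled docs."""
    rng = np.random.default_rng(0)
    base = fit_topics(docs, n_neighbors, min_cluster_size)
    nmi, ari = [], []
    for _ in range(n_boot):
        idx = rng.choice(len(docs), size=len(docs), replace=True)
        boot = fit_topics([docs[i] for i in idx], n_neighbors, min_cluster_size)
        nmi.append(normalized_mutual_info_score(base[idx], boot))
        ari.append(adjusted_rand_score(base[idx], boot))
    return float(np.mean(nmi)), float(np.mean(ari))

# Illustrative 3 x 3 slice of a sweep (the paper explored 27 configurations).
for n_neighbors, min_cluster_size in product([10, 15, 25], [5, 10, 15]):
    nmi, ari = bootstrap_stability(utterances, n_neighbors, min_cluster_size)
    print(f"n_neighbors={n_neighbors:>2}  min_cluster_size={min_cluster_size:>2}  "
          f"NMI={nmi:.3f}  ARI={ari:.3f}")
```

Reporting both NMI and ARI per configuration is what surfaces the disagreement between stability metrics that the abstract highlights.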
Related papers
- Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements [78.87065404966002]
Existing benchmarks predominantly curate questions at the question level. We propose Encyclo-K, a statement-based benchmark that rethinks benchmark construction from the ground up.
arXiv Detail & Related papers (2025-12-31T13:55:54Z)
- AutoMalDesc: Large-Scale Script Analysis for Cyber Threat Research [81.04845910798387]
Generating natural language explanations for threat detections remains an open problem in cybersecurity research. We present AutoMalDesc, an automated static analysis summarization framework that operates independently at scale. We publish our complete dataset of more than 100K script samples, including annotated seed (0.9K) datasets, along with our methodology and evaluation framework.
arXiv Detail & Related papers (2025-11-17T13:05:25Z)
- SAGE: A Realistic Benchmark for Semantic Understanding [9.688555356614044]
We introduce SAGE (Semantic Alignment & Generalization Evaluation), a rigorous benchmark designed to assess both embedding models and similarity metrics. Our comprehensive evaluation of 9 embedding models and classical metrics reveals significant performance gaps. OpenAI's text-embedding-3-small achieves the highest clustering performance (0.483) but demonstrates extreme brittleness, with the lowest robustness score (0.011).
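As a generic illustration of clustering-based embedding evaluation (not SAGE's own protocol or datasets), one might score a model as below; the model name and the V-measure metric are assumptions.

```python
# Generic clustering-based embedding evaluation sketch; model name and metric
# are illustrative, not SAGE's actual scoring protocol.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import v_measure_score

def clustering_score(texts, labels, model_name="all-MiniLM-L6-v2"):
    """Embed texts, cluster them, and score agreement with reference labels."""
    embeddings = SentenceTransformer(model_name).encode(
        texts, normalize_embeddings=True)
    pred = KMeans(n_clusters=len(set(labels)), n_init=10,
                  random_state=0).fit_predict(embeddings)
    return v_measure_score(labels, pred)
```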
arXiv Detail & Related papers (2025-09-25T15:27:15Z)
- The Digital Sous Chef -- A Comparative Study on Fine-Tuning Language Models for Recipe Generation [2.497854684676663]
We present a comprehensive study contrasting a fine-tuned GPT-2 large (774M) model against the GPT-2 small (124M) model and traditional LSTM/RNN baselines on the 5-cuisine corpus from RecipeDB. Our key contribution is a targeted tokenization strategy that augments the vocabulary with 23 common fraction tokens and custom structural markers.
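The tokenization idea is straightforward to reproduce with Hugging Face Transformers; the specific tokens below are illustrative stand-ins, not the paper's exact 23 fractions or markers.

```python
# Sketch of vocabulary augmentation for GPT-2; tokens below are illustrative
# examples, not the paper's actual fraction tokens or structural markers.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

fraction_tokens = ["1/2", "1/3", "2/3", "1/4", "3/4"]   # illustrative subset
structural_markers = ["<TITLE>", "<INGR>", "<STEPS>"]   # illustrative markers
tokenizer.add_tokens(fraction_tokens)
tokenizer.add_special_tokens({"additional_special_tokens": structural_markers})

# Grow the embedding matrix so the new token ids get trainable vectors.
model.resize_token_embeddings(len(tokenizer))
```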
arXiv Detail & Related papers (2025-08-20T13:53:13Z)
- RoHOI: Robustness Benchmark for Human-Object Interaction Detection [84.78366452133514]
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support. We introduce the first robustness benchmark for HOI detection, evaluating model resilience under diverse challenges. Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric.
arXiv Detail & Related papers (2025-07-12T01:58:04Z)
- An Uncertainty-Aware Dynamic Decision Framework for Progressive Multi-Omics Integration in Classification Tasks [6.736267874971369]
We propose an uncertainty-aware, multi-view dynamic decision framework for omics data classification. We employ a fusion strategy based on Dempster-Shafer theory to integrate heterogeneous modalities. Across the three datasets, over 50% of cases achieved accurate classification using a single omics modality.
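Dempster-Shafer combination itself is compact. A minimal two-modality sketch over a two-class frame follows; the mass values are invented for illustration, and the paper's per-modality uncertainty networks are not included.

```python
# Dempster's rule of combination over a two-class frame; mass values invented.
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) by Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass on disjoint hypotheses is renormalized away
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Evidence from two omics modalities over classes {tumor, normal}; mass left on
# the full frame expresses each modality's uncertainty.
frame = frozenset({"tumor", "normal"})
genomics = {frozenset({"tumor"}): 0.6, frame: 0.4}
proteomics = {frozenset({"tumor"}): 0.5, frozenset({"normal"}): 0.2, frame: 0.3}
print(dempster_combine(genomics, proteomics))  # {tumor}: ~0.77, {normal}: ~0.09
```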
arXiv Detail & Related papers (2025-06-20T13:44:14Z)
- Contextual Embedding-based Clustering to Identify Topics for Healthcare Service Improvement [3.9726806016869936]
This study explores unsupervised methods to extract meaningful topics from 439 survey responses collected from a healthcare system in Wisconsin, USA. A keyword-based filtering approach was applied to isolate complaint-related feedback using a domain-specific lexicon. To improve coherence and interpretability where data are scarce and consist of short texts, we propose kBERT.
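The filtering step is a simple lexicon match; a minimal sketch with an invented stand-in lexicon (the study's actual domain lexicon is its own):

```python
# Keyword-based complaint filtering; the lexicon is a made-up stand-in.
import re

COMPLAINT_LEXICON = {"wait", "delay", "rude", "billing", "error", "cancel"}

def is_complaint(response: str) -> bool:
    """Flag a response if any token matches the domain lexicon."""
    tokens = set(re.findall(r"[a-z]+", response.lower()))
    return bool(tokens & COMPLAINT_LEXICON)

responses = ["Billing error on my last visit.", "The staff were wonderful."]
complaints = [r for r in responses if is_complaint(r)]  # keeps only the first
```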
arXiv Detail & Related papers (2025-04-18T20:38:24Z)
- $C^2$AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction [80.57232374640911]
We propose a model-agnostic strategy called Mask-And-Recover (MAR). MAR integrates both inter- and intra-modality contextual correlations to enable global inference within extraction modules. To better target challenging parts within each sample, we introduce a Fine-grained Confidence Score (FCS) model.
arXiv Detail & Related papers (2025-04-01T13:01:30Z)
- Conformal uncertainty quantification to evaluate predictive fairness of foundation AI model for skin lesion classes across patient demographics [8.692647930497936]
We use conformal analysis to quantify the predictive uncertainty of a vision transformer-based foundation model. We show how this can be used as a fairness metric to evaluate the robustness of the foundation model's feature embeddings.
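As background, split conformal prediction for a classifier fits in a few lines. This is a generic sketch, not the paper's exact procedure, and the subgroup comparison is only indicated in a comment.

```python
# Split-conformal prediction sets for a k-class classifier (generic sketch).
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """cal_probs/test_probs: (n, k) softmax scores. Returns boolean set masks."""
    # Nonconformity score: 1 minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    qhat = np.quantile(scores, level, method="higher")
    return test_probs >= 1.0 - qhat  # include classes with small enough score

# Comparing average prediction-set sizes across demographic subgroups then
# yields the uncertainty-based fairness signal the paper proposes.
```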
arXiv Detail & Related papers (2025-03-31T08:06:00Z)
- MindGYM: What Matters in Question Synthesis for Thinking-Centric Fine-Tuning? [51.85759493254735]
MindGYM is a structured and scalable framework for question synthesis. It infuses high-level reasoning objectives to shape the model's synthesis behavior. It composes more complex multi-hop questions based on QA seeds for deeper reasoning.
arXiv Detail & Related papers (2025-03-12T16:03:03Z)
- A Statistical Framework for Ranking LLM-Based Chatbots [57.59268154690763]
We propose a statistical framework that incorporates key advancements to address specific challenges in pairwise comparison analysis. First, we introduce a factored tie model that enhances the ability to handle groupings of human-judged comparisons. Second, we extend the framework to model covariance between competitors, enabling deeper insights into performance relationships. Third, we resolve optimization challenges arising from parameter non-uniqueness by introducing novel constraints.
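For context, a classical Davidson-style tie extension of Bradley-Terry (the baseline that factored tie models generalize) can be fit by maximum likelihood. The toy data and parameterization below are illustrative, not the paper's model.

```python
# Davidson-style Bradley-Terry with ties, fit by MLE on toy data.
import numpy as np
from scipy.optimize import minimize

def nll(params, outcomes, n_models):
    """outcomes: (i, j, r) with r in {1: i wins, 0: j wins, 2: tie}."""
    theta, nu = params[:n_models], np.exp(params[n_models])
    total = 0.0
    for i, j, r in outcomes:
        pi, pj = np.exp(theta[i]), np.exp(theta[j])
        denom = pi + pj + nu * np.sqrt(pi * pj)
        total -= np.log({1: pi, 0: pj, 2: nu * np.sqrt(pi * pj)}[r] / denom)
    return total

outcomes = [(0, 1, 1), (0, 1, 2), (1, 2, 0), (0, 2, 1), (2, 1, 2)]  # toy data
res = minimize(nll, x0=np.zeros(4), args=(outcomes, 3))
# Strengths are identifiable only up to a common scale -- the kind of parameter
# non-uniqueness the paper resolves with explicit constraints; here we simply
# anchor theta_0 = 0.
strengths = np.exp(res.x[:3] - res.x[0])
```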
arXiv Detail & Related papers (2024-12-24T12:54:19Z)
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
- VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z)
- Comparison of Topic Modelling Approaches in the Banking Context [0.0]
This study presents the use of Kernel Principal Component Analysis (KernelPCA) and K-means clustering in the BERTopic architecture.
We prepared a new dataset of tweets from customers of Nigerian banks and use it to compare the topic modelling approaches.
Our findings showed that KernelPCA and K-means in the BERTopic architecture produced coherent topics, with a coherence score of 0.8463.
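Because BERTopic accepts any sklearn-compatible reducer and clusterer, the component swap this study describes is a two-argument change. The kernel and cluster count below are illustrative, not the study's tuned values.

```python
# Swapping BERTopic's default UMAP + HDBSCAN for KernelPCA + K-means.
from bertopic import BERTopic
from sklearn.cluster import KMeans
from sklearn.decomposition import KernelPCA

topic_model = BERTopic(
    umap_model=KernelPCA(n_components=5, kernel="rbf"),  # replaces UMAP
    hdbscan_model=KMeans(n_clusters=10, n_init=10,       # replaces HDBSCAN
                         random_state=0),
)
# topics, probs = topic_model.fit_transform(tweets)  # `tweets`: customer tweets
```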
arXiv Detail & Related papers (2024-02-05T16:43:53Z)
- Position: AI Evaluation Should Learn from How We Test Humans [65.36614996495983]
We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- An Operational Perspective to Fairness Interventions: Where and How to Intervene [9.833760837977222]
We present a holistic framework for evaluating and contextualizing fairness interventions.
We demonstrate our framework with a case study on predictive parity.
We find predictive parity is difficult to achieve without using group data.
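Predictive parity asks for equal positive predictive value (PPV) across groups; the minimal check below (variable names assumed) also makes the finding concrete, since evaluating it requires the group labels themselves.

```python
# Per-group positive predictive value; y_pred, y_true, groups: numpy arrays.
import numpy as np

def ppv_by_group(y_pred, y_true, groups):
    """Return {group: P(y_true = 1 | y_pred = 1, group)}."""
    out = {}
    for g in np.unique(groups):
        flagged = (groups == g) & (y_pred == 1)
        out[g] = float(y_true[flagged].mean()) if flagged.any() else float("nan")
    return out
```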
arXiv Detail & Related papers (2023-02-03T07:04:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.