Contextual Embedding-based Clustering to Identify Topics for Healthcare Service Improvement
- URL: http://arxiv.org/abs/2504.14068v1
- Date: Fri, 18 Apr 2025 20:38:24 GMT
- Title: Contextual Embedding-based Clustering to Identify Topics for Healthcare Service Improvement
- Authors: K M Sajjadul Islam, Ravi Teja Karri, Srujan Vegesna, Jiawei Wu, Praveen Madiraju,
- Abstract summary: This study explores unsupervised methods to extract meaningful topics from 439 survey responses collected from a healthcare system in Wisconsin, USA.<n>A keyword-based filtering approach was applied to isolate complaint-related feedback using a domain-specific lexicon.<n>To improve coherence and interpretability where data are scarce and consist of short-texts, we propose kBERT.
- Score: 3.9726806016869936
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Understanding patient feedback is crucial for improving healthcare services, yet analyzing unlabeled short-text feedback presents significant challenges due to limited data and domain-specific nuances. Traditional supervised learning approaches require extensive labeled datasets, making unsupervised methods more viable for uncovering meaningful insights from patient feedback. This study explores unsupervised methods to extract meaningful topics from 439 survey responses collected from a healthcare system in Wisconsin, USA. A keyword-based filtering approach was applied to isolate complaint-related feedback using a domain-specific lexicon. To delve deeper and analyze dominant topics in feedback, we explored traditional topic modeling methods, including Latent Dirichlet Allocation (LDA) and Gibbs Sampling Dirichlet Multinomial Mixture (GSDMM), alongside BERTopic, an advanced neural embedding-based clustering approach. To improve coherence and interpretability where data are scarce and consist of short-texts, we propose kBERT, an integration of BERT embeddings with k-means clustering. Model performance was assessed using coherence scores (Cv ) for topic interpretability and average Inverted Rank-Biased Overlap (IRBOavg) for topic diversity. Results indicate that kBERT achieves the highest coherence (Cv = 0.53) and distinct topic separation (IRBOavg = 1.00), outperforming all other models in short-text healthcare feedback analysis. Our findings emphasize the importance of embedding-based techniques for topic identification and highlight the need for context-aware models in healthcare analytics.
Related papers
- LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.<n>We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.<n>Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - ConDiSR: Contrastive Disentanglement and Style Regularization for Single Domain Generalization [42.810247034149214]
Medical data often exhibits distribution shifts, which cause test-time performance degradation for deep learning models trained using standard pipelines.
This study highlights the importance and challenges of exploring Single Domain Generalization frameworks in the context of the classification task.
arXiv Detail & Related papers (2024-03-14T13:50:44Z) - SHAMSUL: Systematic Holistic Analysis to investigate Medical
Significance Utilizing Local interpretability methods in deep learning for
chest radiography pathology prediction [1.0138723409205497]
The study delves into the application of four well-established interpretability methods: Local Interpretable Model-agnostic Explanations (LIME), Shapley Additive exPlanations (SHAP), Gradient-weighted Class Activation Mapping (Grad-CAM) and Layer-wise Relevance Propagation (LRP)
Our analysis encompasses both single-label and multi-label predictions, providing a comprehensive and unbiased assessment through quantitative and qualitative investigations, which are compared against human expert annotation.
arXiv Detail & Related papers (2023-07-16T11:10:35Z) - Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual
Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z) - Retinal Vessel Segmentation via a Multi-resolution Contextual Network
and Adversarial Learning [4.776465250559035]
We propose a Multi-resolution Contextual Network (MRC-Net) to learn contextual dependencies between semantically different features.
We have evaluated our method on three benchmark datasets, including DRIVE, STARE, and CHASE.
arXiv Detail & Related papers (2023-04-25T14:27:34Z) - Simple and Scalable Algorithms for Cluster-Aware Precision Medicine [0.0]
We propose a simple and scalable approach to joint clustering and embedding.
This novel, cluster-aware embedding approach overcomes the complexity and limitations of current joint embedding and clustering methods.
Our approach does not require the user to choose the desired number of clusters, but instead yields interpretable dendrograms of hierarchically clustered embeddings.
arXiv Detail & Related papers (2022-11-29T19:27:26Z) - Towards Interpretable Summary Evaluation via Allocation of Contextual
Embeddings to Reference Text Topics [1.5749416770494706]
The multifaceted interpretable summary evaluation method (MISEM) is based on allocation of a summary's contextual token embeddings to semantic topics identified in the reference text.
MISEM achieves a promising.404 Pearson correlation with human judgment on the TAC'08 dataset.
arXiv Detail & Related papers (2022-10-25T17:09:08Z) - Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as -- or better -- than traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z) - Weakly-Supervised Aspect-Based Sentiment Analysis via Joint
Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z) - Semi-supervised Medical Image Classification with Relation-driven
Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z) - Predictive Modeling of ICU Healthcare-Associated Infections from
Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling
Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making addressed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.