A Topic Modeling Analysis of Stigma Dimensions, Social, and Related Behavioral Circumstances in Clinical Notes Among Patients with HIV
- URL: http://arxiv.org/abs/2506.09279v1
- Date: Tue, 10 Jun 2025 22:35:49 GMT
- Title: A Topic Modeling Analysis of Stigma Dimensions, Social, and Related Behavioral Circumstances in Clinical Notes Among Patients with HIV
- Authors: Ziyi Chen, Yiyang Liu, Mattia Prosperi, Krishna Vaddiparti, Robert L Cook, Jiang Bian, Yi Guo, Yonghui Wu
- Abstract summary: We identified a cohort of 9,140 people living with HIV (PLWHs) from the UF Health IDR. We performed topic modeling analysis using Latent Dirichlet Allocation (LDA) to uncover stigma dimensions. We conducted topic variation analysis among subgroups to examine differences across age and sex-specific demographics.
- Score: 21.478502613139582
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Objective: To characterize stigma dimensions, social, and related behavioral circumstances in people living with HIV (PLWHs) seeking care, using natural language processing methods applied to a large collection of electronic health record (EHR) clinical notes from a large integrated health system in the southeast United States. Methods: We identified a cohort of 9,140 PLWHs from the UF Health IDR and performed topic modeling analysis using Latent Dirichlet Allocation (LDA) to uncover stigma dimensions, social, and related behavioral circumstances. Domain experts created a seed list of HIV-related stigma keywords, then applied a snowball strategy to iteratively review notes for additional terms until saturation was reached. To identify more target topics, we tested three keyword-based filtering strategies. Domain experts manually reviewed the detected topics using the prevalent terms and key discussion topics. Word frequency analysis was used to highlight the prevalent terms associated with each topic. In addition, we conducted topic variation analysis among subgroups to examine differences across age and sex-specific demographics. Results and Conclusion: Topic modeling on sentences containing at least one keyword uncovered a wide range of topic themes associated with HIV-related stigma, social, and related behavioral circumstances, including "Mental Health Concern and Stigma", "Social Support and Engagement", "Limited Healthcare Access and Severe Illness", and "Treatment Refusal and Isolation", among others. Topic variation analysis across age subgroups revealed differences. Extracting and understanding the HIV-related stigma dimensions, social, and related behavioral circumstances from EHR clinical notes enables scalable, time-efficient assessment, overcoming the limitations of traditional questionnaires and improving patient outcomes.
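The abstract mentions keeping sentences that contain at least one stigma keyword and using word frequency analysis to surface prevalent terms. The following is a minimal sketch of those two steps only, not the authors' pipeline: the seed keywords, stopword list, sentence splitter, and example notes are all hypothetical placeholders, and the LDA fitting step is omitted.

```python
import re
from collections import Counter

# Hypothetical seed keywords; the expert-curated list from the paper is not reproduced here.
SEED_KEYWORDS = {"stigma", "ashamed", "isolated", "refuses"}

# Small illustrative stopword list (an assumption, not the paper's).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "was", "about", "at", "in", "for", "with"}


def split_sentences(note):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_sentences(notes, keywords):
    """Return every sentence that contains at least one keyword."""
    kept = []
    for note in notes:
        for sentence in split_sentences(note):
            if keywords & set(tokenize(sentence)):
                kept.append(sentence)
    return kept


def term_frequencies(sentences, stopwords):
    """Count non-stopword tokens across the given sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(t for t in tokenize(sentence) if t not in stopwords)
    return counts


# Synthetic notes invented for illustration; they are not real clinical text.
notes = [
    "Patient feels ashamed about diagnosis. Vitals stable today.",
    "Reports stigma at work. Patient feels isolated and ashamed.",
]

filtered = keyword_sentences(notes, SEED_KEYWORDS)
freqs = term_frequencies(filtered, STOPWORDS)
print(filtered)
print(freqs.most_common(3))
```

In a fuller workflow, the filtered sentences would presumably be vectorized and passed to an LDA implementation; which of the three filtering strategies the authors tested corresponds to this sentence-level variant is not stated in the listing beyond the "at least one keyword" description.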
Related papers
- Contextual Embedding-based Clustering to Identify Topics for Healthcare Service Improvement [3.9726806016869936]
This study explores unsupervised methods to extract meaningful topics from 439 survey responses collected from a healthcare system in Wisconsin, USA.
A keyword-based filtering approach was applied to isolate complaint-related feedback using a domain-specific lexicon.
To improve coherence and interpretability where data are scarce and consist of short texts, we propose kBERT.
arXiv Detail & Related papers (2025-04-18T20:38:24Z)
- Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer's Disease [52.46922921214341]
Alzheimer's disease (AD) has become one of the most significant health challenges in an aging society.
We devised an explainable and effective feature set that leverages the visual capabilities of a large language model (LLM) and the Term Frequency-Inverse Document Frequency (TF-IDF) model.
Our new features can be well explained and interpreted step by step which enhance the interpretability of automatic AD screening.
arXiv Detail & Related papers (2024-11-28T05:23:22Z)
- Identifying latent disease factors differently expressed in patient subgroups using group factor analysis [54.67330718129736]
We propose a novel approach to uncover subgroup-specific and subgroup-common latent factors.
The proposed approach, sparse Group Factor Analysis (GFA) with regularised horseshoe priors, was implemented with probabilistic programming.
arXiv Detail & Related papers (2024-10-10T13:12:14Z)
- SemioLLM: Evaluating Large Language Models for Diagnostic Reasoning from Unstructured Clinical Narratives in Epilepsy [45.2233252981348]
Large Language Models (LLMs) have been shown to encode clinical knowledge.
We present SemioLLM, an evaluation framework that benchmarks 6 state-of-the-art models.
We show that most LLMs are able to accurately and confidently generate probabilistic predictions of seizure onset zones in the brain.
arXiv Detail & Related papers (2024-07-03T11:02:12Z)
- Causal Inference with Latent Variables: Recent Advances and Future Prospectives [43.04559575298597]
Causal inference (CI) aims to infer intrinsic causal relations among variables of interest.
The lack of observation of important variables severely compromises the reliability of CI methods.
Various consequences can be incurred if these latent variables are carelessly handled.
arXiv Detail & Related papers (2024-06-20T03:15:53Z)
- Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information.
A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction.
The experimental evaluations have been conducted using the PAD-UFES20 dataset, applying various deep-learning architectures.
arXiv Detail & Related papers (2024-02-16T05:16:20Z)
- Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z)
- Exploring Spatial-Temporal Variations of Public Discourse on Social Media: A Case Study on the First Wave of the Coronavirus Pandemic in Italy [0.0]
This paper proposes a methodology for exploring how linguistic behaviour on social media can be used to explore societal reactions to important events.
Our methodology consists of grounding spatial-temporal categories in tweet usage trends using time-series analysis and clustering.
We also found that temporal categories corresponded closely to policy changes during the handling of the pandemic.
arXiv Detail & Related papers (2023-06-28T08:59:50Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- Pain level and pain-related behaviour classification using GRU-based sparsely-connected RNNs [61.080598804629375]
People with chronic pain unconsciously adapt specific body movements to protect themselves from injury or additional pain.
Because there is no dedicated benchmark database to analyse this correlation, we considered one of the specific circumstances that potentially influence a person's biometrics during daily activities.
We proposed a sparsely-connected recurrent neural networks (s-RNNs) ensemble with the gated recurrent unit (GRU) that incorporates multiple autoencoders.
We conducted several experiments which indicate that the proposed method outperforms the state-of-the-art approaches in classifying both pain level and pain-related behaviour.
arXiv Detail & Related papers (2022-12-20T12:56:28Z)
- Exploring Hybrid and Ensemble Models for Multiclass Prediction of Mental Health Status on Social Media [27.799032561722893]
We report on experiments aimed at predicting six conditions (anxiety, attention deficit hyperactivity disorder, bipolar disorder, post-traumatic stress disorder, depression, and psychological stress) from Reddit social media posts.
We explore and compare the performance of hybrid and ensemble models leveraging transformer-based architectures (BERT and RoBERTa) and BiLSTM neural networks trained on within-text distributions of a diverse set of linguistic features.
In addition, we conduct feature ablation experiments to investigate which types of features are most indicative of particular mental health conditions.
arXiv Detail & Related papers (2022-12-19T20:31:47Z)
- Topic Modeling on Clinical Social Work Notes for Exploring Social Determinants of Health Factors [0.30586855806896046]
Clinical notes from social workers might provide a richer source of data on social determinants of health (SDoH).
We retrieved a diverse, deidentified corpus of 0.95 million clinical social work notes from 181,644 patients at the University of California, San Francisco.
We demonstrated that social work notes contain rich, unique, and otherwise unobtainable information on an individual's SDoH.
arXiv Detail & Related papers (2022-12-02T21:54:55Z)
- Adaptive Identification of Populations with Treatment Benefit in Clinical Trials: Machine Learning Challenges and Solutions [78.31410227443102]
We study the problem of adaptively identifying patient subpopulations that benefit from a given treatment during a confirmatory clinical trial.
We propose AdaGGI and AdaGCPI, two meta-algorithms for subpopulation construction.
arXiv Detail & Related papers (2022-08-11T14:27:49Z)
- Co-occurrence of medical conditions: Exposing patterns through probabilistic topic modeling of SNOMED codes [0.3867363075280544]
Co-occurring conditions are especially prevalent among individuals suffering from kidney disease.
This study aims to identify and characterize patterns of co-occurring medical conditions in patients employing a probabilistic framework.
arXiv Detail & Related papers (2021-09-19T19:34:21Z)