Related papers: Large Language Models' Varying Accuracy in Recognizing Risk-Promoting and Health-Supporting Sentiments in Public Health Discourse: The Cases of HPV Vaccination and Heated Tobacco Products

Large Language Models' Varying Accuracy in Recognizing Risk-Promoting and Health-Supporting Sentiments in Public Health Discourse: The Cases of HPV Vaccination and Heated Tobacco Products

URL: http://arxiv.org/abs/2507.04364v1
Date: Sun, 06 Jul 2025 11:57:02 GMT
Title: Large Language Models' Varying Accuracy in Recognizing Risk-Promoting and Health-Supporting Sentiments in Public Health Discourse: The Cases of HPV Vaccination and Heated Tobacco Products
Authors: Soojong Kim, Kwanho Kim, Hye Min Kim,
Abstract summary: Large Language Models (LLMs) have gained attention as a powerful technology, yet their accuracy and feasibility in capturing different opinions on health issues are largely unexplored.<n>This research examines how accurate the three prominent LLMs are in detecting risk-promoting versus health-supporting sentiments.<n>Specifically, models often show higher accuracy for risk-promoting sentiment on Facebook, whereas health-supporting messages on Twitter are more accurately detected.
Score: 2.0618817976970103
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine learning methods are increasingly applied to analyze health-related public discourse based on large-scale data, but questions remain regarding their ability to accurately detect different types of health sentiments. Especially, Large Language Models (LLMs) have gained attention as a powerful technology, yet their accuracy and feasibility in capturing different opinions and perspectives on health issues are largely unexplored. Thus, this research examines how accurate the three prominent LLMs (GPT, Gemini, and LLAMA) are in detecting risk-promoting versus health-supporting sentiments across two critical public health topics: Human Papillomavirus (HPV) vaccination and heated tobacco products (HTPs). Drawing on data from Facebook and Twitter, we curated multiple sets of messages supporting or opposing recommended health behaviors, supplemented with human annotations as the gold standard for sentiment classification. The findings indicate that all three LLMs generally demonstrate substantial accuracy in classifying risk-promoting and health-supporting sentiments, although notable discrepancies emerge by platform, health issue, and model type. Specifically, models often show higher accuracy for risk-promoting sentiment on Facebook, whereas health-supporting messages on Twitter are more accurately detected. An additional analysis also shows the challenges LLMs face in reliably detecting neutral messages. These results highlight the importance of carefully selecting and validating language models for public health analyses, particularly given potential biases in training data that may lead LLMs to overestimate or underestimate the prevalence of certain perspectives.

Related papers

VaxGuard: A Multi-Generator, Multi-Type, and Multi-Role Dataset for Detecting LLM-Generated Vaccine Misinformation [8.08298631918046]
Existing benchmarks often overlook vaccine-related misinformation and the diverse roles of misinformation spreaders.<n>This paper introduces VaxGuard, a novel dataset designed to address these challenges.
arXiv Detail & Related papers (2025-03-12T06:43:25Z)
Fact or Guesswork? Evaluating Large Language Model's Medical Knowledge with Structured One-Hop Judgment [108.55277188617035]
Large language models (LLMs) have been widely adopted in various downstream task domains, but their ability to directly recall and apply factual medical knowledge remains under-explored.<n>Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs' inherent medical knowledge from their reasoning capabilities.<n>We introduce the Medical Knowledge Judgment, a dataset specifically designed to measure LLMs' one-hop factual medical knowledge.
arXiv Detail & Related papers (2025-02-20T05:27:51Z)
Do LLMs Provide Consistent Answers to Health-Related Questions across Languages? [14.87110905165928]
We examine the consistency of responses provided by Large Language Models (LLMs) to health-related questions across English, German, Turkish, and Chinese.<n>We reveal significant inconsistencies in responses that could spread healthcare misinformation.<n>Our findings emphasize the need for improved cross-lingual alignment to ensure accurate and equitable healthcare information.
arXiv Detail & Related papers (2025-01-24T18:51:26Z)
Optimizing Social Media Annotation of HPV Vaccine Skepticism and Misinformation Using Large Language Models: An Experimental Evaluation of In-Context Learning and Fine-Tuning Stance Detection Across Multiple Models [10.2201516537852]
We experimentally determine optimal strategies for scaling up social media content annotation for stance detection on HPV vaccine-related tweets.<n>In general, in-context learning outperforms fine-tuning in stance detection for HPV vaccine social media content.
arXiv Detail & Related papers (2024-11-22T04:19:32Z)
Uncertainty Estimation of Large Language Models in Medical Question Answering [60.72223137560633]
Large Language Models (LLMs) show promise for natural language generation in healthcare, but risk hallucinating factually incorrect information. We benchmark popular uncertainty estimation (UE) methods with different model sizes on medical question-answering datasets. Our results show that current approaches generally perform poorly in this domain, highlighting the challenge of UE for medical applications.
arXiv Detail & Related papers (2024-07-11T16:51:33Z)
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models [92.04812189642418]
We introduce CARES and aim to evaluate the Trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness.
arXiv Detail & Related papers (2024-06-10T04:07:09Z)
Hierarchical Multi-Label Classification of Online Vaccine Concerns [8.271202196208]
Vaccine concerns are an ever-evolving target, and can shift quickly as seen during the COVID-19 pandemic. We explore the task of detecting vaccine concerns in online discourse using large language models (LLMs) in a zero-shot setting without the need for expensive training datasets.
arXiv Detail & Related papers (2024-02-01T20:56:07Z)
Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning. They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health. Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
Dynamics and triggers of misinformation on vaccines [0.552480439325792]
We analyze 6 years of Italian vaccine debate across diverse social media platforms (Facebook, Instagram, Twitter, YouTube) We first use the symbolic transfer entropy analysis of news production time-series to determine which category of sources, questionable or reliable, causally drives the agenda on vaccines. We then leverage deep learning models capable to accurately classify vaccine-related content based on the conveyed stance and discussed topic.
arXiv Detail & Related papers (2022-07-25T15:35:48Z)
Assessing the Severity of Health States based on Social Media Posts [62.52087340582502]
We propose a multiview learning framework that models both the textual content as well as contextual-information to assess the severity of the user's health state. The diverse NLU views demonstrate its effectiveness on both the tasks and as well as on the individual disease to assess a user's health.
arXiv Detail & Related papers (2020-09-21T03:45:14Z)
COVI White Paper [67.04578448931741]
Contact tracing is an essential tool to change the course of the Covid-19 pandemic. We present an overview of the rationale, design, ethical considerations and privacy strategy of COVI,' a Covid-19 public peer-to-peer contact tracing and risk awareness mobile application developed in Canada.
arXiv Detail & Related papers (2020-05-18T07:40:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.