Conversations with AI Chatbots Increase Short-Term Vaccine Intentions But Do Not Outperform Standard Public Health Messaging
- URL: http://arxiv.org/abs/2504.20519v2
- Date: Wed, 30 Apr 2025 03:22:51 GMT
- Title: Conversations with AI Chatbots Increase Short-Term Vaccine Intentions But Do Not Outperform Standard Public Health Messaging
- Authors: Neil K. R. Sehgal, Sunny Rai, Manuel Tonneau, Anish K. Agarwal, Joseph Cappella, Melanie Kornides, Lyle Ungar, Alison Buttenheim, Sharath Chandra Guntuku
- Abstract summary: Large language model (LLM) based chatbots show promise in persuasive communication. This randomized controlled trial involved 930 vaccine-hesitant parents. Discussions significantly increased self-reported vaccination intent (by 7.1-10.3 points on a 100-point scale) compared to no message.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language model (LLM) based chatbots show promise in persuasive communication, but existing studies often rely on weak controls or focus on belief change rather than behavioral intentions or outcomes. This pre-registered multi-country (US, Canada, UK) randomized controlled trial involving 930 vaccine-hesitant parents evaluated brief (three-minute) multi-turn conversations with LLM-based chatbots against standard public health messaging approaches for increasing human papillomavirus (HPV) vaccine intentions for their children. Participants were randomly assigned to: (1) a weak control (no message), (2) a strong control reflecting the standard of care (reading official public health materials), or (3 and 4) one of two chatbot conditions. One chatbot was prompted to deliver short, conversational responses, while the other used the model's default output style (longer with bullet points). While chatbot interactions significantly increased self-reported vaccination intent (by 7.1-10.3 points on a 100-point scale) compared to no message, they did not outperform standard public health materials, with the conversational chatbot performing significantly worse. Additionally, while the short-term effects of chatbot interactions faded during a 15-day follow-up, the effects of public health material persisted relative to no message. These findings suggest that while LLMs can effectively shift vaccination intentions in the short-term, their incremental value over existing public health communications is questionable, offering a more tempered view of their persuasive capabilities and highlighting the importance of integrating AI-driven tools alongside, rather than replacing, current public health strategies.
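The four-arm design described in the abstract can be sketched as a toy simulation. The arm names, the noise model, and the assignment logic below are illustrative assumptions, not the authors' code; only the enrollment size (930) and the 7.1-10.3 point chatbot effects come from the abstract, and the per-arm effect values are placeholders.

```python
import random

# Four trial arms from the abstract: no message, official public health
# materials, and two chatbot variants (conversational vs. default style).
ARMS = ["no_message", "ph_materials", "chatbot_conversational", "chatbot_default"]

# Hypothetical mean shifts in vaccination intent (0-100 scale), loosely
# echoing the reported 7.1-10.3 point chatbot effects; illustrative only.
EFFECT = {
    "no_message": 0.0,
    "ph_materials": 11.0,
    "chatbot_conversational": 7.1,
    "chatbot_default": 10.3,
}

def assign_and_score(n_participants, seed=0):
    """Randomly assign participants to arms; return mean intent shift per arm."""
    rng = random.Random(seed)
    totals = {arm: [0.0, 0] for arm in ARMS}
    for _ in range(n_participants):
        arm = rng.choice(ARMS)                 # simple 1:1:1:1 randomization
        shift = EFFECT[arm] + rng.gauss(0, 5)  # noisy individual response
        totals[arm][0] += shift
        totals[arm][1] += 1
    return {arm: s / max(n, 1) for arm, (s, n) in totals.items()}

means = assign_and_score(930)  # trial enrolled 930 vaccine-hesitant parents
```

With a few hundred participants per arm, the simulated arm means track the assumed effects closely, which is the intuition behind comparing chatbot arms against both a weak (no message) and a strong (standard materials) control.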
Related papers
- Effect of Static vs. Conversational AI-Generated Messages on Colorectal Cancer Screening Intent: a Randomized Controlled Trial [5.429833789548265]
Large language model (LLM) chatbots show increasing promise in persuasive communication. We enrolled 915 U.S. adults (ages 45-75) who had never completed colorectal cancer (CRC) screening. Both AI interventions significantly increased stool test intentions by over 12 points (12.9-13.8/100), compared to a 7.5-point gain for expert materials.
arXiv Detail & Related papers (2025-07-10T22:46:43Z) - Development and Evaluation of HopeBot: an LLM-based chatbot for structured and interactive PHQ-9 depression screening [48.355615275247786]
HopeBot administers the Patient Health Questionnaire-9 (PHQ-9) using retrieval-augmented generation and real-time clarification. In a within-subject study, 132 adults in the United Kingdom and China completed both self-administered and chatbot versions. Overall, 87.1% expressed willingness to reuse or recommend HopeBot.
arXiv Detail & Related papers (2025-07-08T13:41:22Z) - Working with Large Language Models to Enhance Messaging Effectiveness for Vaccine Confidence [0.276240219662896]
Vaccine hesitancy and misinformation are significant barriers to achieving widespread vaccination coverage. This paper explores the potential of ChatGPT-augmented messaging to promote confidence in vaccination uptake.
arXiv Detail & Related papers (2025-04-14T04:06:46Z) - Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards [93.16294577018482]
Arena, the most popular benchmark of this type, ranks models by asking users to select the better response between two randomly selected models.
We show that an attacker can alter the leaderboard (to promote their favorite model or demote competitors) at the cost of roughly a thousand votes.
Our attack consists of two steps: first, we show how an attacker can determine which model was used to generate a given reply with more than 95% accuracy; then, the attacker can use this information to consistently vote against a target model.
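The two-step attack summarized above can be illustrated with a toy simulation. The 95% identification accuracy and the roughly one thousand votes come from the abstract; the vote bookkeeping, function name, and seed are illustrative assumptions, not the paper's method.

```python
import random

def simulate_attack(n_votes, detect_acc=0.95, seed=1):
    """Toy model of an attacker who identifies which reply came from the
    target model with probability `detect_acc` and always votes against it."""
    rng = random.Random(seed)
    net_losses_for_target = 0
    for _ in range(n_votes):
        # Step 1: attacker guesses which of the two replies is the target's.
        identified = rng.random() < detect_acc
        # Step 2: vote against the reply believed to be the target's.
        # A correct identification registers a loss for the target;
        # a misidentification accidentally registers a win.
        net_losses_for_target += 1 if identified else -1
    return net_losses_for_target

net_losses = simulate_attack(1000)  # "roughly a thousand votes"
```

Even with imperfect identification, the expected net effect is `n_votes * (2 * detect_acc - 1)` losses pushed onto the target, which is why a modest vote budget suffices to move a pairwise-voting leaderboard.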
arXiv Detail & Related papers (2025-01-13T17:12:38Z) - Empathetic Response in Audio-Visual Conversations Using Emotion Preference Optimization and MambaCompressor [44.499778745131046]
Our study introduces a dual approach: firstly, we employ Emotional Preference Optimization (EPO) to train chatbots. This training enables the model to discern fine distinctions between correct and counter-emotional responses. Secondly, we introduce MambaCompressor to effectively compress and manage extensive conversation histories. Our comprehensive experiments across multiple datasets demonstrate that our model significantly outperforms existing models in generating empathetic responses and managing lengthy dialogues.
arXiv Detail & Related papers (2024-12-23T13:44:51Z) - Prompt Engineering a Schizophrenia Chatbot: Utilizing a Multi-Agent Approach for Enhanced Compliance with Prompt Instructions [0.0699049312989311]
Patients with schizophrenia often present with cognitive impairments that may hinder their ability to learn about their condition.
While Large Language Models (LLMs) have the potential to make topical mental health information more accessible and engaging, their black-box nature raises concerns about ethics and safety.
arXiv Detail & Related papers (2024-10-10T09:49:24Z) - Self-Directed Turing Test for Large Language Models [56.64615470513102]
The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations.
Traditional Turing tests adopt a rigid dialogue format where each participant sends only one message each time.
This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format.
arXiv Detail & Related papers (2024-08-19T09:57:28Z) - Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval [7.925754291635035]
Persuasion plays a pivotal role in a wide range of applications from health intervention to the promotion of social good.
Persuasive chatbots employed responsibly for social good can be an enabler of positive individual and social change.
We propose PersuaBot, a zero-shot chatbot based on Large Language Models (LLMs) that is factual and more persuasive by leveraging more nuanced strategies.
Our experiments on simulated and human conversations show that our zero-shot approach is more persuasive than prior work, while achieving factual accuracy surpassing state-of-the-art knowledge-oriented chatbots.
arXiv Detail & Related papers (2024-07-04T02:28:21Z) - How Reliable AI Chatbots are for Disease Prediction from Patient Complaints? [0.0]
This study examines the reliability of AI chatbots, specifically GPT 4.0, Claude 3 Opus, and Gemini Ultra 1.0, in predicting diseases from patient complaints in the emergency department.
Results suggest that GPT 4.0 achieves high accuracy with increased few-shot data, while Gemini Ultra 1.0 performs well with fewer examples, and Claude 3 Opus maintains consistent performance.
arXiv Detail & Related papers (2024-05-21T22:00:13Z) - Accuracy of a Large Language Model in Distinguishing Anti- And Pro-vaccination Messages on Social Media: The Case of Human Papillomavirus Vaccination [1.8434042562191815]
This research assesses the accuracy of ChatGPT for sentiment analysis to discern different stances toward HPV vaccination.
Messages related to HPV vaccination were collected from social media supporting different message formats: Facebook (long format) and Twitter (short format).
Accuracy was measured for each message as the level of concurrence between human and machine decisions, ranging between 0 and 1.
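The concurrence measure described above (agreement between human and machine stance labels, scored between 0 and 1) can be sketched as follows. The function name, label values, and the simplification to a single human label per message are illustrative assumptions.

```python
def concurrence(human_labels, machine_labels):
    """Mean per-message agreement between human and machine stance labels,
    ranging from 0 (no agreement) to 1 (full agreement)."""
    if len(human_labels) != len(machine_labels):
        raise ValueError("label lists must be the same length")
    matches = sum(h == m for h, m in zip(human_labels, machine_labels))
    return matches / len(human_labels)

# Illustrative stance labels for five messages about HPV vaccination.
human = ["pro", "anti", "pro", "anti", "pro"]
machine = ["pro", "anti", "anti", "anti", "pro"]
score = concurrence(human, machine)  # 4 of 5 labels agree -> 0.8
```

When multiple human raters label each message, the per-message score generalizes to the fraction of raters the machine agrees with, still bounded between 0 and 1.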
arXiv Detail & Related papers (2024-04-10T04:35:54Z) - Measuring and Controlling Instruction (In)Stability in Language Model Dialogs [72.38330196290119]
System-prompting is a tool for customizing language-model chatbots, enabling them to follow a specific instruction.
We propose a benchmark to test the assumption, evaluating instruction stability via self-chats.
We reveal a significant instruction drift within eight rounds of conversations.
We propose a lightweight method called split-softmax, which compares favorably against two strong baselines.
arXiv Detail & Related papers (2024-02-13T20:10:29Z) - Development and Evaluation of Three Chatbots for Postpartum Mood and Anxiety Disorders [31.018188794627378]
We develop three chatbots to provide context-specific empathetic support to postpartum caregivers.
We present and evaluate the performance of our chatbots using both machine-based metrics and human-based questionnaires.
We conclude by discussing practical benefits of rule-based vs. generative models for supporting individuals with mental health challenges.
arXiv Detail & Related papers (2023-08-14T18:52:03Z) - Towards Healthy AI: Large Language Models Need Therapists Too [41.86344997530743]
We define Healthy AI to be safe, trustworthy and ethical.
We present the SafeguardGPT framework that uses psychotherapy to correct for these harmful behaviors.
arXiv Detail & Related papers (2023-04-02T00:39:12Z) - Doctors vs. Nurses: Understanding the Great Divide in Vaccine Hesitancy among Healthcare Workers [64.1526243118151]
We find that doctors are overall more positive toward the COVID-19 vaccines.
Doctors are more concerned with the effectiveness of the vaccines over newer variants.
Nurses pay more attention to the potential side effects on children.
arXiv Detail & Related papers (2022-09-11T14:22:16Z) - You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas [44.82330540456883]
We show that speakers' personas can be inferred through a simple neural network with high accuracy.
We conduct extensive experiments to demonstrate that our proposed defense objectives can greatly reduce the attack accuracy from 37.6% to 0.5%.
arXiv Detail & Related papers (2022-04-26T09:36:18Z) - Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework includes a guiding robot and an interlocutor model that plays the role of humans.
We examined our framework using three experimental setups and evaluated the guiding robot with four different metrics, demonstrating its flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z) - Assessing the Severity of Health States based on Social Media Posts [62.52087340582502]
We propose a multiview learning framework that models both the textual content as well as contextual-information to assess the severity of the user's health state.
The diverse NLU views demonstrate effectiveness on both tasks, as well as on individual diseases, in assessing a user's health state.
arXiv Detail & Related papers (2020-09-21T03:45:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.