"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets
- URL: http://arxiv.org/abs/2506.21532v1
- Date: Thu, 26 Jun 2025 17:52:18 GMT
- Title: "What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets
- Authors: Akshay Paruchuri, Maryam Aziz, Rohit Vartak, Ayman Ali, Best Uchehara, Xin Liu, Ishan Chatterjee, Monica Agrawal
- Abstract summary: HealthChat-11K is a curated dataset of 11K real-world conversations composed of 25K user messages. Our analysis reveals insights into the nature of how and why users seek health information.
- Score: 6.459488580102546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: People are increasingly seeking healthcare information from large language models (LLMs) via interactive chatbots, yet the nature and inherent risks of these conversations remain largely unexplored. In this paper, we filter large-scale conversational AI datasets to achieve HealthChat-11K, a curated dataset of 11K real-world conversations composed of 25K user messages. We use HealthChat-11K and a clinician-driven taxonomy for how users interact with LLMs when seeking healthcare information in order to systematically study user interactions across 21 distinct health specialties. Our analysis reveals insights into the nature of how and why users seek health information, such as common interactions, instances of incomplete context, affective behaviors, and interactions (e.g., leading questions) that can induce sycophancy, underscoring the need for improvements in the healthcare support capabilities of LLMs deployed as conversational AI. Code and artifacts to retrieve our analyses and combine them into a curated dataset can be found here: https://github.com/yahskapar/HealthChat
Related papers
- Towards Better Health Conversations: The Benefits of Context-seeking [17.329382113242556]
We present insights on how people interact with large language models (LLMs) for their own health questions. Studies revealed the importance of context-seeking in conversational AIs to elicit specific details a person may not volunteer or know to share. We developed a "Wayfinding AI" to proactively solicit context.
arXiv Detail & Related papers (2025-09-14T01:08:42Z)
- From Chat Logs to Collective Insights: Aggregative Question Answering [28.700113669309314]
We introduce Aggregative Question Answering, a novel task requiring models to reason explicitly over thousands of user-chatbot interactions to answer aggregative queries. To enable research in this direction, we construct a benchmark, WildChat-AQA, comprising 6,027 aggregative questions derived from 182,330 real-world conversations.
arXiv Detail & Related papers (2025-05-29T17:59:55Z)
- A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI [70.06771291117965]
We introduce Biomedica, an open-source dataset derived from the PubMed Central Open Access subset. Biomedica contains over 6 million scientific articles and 24 million image-text pairs. We provide scalable streaming and search APIs through a web server, facilitating seamless integration with AI systems.
arXiv Detail & Related papers (2025-03-26T05:56:46Z)
- Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health [1.8772687384996551]
Large language models (LLMs) have opened up new opportunities for transforming patient engagement in healthcare through conversational AI.
We showcase the power of LLMs in handling unstructured conversational data through four case studies.
arXiv Detail & Related papers (2024-06-19T16:02:04Z)
- Quriosity: Analyzing Human Questioning Behavior and Causal Inquiry through Curiosity-Driven Queries [92.1651731484397]
We present Quriosity, a collection of 13.5K naturally occurring questions from three diverse sources. Our analysis reveals a significant presence of causal questions (up to 42%) in the dataset.
arXiv Detail & Related papers (2024-05-30T17:55:28Z)
- Conversational Health Agents: A Personalized LLM-Powered Agent Framework [1.4597673707346281]
Conversational Health Agents (CHAs) are interactive systems that provide healthcare services, such as assistance and diagnosis.
We propose openCHA, an open-source framework to empower conversational agents to generate a personalized response for users' healthcare queries.
openCHA includes an orchestrator to plan and execute actions for gathering information from external sources.
arXiv Detail & Related papers (2023-10-03T18:54:10Z)
- AutoConv: Automatically Generating Information-seeking Conversations with Large Language Models [74.10293412011455]
We propose AutoConv for synthetic conversation generation.
Specifically, we formulate the conversation generation problem as a language modeling task.
We finetune an LLM with a few human conversations to capture the characteristics of the information-seeking process.
arXiv Detail & Related papers (2023-08-12T08:52:40Z)
- MedNgage: A Dataset for Understanding Engagement in Patient-Nurse Conversations [4.847266237348932]
Patients who effectively manage their symptoms often demonstrate higher levels of engagement in conversations and interventions with healthcare practitioners.
It is crucial for AI systems to understand the engagement in natural conversations between patients and practitioners to better contribute toward patient care.
We present a novel dataset (MedNgage) which consists of patient-nurse conversations about cancer symptom management.
arXiv Detail & Related papers (2023-05-31T16:06:07Z)
- GenSpectrum Chat: Data Exploration in Public Health Using Large Language Models [2.9823962001574187]
The COVID-19 pandemic highlighted the importance of making epidemiological data easily accessible and explorable.
We developed the "GenSpectrum Chat" which uses GPT-4 as the underlying large language model (LLM) to explore SARS-CoV-2 genomic sequencing data.
arXiv Detail & Related papers (2023-05-23T08:43:43Z)
- ChatGPT versus Traditional Question Answering for Knowledge Graphs: Current Status and Future Directions Towards Knowledge Graph Chatbots [7.2676028986202]
Conversational AI and Question-Answering systems (QASs) for knowledge graphs (KGs) are both emerging research areas.
QASs retrieve the most recent information from a KG by understanding and translating the natural language question into a formal query supported by the database engine.
Our framework compares two representative conversational models, ChatGPT and Galactica, against KGQAN, the current state-of-the-art QAS.
arXiv Detail & Related papers (2023-02-08T13:03:27Z)
- PLACES: Prompting Language Models for Social Conversation Synthesis [103.94325597273316]
We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting.
We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations.
arXiv Detail & Related papers (2023-02-07T05:48:16Z)
- Training Conversational Agents with Generative Conversational Networks [74.9941330874663]
We use Generative Conversational Networks to automatically generate data and train social conversational agents.
We evaluate our approach on TopicalChat with automatic metrics and human evaluators, showing that with 10% of seed data it performs close to the baseline that uses 100% of the data.
arXiv Detail & Related papers (2021-10-15T21:46:39Z)
- MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation [86.38736781043109]
We build and release a large-scale high-quality Medical Dialogue dataset related to 12 types of common Gastrointestinal diseases named MedDG.
We propose two kinds of medical dialogue tasks based on the MedDG dataset: next entity prediction and doctor response generation.
Experimental results show that pre-trained language models and other baselines perform poorly on both tasks on our dataset.
arXiv Detail & Related papers (2020-10-15T03:34:33Z)
- Conversations with Search Engines: SERP-based Conversational Response Generation [77.1381159789032]
We create a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines.
We also develop a state-of-the-art pipeline for conversations with search engines, Conversations with Search Engines (CaSE), using this dataset.
CaSE enhances the state-of-the-art by introducing a supporting token identification module and a prior-aware pointer generator.
arXiv Detail & Related papers (2020-04-29T13:07:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.