What Patients Really Ask: Exploring the Effect of False Assumptions in Patient Information Seeking
- URL: http://arxiv.org/abs/2601.15674v1
- Date: Thu, 22 Jan 2026 05:56:14 GMT
- Title: What Patients Really Ask: Exploring the Effect of False Assumptions in Patient Information Seeking
- Authors: Raymond Xiong, Furong Jia, Lionel Wong, Monica Agrawal
- Abstract summary: Patients are increasingly using large language models (LLMs) to seek answers to their healthcare-related questions. We sourced data from Google's People Also Ask feature by querying the top 200 prescribed medications in the United States. A considerable portion of the collected questions contains incorrect assumptions and dangerous intentions.
- Score: 5.012718216094781
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Patients are increasingly using large language models (LLMs) to seek answers to their healthcare-related questions. However, benchmarking efforts in LLMs for question answering often focus on medical exam questions, which differ significantly in style and content from the questions patients actually raise in real life. To bridge this gap, we sourced data from Google's People Also Ask feature by querying the top 200 prescribed medications in the United States, curating a dataset of medical questions people commonly ask. A considerable portion of the collected questions contains incorrect assumptions and dangerous intentions. We demonstrate that the emergence of these corrupted questions is not uniformly random and depends heavily on the degree of incorrectness in the history of questions that led to their appearance. Current LLMs that perform strongly on other benchmarks struggle to identify incorrect assumptions in everyday questions.
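The abstract notes that corrupted questions depend on the history of questions leading to them, which suggests the People Also Ask data was gathered by recursively expanding related questions. A minimal sketch of such a collection loop (the fetch function, names, and depth limit are all assumptions for illustration; the paper does not specify its crawling procedure):

```python
from collections import deque

def collect_paa_questions(seed_queries, fetch_related, max_depth=2):
    """Breadth-first expansion of People-Also-Ask-style questions.

    `fetch_related(query)` is a hypothetical callable returning the related
    questions a search engine surfaces for `query` (e.g. a scraper or a
    recorded cache supplied by the caller). Each collected question is
    paired with the chain of queries that led to it, mirroring the paper's
    notion of a question's "history".
    """
    seen = set()
    results = []  # (question, history-of-queries) pairs
    queue = deque((q, ()) for q in seed_queries)
    while queue:
        query, history = queue.popleft()
        if query in seen:
            continue
        seen.add(query)
        results.append((query, history))
        if len(history) < max_depth:  # history length doubles as depth
            for follow_up in fetch_related(query):
                if follow_up not in seen:
                    queue.append((follow_up, history + (query,)))
    return results
```

With a small recorded cache standing in for live search results, the loop yields each question together with the query chain that produced it, which is the signal the paper analyzes.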
Related papers
- MedRedFlag: Investigating how LLMs Redirect Misconceptions in Real-World Health Communication [4.557109813131144]
Real-world health questions from patients often unintentionally embed false assumptions or premises. In such cases, safe medical communication typically involves redirection: addressing the implicit misconception and then responding to the underlying patient context. We investigate how large language models (LLMs) react to false premises embedded within real-world health questions.
arXiv Detail & Related papers (2026-01-14T20:23:02Z) - Can LLMs Ask Good Questions? [45.54763954234726]
We evaluate questions generated by large language models (LLMs) from context. We compare them to human-authored questions across six dimensions: question type, question length, context coverage, answerability, uncommonness, and required answer length.
arXiv Detail & Related papers (2025-01-07T03:21:17Z) - ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions [52.33835101586687]
We study out-of-scope questions, where the retrieved document appears semantically similar to the question but lacks the necessary information to answer it. We propose a guided hallucination-based approach, ELOQ, to automatically generate a diverse set of out-of-scope questions from post-cutoff documents.
arXiv Detail & Related papers (2024-10-18T16:11:29Z) - I Could've Asked That: Reformulating Unanswerable Questions [89.93173151422636]
We evaluate open-source and proprietary models for reformulating unanswerable questions.
GPT-4 and Llama2-7B successfully reformulate questions only 26% and 12% of the time, respectively.
We publicly release the benchmark and the code to reproduce the experiments.
arXiv Detail & Related papers (2024-07-24T17:59:07Z) - Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions [19.436999992810797]
We construct two new datasets: JAMA Clinical Challenge and Medbullets. JAMA Clinical Challenge consists of questions based on challenging clinical cases, while Medbullets comprises simulated clinical questions. We evaluate seven LLMs on the two datasets using various prompts.
arXiv Detail & Related papers (2024-02-28T05:44:41Z) - Researchy Questions: A Dataset of Multi-Perspective, Decompositional
Questions for LLM Web Agents [22.023543164141504]
We present Researchy Questions, a dataset of search engine queries tediously filtered to be non-factoid, "decompositional", and multi-perspective.
We show that users spend a lot of "effort" on these questions in terms of signals like clicks and session length.
We also show that "slow thinking" answering techniques, like decomposition into sub-questions, show benefit over answering directly.
arXiv Detail & Related papers (2024-02-27T21:27:16Z) - Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations [70.6395572287422]
The self-alignment method is capable of not only refusing to answer but also providing an explanation for the unanswerability of unknown questions.
We conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself, aligning its responses to unknown questions as desired.
arXiv Detail & Related papers (2024-02-23T02:24:36Z) - Medical Question Understanding and Answering with Knowledge Grounding
and Semantic Self-Supervision [53.692793122749414]
We introduce a medical question understanding and answering system with knowledge grounding and semantic self-supervision.
Our system is a pipeline that first summarizes a long, medical, user-written question, using a supervised summarization loss.
The system then matches the summarized user question with an FAQ from a trusted medical knowledge base, and retrieves a fixed number of relevant sentences from the corresponding answer document.
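The two-stage retrieval described above (match the summarized question to an FAQ, then pull the most relevant answer sentences) can be sketched with a toy bag-of-words cosine similarity; the actual system uses supervised neural components, so every function here is a simplified stand-in, not the paper's implementation:

```python
import math
import re
from collections import Counter

def _vec(text):
    """Bag-of-words term counts (a toy stand-in for a learned encoder)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def _cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def answer(summary, faq_to_answer, k=2):
    """Match a summarized question to the closest FAQ, then return the
    k answer sentences most similar to the question."""
    q = _vec(summary)
    best_faq = max(faq_to_answer, key=lambda f: _cosine(q, _vec(f)))
    sentences = re.split(r"(?<=[.!?])\s+", faq_to_answer[best_faq])
    ranked = sorted(sentences, key=lambda s: _cosine(q, _vec(s)), reverse=True)
    return best_faq, ranked[:k]
```

The pipeline structure (question normalization, FAQ matching, then sentence-level retrieval) is what carries over from the paper; only the similarity function here is a placeholder.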
arXiv Detail & Related papers (2022-09-30T08:20:32Z) - Can large language models reason about medical questions? [7.95779617839642]
We investigate whether closed- and open-source models can be applied to answer and reason about difficult real-world-based questions.
We focus on three popular medical benchmarks (MedQA-USMLE, MedMCQA, and PubMedQA) and multiple prompting scenarios.
Based on an expert annotation of the generated CoTs, we found that InstructGPT can often read, reason and recall expert knowledge.
arXiv Detail & Related papers (2022-07-17T11:24:44Z) - A Dataset of Information-Seeking Questions and Answers Anchored in
Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
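The 27-point gap above is measured in F1, the standard token-overlap metric for extractive QA. A minimal sketch of that metric (this is the widely used SQuAD-style computation in simplified form, omitting answer normalization such as article and punctuation stripping; it is not code from the paper):

```python
from collections import Counter

def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Per-question F1 scores are then averaged over the dataset; "underperforming by 27 F1 points" refers to a difference in these averages.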
arXiv Detail & Related papers (2021-05-07T00:12:34Z) - Effective Transfer Learning for Identifying Similar Questions: Matching
User Questions to COVID-19 FAQs [5.512295869673147]
We show that pretraining a neural network on medical question-answer pairs, as part of a double fine-tuning approach, is a useful intermediate task for determining medical question similarity.
We also describe a currently live system that uses the trained model to match user questions to COVID-related FAQs.
arXiv Detail & Related papers (2020-08-04T18:20:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.