Related papers: Decide less, communicate more: On the construct validity of end-to-end fact-checking in medicine

URL: http://arxiv.org/abs/2506.20876v2
Date: Sat, 28 Jun 2025 06:11:10 GMT
Title: Decide less, communicate more: On the construct validity of end-to-end fact-checking in medicine
Authors: Sebastian Joseph, Lily Chen, Barry Wei, Michael Mackert, Iain J. Marshall, Paul Pu Liang, Ramez Kouzy, Byron C. Wallace, Junyi Jessy Li
Abstract summary: We show how experts verify real claims from social media by synthesizing medical evidence. We reveal difficulties in connecting claims in the wild to scientific evidence in the form of clinical trials. We argue that fact-checking should be approached and evaluated as an interactive communication problem.
Score: 59.604255567812714
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Technological progress has led to concrete advancements in tasks that were regarded as challenging, such as automatic fact-checking. Interest in adopting these systems for public health and medicine has grown due to the high-stakes nature of medical decisions and challenges in critically appraising a vast and diverse medical literature. Evidence-based medicine connects to every individual, and yet the nature of it is highly technical, rendering the medical literacy of the majority of users inadequate to sufficiently navigate the domain. Such problems with medical communication ripen the ground for end-to-end fact-checking agents: check a claim against current medical literature and return with an evidence-backed verdict. And yet, such systems remain largely unused. To understand this, we present the first study examining how clinical experts verify real claims from social media by synthesizing medical evidence. In searching for this upper bound, we reveal fundamental challenges in end-to-end fact-checking when applied to medicine: difficulties connecting claims in the wild to scientific evidence in the form of clinical trials; ambiguities in underspecified claims mixed with mismatched intentions; and inherently subjective veracity labels. We argue that fact-checking should be approached and evaluated as an interactive communication problem, rather than an end-to-end process.

Related papers

MedScore: Factuality Evaluation of Free-Form Medical Answers [54.722181966548895]
We propose MedScore, a new approach to decomposing medical answers into condition-aware valid facts. Our method extracts up to three times more valid facts than existing methods.
arXiv Detail & Related papers (2025-05-24T01:23:09Z)
Medical Hallucinations in Foundation Models and Their Impact on Healthcare [53.97060824532454]
Foundation Models that are capable of processing and generating multi-modal data have transformed AI's role in medicine. We define medical hallucination as any instance in which a model generates misleading medical content. Our results reveal that inference techniques such as Chain-of-Thought (CoT) and Search Augmented Generation can effectively reduce hallucination rates. These findings underscore the ethical and practical imperative for robust detection and mitigation strategies.
arXiv Detail & Related papers (2025-02-26T02:30:44Z)
Fact or Guesswork? Evaluating Large Language Model's Medical Knowledge with Structured One-Hop Judgment [108.55277188617035]
Large language models (LLMs) have been widely adopted in various downstream task domains, but their ability to directly recall and apply factual medical knowledge remains under-explored. Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs' inherent medical knowledge from their reasoning capabilities. We introduce the Medical Knowledge Judgment, a dataset specifically designed to measure LLMs' one-hop factual medical knowledge.
arXiv Detail & Related papers (2025-02-20T05:27:51Z)
Retrieval-augmented systems can be dangerous medical communicators [21.371504193281226]
Patients have long sought health information online, and increasingly, they are turning to generative AI to answer their health-related queries. Retrieval-augmented generation and citation grounding have been widely promoted as methods to reduce hallucinations and improve the accuracy of AI-generated responses. This paper argues that even when these methods produce literally accurate content drawn from source documents sans hallucinations, they can still be highly misleading.
arXiv Detail & Related papers (2025-02-18T01:57:02Z)
Iterative Tree Analysis for Medical Critics [5.617649111108429]
Iterative Tree Analysis (ITA) is designed to extract implicit claims from long medical texts and verify each claim through an iterative and adaptive tree-like reasoning process. Our experiments demonstrate that ITA significantly outperforms previous methods in detecting factual inaccuracies in complex medical text verification tasks by 10%.
arXiv Detail & Related papers (2025-01-18T03:13:26Z)
Identifying and Aligning Medical Claims Made on Social Media with Medical Evidence [0.12277343096128711]
We study three core tasks: identifying medical claims, extracting medical vocabulary from these claims, and retrieving evidence relevant to those identified medical claims. We propose a novel system that can generate synthetic medical claims to aid each of these core tasks.
arXiv Detail & Related papers (2024-05-18T07:50:43Z)
Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG). MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z)
Comparing Knowledge Sources for Open-Domain Scientific Claim Verification [6.726255259929497]
We show that PubMed works better with specialized biomedical claims, while Wikipedia is more suited for everyday health concerns. We discuss the results, outline frequent retrieval patterns and challenges, and provide promising future directions.
arXiv Detail & Related papers (2024-02-05T09:57:15Z)
HealthFC: Verifying Health Claims with Evidence-Based Medical Fact-Checking [5.065947993017158]
HealthFC is a dataset of 750 health-related claims in German and English labeled for veracity by medical experts. We provide an analysis of the dataset, highlighting its characteristics and challenges. We show that the dataset is a challenging test bed with a high potential for future use.
arXiv Detail & Related papers (2023-09-15T16:05:48Z)
Don't Ignore Dual Logic Ability of LLMs while Privatizing: A Data-Intensive Analysis in Medical Domain [19.46334739319516]
We study how the dual logic ability of LLMs is affected during the privatization process in the medical domain. Our results indicate that incorporating general domain dual logic data into LLMs not only enhances LLMs' dual logic ability but also improves their accuracy.
arXiv Detail & Related papers (2023-09-08T08:20:46Z)
Semi-Supervised Variational Reasoning for Medical Dialogue Generation [70.838542865384]
Two key characteristics are relevant for medical dialogue generation: patient states and physician actions. We propose an end-to-end variational reasoning approach to medical dialogue generation. A physician policy network composed of an action-classifier and two reasoning detectors is proposed for augmented reasoning ability.
arXiv Detail & Related papers (2021-05-13T04:14:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.