Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?
- URL: http://arxiv.org/abs/2502.07963v3
- Date: Mon, 05 May 2025 20:11:20 GMT
- Title: Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?
- Authors: Hye Sun Yun, Karen Y. C. Zhang, Ramez Kouzy, Iain J. Marshall, Junyi Jessy Li, Byron C. Wallace
- Abstract summary: Publishing incentives encourage researchers to present "positive" findings, even when empirical results are equivocal. Spin can influence clinician interpretation of evidence and may affect patient care decisions. This study asks whether the interpretation of trial results offered by Large Language Models is similarly affected by spin.
- Score: 47.43946693104718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical research faces well-documented challenges in translating novel treatments into clinical practice. Publishing incentives encourage researchers to present "positive" findings, even when empirical results are equivocal. Consequently, it is well-documented that authors often spin study results, especially in article abstracts. Such spin can influence clinician interpretation of evidence and may affect patient care decisions. In this study, we ask whether the interpretation of trial results offered by Large Language Models (LLMs) is similarly affected by spin. This is important since LLMs are increasingly being used to trawl through and synthesize published medical evidence. We evaluated 22 LLMs and found that they are across the board more susceptible to spin than humans. They might also propagate spin into their outputs: We find evidence, e.g., that LLMs implicitly incorporate spin into plain language summaries that they generate. We also find, however, that LLMs are generally capable of recognizing spin, and can be prompted in a way to mitigate spin's impact on LLM outputs.
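The abstract notes that LLMs can be prompted in a way that mitigates spin's impact. As a rough, hypothetical illustration of that kind of prompting, the sketch below asks a chat model to flag whether a trial abstract's conclusions overstate its primary-outcome results; the prompt wording, rubric, model name, and helper are assumptions, not the authors' evaluation protocol.

```python
# Hypothetical sketch: prompt an LLM to flag spin in a trial abstract.
# The rubric, model name, and prompt wording are illustrative assumptions,
# not the protocol used in the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SPIN_RUBRIC = (
    "You are reviewing the abstract of a randomized controlled trial. "
    "Spin means reporting that overstates efficacy or downplays a "
    "non-significant primary outcome. Answer 'SPIN' or 'NO SPIN', then give "
    "one sentence of justification grounded in the primary outcome."
)

def flag_spin(abstract_text: str) -> str:
    """Ask the model whether the abstract's conclusions overstate its results."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model; any chat-capable LLM would do
        messages=[
            {"role": "system", "content": SPIN_RUBRIC},
            {"role": "user", "content": abstract_text},
        ],
        temperature=0,  # keep outputs stable for evaluation
    )
    return response.choices[0].message.content

# Example usage (abstract text abbreviated):
# print(flag_spin("Conclusions: Drug X showed promising benefit ... "
#                 "Primary outcome: no significant difference (p=0.34)."))
```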
Related papers
- How Much Content Do LLMs Generate That Induces Cognitive Bias in Users? [13.872175096831343]
Large language models (LLMs) are increasingly integrated into applications ranging from review summarization to medical diagnosis support. We investigate when and how LLMs expose users to biased content and quantify its severity. Our findings show that LLMs expose users to content that changes the sentiment of the context in 21.86% of the cases, hallucinate on post-knowledge-cutoff data questions in 57.33% of the cases, and exhibit primacy bias in 5.94% of the cases.
arXiv Detail & Related papers (2025-07-03T21:56:44Z)
- Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation [44.58099275559231]
Large language models (LLMs) are increasingly integral to information retrieval (IR), powering ranking, evaluation, and AI-assisted content creation.
This paper synthesizes existing research and presents novel experiment designs that explore how LLM-based rankers and assistants influence LLM-based judges.
arXiv Detail & Related papers (2025-03-24T19:24:40Z)
- Hallucinations Can Improve Large Language Models in Drug Discovery [10.573861741540853]
Concerns about hallucinations in Large Language Models (LLMs) have been raised by researchers, yet their potential in areas where creativity is vital, such as drug discovery, merits exploration.
In this paper, we come up with the hypothesis that hallucinations can improve LLMs in drug discovery.
arXiv Detail & Related papers (2025-01-23T16:45:51Z)
- Benchmarking LLMs and SLMs for patient reported outcomes [0.0]
This study benchmarks several SLMs against LLMs for summarizing patient-reported Q&A forms in the context of radiotherapy.
Using various metrics, we evaluate their precision and reliability.
The findings highlight both the promise and limitations of SLMs for high-stakes medical tasks, fostering more efficient and privacy-preserving AI-driven healthcare solutions.
arXiv Detail & Related papers (2024-12-20T19:01:25Z)
- Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review [66.73247554182376]
Advances in large language models (LLMs) have led to their integration into peer review. The unchecked adoption of LLMs poses significant risks to the integrity of the peer review system. We show that manipulating 5% of the reviews could potentially cause 12% of the papers to lose their position in the top 30% rankings.
arXiv Detail & Related papers (2024-12-02T16:55:03Z)
- LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions [20.44227547555244]
Large language models (LLMs) have led many researchers to consider their usage for scientific work.
We present the first large-scale survey of 816 verified research article authors.
We find that 81% of researchers have already incorporated LLMs into different aspects of their research workflow.
arXiv Detail & Related papers (2024-10-30T04:25:23Z)
- Contextual Evaluation of Large Language Models for Classifying Tropical and Infectious Diseases [0.9798965031257411]
We build on an open-source tropical and infectious diseases (TRINDs) dataset, expanding it to include demographic and semantic clinical and consumer augmentations, yielding 11,000+ prompts.
We evaluate LLM performance on these, comparing generalist and medical LLMs, as well as LLM outcomes to human experts.
We develop a prototype of TRINDs-LM, a research tool that provides a playground to navigate how context impacts LLM outputs for health.
arXiv Detail & Related papers (2024-09-13T21:28:54Z)
- AI Meets the Classroom: When Does ChatGPT Harm Learning? [0.0]
We study how generative AI and specifically large language models (LLMs) impact learning in coding classes.
We show across three studies that LLM usage can have positive and negative effects on learning outcomes.
arXiv Detail & Related papers (2024-08-29T17:07:46Z)
- Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks.
However, they sometimes generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences.
We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z)
- Delving into LLM-assisted writing in biomedical publications through excess vocabulary [4.58733012283457]
Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance. We study vocabulary changes in over 15 million biomedical abstracts from 2010-2024 indexed by PubMed. We show that LLMs have had an unprecedented impact on scientific writing in biomedical research, surpassing the effect of major world events such as the Covid pandemic.
arXiv Detail & Related papers (2024-06-11T07:16:34Z)
- Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z)
- LLM Factoscope: Uncovering LLMs' Factual Discernment through Inner States Analysis [11.712916673150245]
Large Language Models (LLMs) sometimes produce outputs that diverge from factual reality.
This phenomenon is particularly concerning in sensitive applications such as medical consultation and legal advice.
In this paper, we introduce the LLM factoscope, a novel Siamese network-based model that leverages the inner states of LLMs for factual detection.
arXiv Detail & Related papers (2023-12-27T01:44:47Z)
- Don't Ignore Dual Logic Ability of LLMs while Privatizing: A Data-Intensive Analysis in Medical Domain [19.46334739319516]
We study how the dual logic ability of LLMs is affected during the privatization process in the medical domain.
Our results indicate that incorporating general domain dual logic data into LLMs not only enhances LLMs' dual logic ability but also improves their accuracy.
arXiv Detail & Related papers (2023-09-08T08:20:46Z)
- Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models [116.01843550398183]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks.
LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge.
arXiv Detail & Related papers (2023-09-03T16:56:48Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
However, they still struggle with accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
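As a rough illustration of the self-verification pattern described in the entry above, here is a hypothetical two-pass sketch: one call extracts a value from a clinical note, and a second call asks the same model to quote supporting text and confirm or reject the extraction. The prompts, model name, and helper names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of extract-then-verify with the same model.
# Prompts, model name, and helpers are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """Single-turn chat completion with a placeholder model."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable LLM would do
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

def extract_with_verification(note: str, field: str) -> dict:
    # Pass 1: extraction (few-shot examples omitted for brevity).
    extraction = ask(f"From the clinical note below, extract the {field}.\n\n{note}")
    # Pass 2: self-verification -- demand provenance and a supported/unsupported verdict.
    verdict = ask(
        f"Note:\n{note}\n\nProposed {field}: {extraction}\n"
        "Quote the exact sentence supporting this value, then answer "
        "SUPPORTED or NOT SUPPORTED."
    )
    return {"value": extraction, "verification": verdict}

# Example usage:
# result = extract_with_verification("Patient started metformin 500 mg daily.", "medication")
# print(result)
```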
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.