Pub-Guard-LLM: Detecting Fraudulent Biomedical Articles with Reliable Explanations
- URL: http://arxiv.org/abs/2502.15429v4
- Date: Tue, 08 Apr 2025 10:27:59 GMT
- Title: Pub-Guard-LLM: Detecting Fraudulent Biomedical Articles with Reliable Explanations
- Authors: Lihu Chen, Shuojie Fu, Gabriel Freedman, Cemre Zor, Guy Martin, James Kinross, Uddhav Vaghela, Ovidiu Serban, Francesca Toni,
- Abstract summary: Pub-Guard-LLM is a large language model-based system tailored to fraud detection of biomedical scientific articles.<n>Pub-Guard-LLM consistently surpasses the performance of various baselines.<n>By enhancing both detection performance and explainability in scientific fraud detection, Pub-Guard-LLM contributes to safeguarding research integrity with a novel, effective, open-source tool.
- Score: 11.082285990214595
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A significant and growing number of published scientific articles is found to involve fraudulent practices, posing a serious threat to the credibility and safety of research in fields such as medicine. We propose Pub-Guard-LLM, the first large language model-based system tailored to fraud detection of biomedical scientific articles. We provide three application modes for deploying Pub-Guard-LLM: vanilla reasoning, retrieval-augmented generation, and multi-agent debate. Each mode allows for textual explanations of predictions. To assess the performance of our system, we introduce an open-source benchmark, PubMed Retraction, comprising over 11K real-world biomedical articles, including metadata and retraction labels. We show that, across all modes, Pub-Guard-LLM consistently surpasses the performance of various baselines and provides more reliable explanations, namely explanations which are deemed more relevant and coherent than those generated by the baselines when evaluated by multiple assessment methods. By enhancing both detection performance and explainability in scientific fraud detection, Pub-Guard-LLM contributes to safeguarding research integrity with a novel, effective, open-source tool.
Related papers
- Uncertainty-aware abstention in medical diagnosis based on medical texts [87.88110503208016]
This study addresses the critical issue of reliability for AI-assisted medical diagnosis.
We focus on the selection prediction approach that allows the diagnosis system to abstain from providing the decision if it is not confident in the diagnosis.
We introduce HUQ-2, a new state-of-the-art method for enhancing reliability in selective prediction tasks.
arXiv Detail & Related papers (2025-02-25T10:15:21Z) - Survey on AI-Generated Media Detection: From Non-MLLM to MLLM [51.91311158085973]
Methods for detecting AI-generated media have evolved rapidly.<n>General-purpose detectors based on MLLMs integrate authenticity verification, explainability, and localization capabilities.<n>Ethical and security considerations have emerged as critical global concerns.
arXiv Detail & Related papers (2025-02-07T12:18:20Z) - Causal Representation Learning from Multimodal Biomedical Observations [57.00712157758845]
We develop flexible identification conditions for multimodal data and principled methods to facilitate the understanding of biomedical datasets.
Key theoretical contribution is the structural sparsity of causal connections between modalities.
Results on a real-world human phenotype dataset are consistent with established biomedical research.
arXiv Detail & Related papers (2024-11-10T16:40:27Z) - uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization [23.173826980480936]
Current methods often sacrifice key information for faithfulness or introduce confabulations when prioritizing informativeness.
This paper presents a benchmark of six advanced abstractive summarization methods across three diverse datasets using five standardized metrics.
We propose uMedSum, a modular hybrid summarization framework that introduces novel approaches for sequential confabulation removal followed by key missing information addition.
arXiv Detail & Related papers (2024-08-22T03:08:49Z) - Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation [15.495976478018264]
Large language models (LLMs) have emerged as a promising tool to revolutionize knowledge interaction.
We construct a dataset of background-hypothesis pairs from biomedical literature, partitioned into training, seen, and unseen test sets.
We assess the hypothesis generation capabilities of top-tier instructed models in zero-shot, few-shot, and fine-tuning settings.
arXiv Detail & Related papers (2024-07-12T02:55:13Z) - A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z) - A survey of recent methods for addressing AI fairness and bias in
biomedicine [48.46929081146017]
Artificial intelligence systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender.
We surveyed recent publications on different debiasing methods in the fields of biomedical natural language processing (NLP) or computer vision (CV)
We performed a literature search on PubMed, ACM digital library, and IEEE Xplore of relevant articles published between January 2018 and December 2023 using multiple combinations of keywords.
We reviewed other potential methods from the general domain that could be applied to biomedicine to address bias and improve fairness.
arXiv Detail & Related papers (2024-02-13T06:38:46Z) - MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for MGT detection against powerful large language models (LLMs)
We show that a larger number of words in general leads to better performance and most detection methods can achieve similar performance with much fewer training samples.
Our findings indicate that the model-based detection methods still perform well in the text attribution task.
arXiv Detail & Related papers (2023-03-26T21:12:36Z) - Readability Controllable Biomedical Document Summarization [17.166794984161964]
We introduce a new task of readability controllable summarization for biomedical documents.
It aims to recognise users' readability demands and generate summaries that better suit their needs.
arXiv Detail & Related papers (2022-10-10T14:03:20Z) - Multimodal Machine Learning in Precision Health [10.068890037410316]
This review was conducted to summarize this field and identify topics ripe for future research.
We used a combination of content analysis and literature searches to establish search strings and databases of PubMed, Google Scholar, and IEEEXplore from 2011 to 2021.
The most common form of information fusion was early fusion. Notably, there was an improvement in predictive performance performing heterogeneous data fusion.
arXiv Detail & Related papers (2022-04-10T21:56:07Z) - Deep forecasting of translational impact in medical research [1.8130872753848115]
We develop a suite of representational and discriminative mathematical models of multi-scale publication data.
We show that citations are only moderately predictive of translational impact as judged by inclusion in patents, guidelines, or policy documents.
We argue that content-based models of impact are superior in performance to conventional, citation-based measures.
arXiv Detail & Related papers (2021-10-17T19:29:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.