Iterative Tree Analysis for Medical Critics
- URL: http://arxiv.org/abs/2501.10642v1
- Date: Sat, 18 Jan 2025 03:13:26 GMT
- Title: Iterative Tree Analysis for Medical Critics
- Authors: Zenan Huang, Mingwei Li, Zheng Zhou, Youxin Jiang
- Abstract summary: Iterative Tree Analysis (ITA) is designed to extract implicit claims from long medical texts and verify each claim through an iterative and adaptive tree-like reasoning process.
Our experiments demonstrate that ITA outperforms previous methods by 10% in detecting factual inaccuracies on complex medical text verification tasks.
- Abstract: Large Language Models (LLMs) have been widely adopted across various domains, yet their application in the medical field poses unique challenges, particularly concerning the generation of hallucinations. Hallucinations in open-ended long medical text manifest as misleading critical claims, which are difficult to verify for two reasons. First, critical claims are often deeply entangled within the text and cannot be extracted based solely on surface-level presentation. Second, verifying these claims is challenging because surface-level token-based retrieval often lacks precise or specific evidence, leaving the claims unverifiable without deeper mechanism-based analysis. In this paper, we introduce a novel method termed Iterative Tree Analysis (ITA) for medical critics. ITA is designed to extract implicit claims from long medical texts and verify each claim through an iterative and adaptive tree-like reasoning process. This process involves a combination of top-down task decomposition and bottom-up evidence consolidation, enabling precise verification of complex medical claims through detailed mechanism-level reasoning. Our extensive experiments demonstrate that ITA outperforms previous methods by 10% in detecting factual inaccuracies on complex medical text verification tasks. Additionally, we will release a comprehensive test set to the public, aiming to foster further advancements in research within this domain.
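The abstract gives no pseudocode, but the interplay of top-down decomposition and bottom-up consolidation can be pictured as a recursive procedure over a claim tree. The sketch below is a minimal illustration under assumed interfaces; `decompose` and `find_evidence` are hypothetical stand-ins for the LLM-driven steps, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ClaimNode:
    """One node in the verification tree: a claim plus its sub-claims."""
    claim: str
    children: list["ClaimNode"] = field(default_factory=list)
    verdict: bool | None = None

def decompose(claim: str) -> list[str]:
    """Hypothetical top-down step: split a claim into simpler sub-claims.
    In ITA this is LLM-driven; here it is a stub."""
    return []

def find_evidence(claim: str) -> bool | None:
    """Hypothetical bottom-up step: look for mechanism-level evidence.
    Returns True/False when the claim can be judged, None otherwise."""
    return True  # stub: pretend every claim is directly supported

def verify(node: ClaimNode, max_depth: int = 3) -> bool:
    """Verify a claim: try direct evidence first; otherwise decompose
    and consolidate the children's verdicts bottom-up."""
    direct = find_evidence(node.claim)
    if direct is not None:
        node.verdict = direct
        return direct
    subclaims = decompose(node.claim) if max_depth > 0 else []
    if not subclaims:
        node.verdict = False  # unverifiable within budget: flag as suspect
        return False
    node.children = [ClaimNode(c) for c in subclaims]
    node.verdict = all(verify(c, max_depth - 1) for c in node.children)
    return node.verdict

root = ClaimNode("Drug X lowers blood pressure by inhibiting enzyme Y.")
print(verify(root))  # True under the permissive stub above
```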
Related papers
- Fact or Guesswork? Evaluating Large Language Model's Medical Knowledge with Structured One-Hop Judgment [108.55277188617035]
Large language models (LLMs) have been widely adopted in various downstream task domains, but their ability to directly recall and apply factual medical knowledge remains under-explored.
Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs' inherent medical knowledge from their reasoning capabilities.
We introduce the Medical Knowledge Judgment, a dataset specifically designed to measure LLMs' one-hop factual medical knowledge.
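As a rough picture of what one-hop judgment means in practice, the snippet below scores a model on isolated true/false factual probes. The probe items and `model_stub` are invented for illustration and are not drawn from the dataset.

```python
# Illustrative one-hop probes: single facts, no multi-hop reasoning.
probes = [
    ("Metformin is a first-line treatment for type 2 diabetes.", True),
    ("Insulin is administered orally as a tablet.", False),
]

def ask(model_fn, statement: str) -> bool:
    """Pose one statement to a model and parse a yes/no answer."""
    return model_fn(f"True or false: {statement}").strip().lower() == "true"

def model_stub(prompt: str) -> str:
    return "true"  # placeholder for a real LLM call

correct = sum(ask(model_stub, s) == label for s, label in probes)
print(f"one-hop accuracy: {correct / len(probes):.2f}")
```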
arXiv Detail & Related papers (2025-02-20T05:27:51Z)
- MedCoT: Medical Chain of Thought via Hierarchical Expert [48.91966620985221]
This paper presents MedCoT, a novel hierarchical expert verification reasoning chain method.
It is designed to enhance interpretability and accuracy in biomedical imaging inquiries.
Experimental evaluations on four standard Med-VQA datasets demonstrate that MedCoT surpasses existing state-of-the-art approaches.
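The summary does not detail the hierarchy, but a hierarchical expert verification chain can be sketched as a pipeline of reviewer stages, each refining the previous draft. The stage functions below are assumed stand-ins, not MedCoT's actual experts.

```python
from typing import Callable

Expert = Callable[[str, str], str]  # (question, draft) -> revised draft

def initial_expert(question: str, draft: str) -> str:
    return "draft answer"          # stub: propose a first answer

def follow_up_expert(question: str, draft: str) -> str:
    return draft + " (checked)"    # stub: review and refine the draft

def run_chain(question: str, experts: list[Expert]) -> str:
    answer = ""
    for expert in experts:         # each expert sees the prior draft
        answer = expert(question, answer)
    return answer

print(run_chain("What does this chest X-ray show?",
                [initial_expert, follow_up_expert]))
```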
arXiv Detail & Related papers (2024-12-18T11:14:02Z)
- A BERT-Based Summarization approach for depression detection [1.7363112470483526]
Depression is a globally prevalent mental disorder with potentially severe repercussions if not addressed.
Machine learning and artificial intelligence can autonomously detect depression indicators from diverse data sources.
Our study proposes text summarization as a preprocessing technique to diminish the length and intricacies of input texts.
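The idea of summarization as preprocessing fits in a few lines: shorten the input, then classify the summary. The naive extractive summarizer and keyword classifier below are illustrative stubs, not the paper's BERT-based components.

```python
def naive_summary(text: str, max_sentences: int = 2) -> str:
    """Toy extractive summary: keep the first few sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

def classify(text: str) -> str:
    """Stub classifier standing in for a trained model."""
    return "depressed" if "hopeless" in text.lower() else "control"

post = ("I feel hopeless most days. Nothing seems worth doing. "
        "Yesterday I watched a movie. It was long.")
summary = naive_summary(post)  # shorter input for the classifier
print(summary, "->", classify(summary))
```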
arXiv Detail & Related papers (2024-09-13T02:14:34Z)
- Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models [1.03590082373586]
This paper conducts a scoping study of existing techniques for mitigating hallucinations in knowledge-based tasks in general, and in the medical domain in particular.
Key methods covered in the paper include Retrieval-Augmented Generation (RAG)-based techniques, iterative feedback loops, supervised fine-tuning, and prompt engineering.
These techniques, while promising in general contexts, require further adaptation and optimization for the medical domain due to its unique demands for up-to-date, specialized knowledge and strict adherence to medical guidelines.
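Of the methods listed, RAG is the easiest to sketch: retrieve supporting passages, then condition generation on them. The toy corpus, keyword retriever, and `generate` stub below are assumptions for illustration.

```python
corpus = {
    "metformin": "Metformin is a first-line therapy for type 2 diabetes.",
    "insulin": "Insulin is given by injection, not orally.",
}

def retrieve(query: str) -> list[str]:
    """Toy keyword retriever; a real system would use dense retrieval."""
    q = query.lower()
    return [doc for key, doc in corpus.items() if key in q]

def generate(query: str, evidence: list[str]) -> str:
    """Stub for an LLM call that conditions its answer on evidence."""
    context = " ".join(evidence) or "No evidence found."
    return f"[grounded in: {context}] answer to: {query}"

question = "How is insulin administered?"
print(generate(question, retrieve(question)))
```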
arXiv Detail & Related papers (2024-08-25T11:09:15Z)
- KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking [55.2155025063668]
KnowHalu is a novel approach for detecting hallucinations in text generated by large language models (LLMs).
It uses step-wise reasoning, multi-formulation queries, multi-form knowledge for factual checking, and a fusion-based detection mechanism.
Our evaluations demonstrate that KnowHalu significantly outperforms SOTA baselines in detecting hallucinations across diverse tasks.
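A minimal picture of multi-form checking with fusion-based detection: query two knowledge forms (a triple store and raw text) and fuse their verdicts. The stores and the voting rule below are invented, not KnowHalu's.

```python
triples = {("aspirin", "treats", "fever")}
snippets = ["Aspirin is commonly used to reduce fever."]

def check_triples(s: str, r: str, o: str) -> bool:
    """Structured form: exact lookup in a toy triple store."""
    return (s, r, o) in triples

def check_text(claim: str) -> bool:
    """Unstructured form: naive substring match against snippets."""
    return any(claim.lower() in snip.lower() or snip.lower() in claim.lower()
               for snip in snippets)

def fused_verdict(claim: str, s: str, r: str, o: str) -> bool:
    votes = [check_triples(s, r, o), check_text(claim)]
    return any(votes)  # fusion rule: supported if any form verifies it

print(fused_verdict("aspirin is commonly used to reduce fever",
                    "aspirin", "treats", "fever"))
```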
arXiv Detail & Related papers (2024-04-03T02:52:07Z)
- Comparing Knowledge Sources for Open-Domain Scientific Claim Verification [6.726255259929497]
We show that PubMed works better with specialized biomedical claims, while Wikipedia is more suited for everyday health concerns.
We discuss the results, outline frequent retrieval patterns and challenges, and provide promising future directions.
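That finding suggests a simple routing policy: send specialized claims to PubMed and everyday ones to Wikipedia. The keyword heuristic below is an assumed illustration, not the paper's method.

```python
# Hypothetical jargon list used to flag specialized biomedical claims.
SPECIALIZED = {"cytokine", "polymorphism", "pharmacokinetics", "mrna"}

def pick_source(claim: str) -> str:
    words = set(claim.lower().split())
    return "PubMed" if words & SPECIALIZED else "Wikipedia"

print(pick_source("IL-6 cytokine levels predict sepsis outcomes"))  # PubMed
print(pick_source("Does drinking water help with headaches?"))      # Wikipedia
```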
arXiv Detail & Related papers (2024-02-05T09:57:15Z)
- What Makes Medical Claims (Un)Verifiable? Analyzing Entity and Relation Properties for Fact Verification [8.086400003948143]
The BEAR-Fact corpus is the first corpus for scientific fact verification annotated with subject-relation-object triplets, evidence documents, and fact-checking verdicts.
We show that it is possible to reliably estimate the success of evidence retrieval purely from the claim text.
The dataset is available at http://www.ims.uni-stuttgart.de/data/bioclaim.
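Estimating retrieval success from the claim text alone could look like a lightweight feature-based scorer, as sketched below; the features and weights are invented, not the BEAR-Fact model.

```python
def retrievability_score(claim: str) -> float:
    """Toy estimate of how likely evidence retrieval is to succeed,
    computed from the claim text only (no retrieval performed)."""
    words = claim.split()
    long_terms = sum(len(w) > 12 for w in words)    # rare technical terms
    named = sum(w[0].isupper() for w in words[1:])  # mid-sentence names
    # Heuristic: specific entities help retrieval, jargon overload hurts.
    return max(0.0, min(1.0, 0.5 + 0.1 * named - 0.15 * long_terms))

print(retrievability_score("Metformin reduces HbA1c in Type 2 diabetes"))
```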
arXiv Detail & Related papers (2024-02-02T12:27:58Z)
- Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
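Self-reflection in this setting is typically a generate-critique-revise loop. The sketch below shows that control flow with stubbed LLM calls; it is an assumed illustration, not the paper's exact procedure.

```python
def generate_answer(question: str, feedback: str = "") -> str:
    return "revised answer" if feedback else "first draft"  # stub

def critique(question: str, answer: str) -> str:
    """Stub: return '' if the answer looks faithful, else a complaint."""
    return "" if answer == "revised answer" else "cites no evidence"

def answer_with_reflection(question: str, max_rounds: int = 3) -> str:
    answer = generate_answer(question)
    for _ in range(max_rounds):
        feedback = critique(question, answer)
        if not feedback:        # critique passed: stop reflecting
            break
        answer = generate_answer(question, feedback)
    return answer

print(answer_with_reflection("What causes anemia?"))
```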
arXiv Detail & Related papers (2023-10-10T03:05:44Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
However, they still struggle with accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
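The self-verification idea (extract a value, then require a literal supporting span and discard unsupported values) can be sketched as follows, with stubs standing in for both LLM calls.

```python
note = "Patient reports chest pain for 3 days. Denies fever."

def extract(field: str, text: str) -> str:
    return "chest pain"  # stub for the few-shot extraction call

def provenance(value: str, text: str) -> str | None:
    """Self-check: require a literal span supporting the value."""
    return value if value in text else None  # stub for the verify call

value = extract("chief complaint", note)
span = provenance(value, note)
print(value if span else "REJECTED: no supporting span")
```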
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition [142.42920413017163]
Current methods often generate the most common sentences for an individual case due to dataset bias.
We propose a novel framework that unifies template retrieval and sentence generation to handle both common and rare abnormalities.
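The unification can be pictured as a fallback: reuse a retrieved template sentence when one exists for a finding, otherwise generate one. The templates and generator stub below are invented for illustration.

```python
templates = {
    "normal": "The lungs are clear without focal consolidation.",
    "cardiomegaly": "The cardiac silhouette is enlarged.",
}

def generate_sentence(finding: str) -> str:
    return f"Findings consistent with {finding}."  # stub generator

def compose(findings: list[str]) -> str:
    """Template for common findings, generation for rare ones."""
    return " ".join(templates.get(f) or generate_sentence(f)
                    for f in findings)

print(compose(["normal", "rare interstitial pattern"]))
```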
arXiv Detail & Related papers (2021-01-09T04:33:27Z)
- Extracting Structured Data from Physician-Patient Conversations By Predicting Noteworthy Utterances [39.888619005843246]
We describe a new dataset consisting of conversation transcripts, post-visit summaries, corresponding supporting evidence (in the transcript), and structured labels.
One methodological challenge is that the conversations are long (around 1500 words) making it difficult for modern deep-learning models to use them as input.
We find that by first filtering for (predicted) noteworthy utterances, we can significantly boost predictive performance for recognizing both diagnoses and review-of-systems (RoS) abnormalities.
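The filter-then-predict idea reduces a long transcript to its noteworthy utterances before prediction, as in the rough sketch below; the keyword scorer stands in for the learned classifier.

```python
transcript = [
    "Good morning, how was the drive in?",
    "I've had a sharp pain in my left knee for two weeks.",
    "Any fever, chills, or night sweats?",
    "No fever, but some swelling after walking.",
]

KEY_TERMS = ("pain", "fever", "swelling", "chills")

def noteworthy(utterance: str) -> bool:
    """Stub noteworthiness scorer; the paper trains a model for this."""
    return any(t in utterance.lower() for t in KEY_TERMS)

filtered = [u for u in transcript if noteworthy(u)]
print(filtered)  # only 3 of 4 utterances survive as model input
```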
arXiv Detail & Related papers (2020-07-14T16:10:37Z)