Hallucination vs interpretation: rethinking accuracy and precision in AI-assisted data extraction for knowledge synthesis
- URL: http://arxiv.org/abs/2508.09458v2
- Date: Thu, 14 Aug 2025 03:36:46 GMT
- Title: Hallucination vs interpretation: rethinking accuracy and precision in AI-assisted data extraction for knowledge synthesis
- Authors: Xi Long, Christy Boscardin, Lauren A. Maggio, Joseph A. Costello, Ralph Gonzales, Rasmyah Hammoudeh, Ki Lai, Yoon Soo Park, Brian C. Gin
- Abstract summary: We developed an extraction platform using large language models (LLMs) to automate data extraction. We compared AI to human responses across 187 publications and 17 extraction questions from a published scoping review. Findings suggest AI variability depends more on interpretability than hallucination.
- Score: 0.9898534984111934
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Knowledge syntheses (literature reviews) are essential to health professions education (HPE), consolidating findings to advance theory and practice. However, they are labor-intensive, especially during data extraction. Artificial Intelligence (AI)-assisted extraction promises efficiency but raises concerns about accuracy, making it critical to distinguish AI 'hallucinations' (fabricated content) from legitimate interpretive differences. We developed an extraction platform using large language models (LLMs) to automate data extraction and compared AI to human responses across 187 publications and 17 extraction questions from a published scoping review. AI-human, human-human, and AI-AI consistencies were measured using interrater reliability (categorical) and thematic similarity ratings (open-ended). Errors were identified by comparing extracted responses to source publications. AI was highly consistent with humans for concrete, explicitly stated questions (e.g., title, aims) and lower for questions requiring subjective interpretation or absent in text (e.g., Kirkpatrick's outcomes, study rationale). Human-human consistency was not higher than AI-human and showed the same question-dependent variability. Discordant AI-human responses (769/3179 = 24.2%) were mostly due to interpretive differences (18.3%); AI inaccuracies were rare (1.51%), while humans were nearly three times more likely to state inaccuracies (4.37%). Findings suggest AI variability depends more on interpretability than hallucination. Repeating AI extraction can identify interpretive complexity or ambiguity, refining processes before human review. AI can be a transparent, trustworthy partner in knowledge synthesis, though caution is needed to preserve critical human insights.
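The percentages in the abstract can be checked with a few lines of arithmetic, and the idea of interrater reliability for categorical answers can be illustrated with a small sketch. The block below is a minimal example, not the authors' analysis: the abstract does not say which reliability statistic was used, so Cohen's kappa is an assumption here, and the function name `cohen_kappa` and the toy label lists are hypothetical.

```python
from collections import Counter

# Figures quoted directly from the abstract.
DISCORDANT, TOTAL = 769, 3179
INTERPRETIVE_PCT, AI_ERROR_PCT, HUMAN_ERROR_PCT = 18.3, 1.51, 4.37

# Share of AI-human response pairs that disagreed.
discordance_pct = 100 * DISCORDANT / TOTAL
# The three reported causes should add up to roughly the discordance rate.
category_sum_pct = INTERPRETIVE_PCT + AI_ERROR_PCT + HUMAN_ERROR_PCT
# "Nearly three times more likely": human vs. AI inaccuracy rate.
human_to_ai_ratio = HUMAN_ERROR_PCT / AI_ERROR_PCT


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical labels.

    Assumption: the paper may use a different reliability statistic.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for illustration only, not data from the paper.
ai_labels = ["yes", "yes", "no", "no"]
human_labels = ["yes", "no", "no", "no"]

print(f"discordance: {discordance_pct:.1f}%")
print(f"sum of reported categories: {category_sum_pct:.2f}%")
print(f"human/AI inaccuracy ratio: {human_to_ai_ratio:.2f}")
print(f"kappa (toy labels): {cohen_kappa(ai_labels, human_labels):.2f}")
```

Run as written, this reproduces the 24.2% discordance rate, shows that the three reported categories sum to 24.18% (consistent with 24.2% after rounding), gives a human-to-AI inaccuracy ratio of about 2.89, and yields a kappa of 0.50 for the toy labels.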
Related papers
- Explainable AI as a Double-Edged Sword in Dermatology: The Impact on Clinicians versus The Public [46.86429592892395]
Explainable AI (XAI) addresses this by providing AI decision-making insight. We present results from two large-scale experiments combining a fairness-based diagnosis AI model and different XAI explanations.
arXiv Detail & Related papers (2025-12-14T00:06:06Z)
- A perceptual bias of AI Logical Argumentation Ability in Writing [3.1238547837436115]
The ability to reason logically as humans do is often used as a criterion for assessing whether a machine can think. This study explores whether human biases influence evaluations of the reasoning abilities of AI.
arXiv Detail & Related papers (2025-11-27T06:39:11Z)
- Exploring the Impact of Explainable AI and Cognitive Capabilities on Users' Decisions [1.1049608786515839]
Personality traits like the Need for Cognition (NFC) can lead to different decision-making outcomes among low and high NFC individuals. We investigated how presenting AI information affects accuracy, reliance on AI, and cognitive load in a loan application scenario. We found no significant differences between low and high NFC groups in accuracy or cognitive load, raising questions about the role of personality traits in AI-assisted decision-making.
arXiv Detail & Related papers (2025-05-02T11:30:53Z)
- Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing [55.2480439325792]
This study systematically evaluates twelve state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation dataset. Our findings reveal that detectors frequently flag even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models.
arXiv Detail & Related papers (2025-02-21T18:45:37Z)
- Human Bias in the Face of AI: Examining Human Judgment Against Text Labeled as AI Generated [48.70176791365903]
This study explores how bias shapes the perception of AI-generated versus human-generated content. We investigated how human raters respond to labeled and unlabeled content.
arXiv Detail & Related papers (2024-09-29T04:31:45Z)
- The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z)
- Can Machines Imitate Humans? Integrative Turing Tests for Vision and Language Demonstrate a Narrowing Gap [45.6806234490428]
We benchmark current AIs in their abilities to imitate humans in three language tasks and three vision tasks.
Experiments involved 549 human agents plus 26 AI agents for dataset creation, and 1,126 human judges plus 10 AI judges.
Results reveal that current AIs are not far from being able to impersonate humans in complex language and vision challenges.
arXiv Detail & Related papers (2022-11-23T16:16:52Z)
- The Who in XAI: How AI Background Shapes Perceptions of AI Explanations [61.49776160925216]
We conduct a mixed-methods study of how two different groups--people with and without AI background--perceive different types of AI explanations.
We find that (1) both groups showed unwarranted faith in numbers for different reasons and (2) each group found value in different explanations beyond their intended design.
arXiv Detail & Related papers (2021-07-28T17:32:04Z)
- Does Explainable Artificial Intelligence Improve Human Decision-Making? [17.18994675838646]
We compare and evaluate objective human decision accuracy without AI (control), with an AI prediction (no explanation), and with an AI prediction plus explanation.
We find any kind of AI prediction tends to improve user decision accuracy, but no conclusive evidence that explainable AI has a meaningful impact.
Our results indicate that, at least in some situations, the "why" information provided in explainable AI may not enhance user decision-making.
arXiv Detail & Related papers (2020-06-19T15:46:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.