Related papers: WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction

A Systematic Analysis of Large Language Models with RAG-enabled Dynamic Prompting for Medical Error Detection and Correction [8.312687115594512]
We evaluate zero-shot prompting, static prompting with random exemplars, and retrieval-augmented dynamic prompting. We measured performance using accuracy, recall, false-positive rate (FPR), and an aggregate of ROUGE-1, BLEURT, and BERTScore for error correction.
arXiv Detail & Related papers (2025-11-25T02:40:49Z)
MedRECT: A Medical Reasoning Benchmark for Error Correction in Clinical Texts [0.0]
Large language models (LLMs) show increasing promise in medical applications, but their ability to detect and correct errors in clinical texts remains under-evaluated. We introduce MedRECT, a cross-lingual benchmark (Japanese/English) that formulates medical error handling as three subtasks. We evaluate 9 contemporary LLMs spanning proprietary, open-weight, and reasoning families.
arXiv Detail & Related papers (2025-11-01T06:19:34Z)
MedREK: Retrieval-Based Editing for Medical LLMs with Key-Aware Prompts [70.64143198545031]
We propose MedREK, a retrieval-based editing framework that integrates a shared query-key module for precise matching with an attention-based prompt encoder for informative guidance. Our results on various medical benchmarks demonstrate that MedREK achieves superior performance across different core metrics.
arXiv Detail & Related papers (2025-10-15T12:50:33Z)
Toward Reliable Clinical Coding with Language Models: Verification and Lightweight Adaptation [3.952186976672079]
We show that lightweight interventions, including prompt engineering and small-scale fine-tuning, can improve accuracy without the computational overhead of search-based methods. To address hierarchically near-miss errors, we introduce clinical code verification as both a standalone task and a pipeline component.
arXiv Detail & Related papers (2025-10-08T23:50:58Z)
Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models [3.3091869879941687]
We investigate fine-tuning Vision-Language Models (VLMs) for multi-task medical image understanding. We reformulate each task into instruction-based prompts suitable for vision-language reasoning. Results show that multi-task training improves robustness and accuracy.
arXiv Detail & Related papers (2025-05-22T13:18:44Z)
Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
Fact or Guesswork? Evaluating Large Language Model's Medical Knowledge with Structured One-Hop Judgment [108.55277188617035]
Large language models (LLMs) have been widely adopted in various downstream task domains, but their ability to directly recall and apply factual medical knowledge remains under-explored. Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs' inherent medical knowledge from their reasoning capabilities. We introduce the Medical Knowledge Judgment, a dataset specifically designed to measure LLMs' one-hop factual medical knowledge.
arXiv Detail & Related papers (2025-02-20T05:27:51Z)
MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes [22.401540975926324]
We introduce MEDEC, the first publicly available benchmark for medical error detection and correction in clinical notes. MEDEC consists of 3,848 clinical texts, including 488 clinical notes from three US hospital systems. We evaluate recent LLMs for the tasks of detecting and correcting medical errors requiring both medical knowledge and reasoning capabilities.
arXiv Detail & Related papers (2024-12-26T15:54:10Z)
MedAutoCorrect: Image-Conditioned Autocorrection in Medical Reporting [31.710972402763527]
In medical reporting, the accuracy of radiological reports, whether generated by humans or machine learning algorithms, is critical. We tackle a new task in this paper: image-conditioned autocorrection of inaccuracies within these reports. We propose a two-stage framework capable of pinpointing these errors and then making corrections, simulating an autocorrection process.
arXiv Detail & Related papers (2024-12-04T02:32:53Z)
Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding [92.32881381717594]
We introduce ALternate Contrastive Decoding (ALCD) to solve hallucination issues in medical information extraction tasks. ALCD demonstrates significant improvements in resolving hallucination issues compared to conventional decoding methods.
arXiv Detail & Related papers (2024-10-21T07:19:19Z)
Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE). RISE injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation. Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.
arXiv Detail & Related papers (2024-10-09T07:43:38Z)
ReXErr: Synthesizing Clinically Meaningful Errors in Diagnostic Radiology Reports [1.9106067578277455]
We introduce ReXErr, a methodology that leverages Large Language Models to generate representative errors within chest X-ray reports. We developed error categories that capture common mistakes in both human and AI-generated reports. Our approach uses a novel sampling scheme to inject diverse errors while maintaining clinical plausibility.
arXiv Detail & Related papers (2024-09-17T01:42:39Z)
Integrating Knowledge Retrieval and Large Language Models for Clinical Report Correction [7.144169681445819]
This study proposes an approach for error correction in radiology reports, leveraging large language models (LLMs) and retrieval-augmented generation (RAG) techniques. The proposed framework employs a novel internal+external retrieval mechanism to extract relevant medical entities and relations from the report of interest and an external knowledge source. The effectiveness of the approach is evaluated using a benchmark dataset created by corrupting real-world radiology reports with realistic errors, guided by domain experts.
arXiv Detail & Related papers (2024-06-21T10:48:21Z)
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints [8.547853819087043]
We evaluate the capability of general LLMs to identify and correct medical errors with multiple prompting strategies. We propose incorporating error-span predictions from a smaller, fine-tuned model in two ways. Our best-performing solution with 8-shot + CoT + hints ranked sixth in the shared task leaderboard.
arXiv Detail & Related papers (2024-05-28T10:20:29Z)
PromptMind Team at MEDIQA-CORR 2024: Improving Clinical Text Correction with Error Categorization and LLM Ensembles [0.0]
This paper describes our approach to the MEDIQA-CORR shared task, which involves error detection and correction in clinical notes curated by medical professionals. We aim to assess the capabilities of Large Language Models trained on vast corpora of internet data that contain both factual and unreliable information.
arXiv Detail & Related papers (2024-05-14T07:16:36Z)
MediFact at MEDIQA-CORR 2024: Why AI Needs a Human Touch [0.0]
We present a novel approach submitted to the MEDIQA-CORR 2024 shared task. Our method emphasizes extracting contextually relevant information from available clinical text data. By integrating domain expertise and prioritizing meaningful information extraction, our approach underscores the significance of a human-centric strategy in adapting AI for healthcare.
arXiv Detail & Related papers (2024-04-27T20:28:38Z)
Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning. They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health. Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models. We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation. We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z)
Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose RFEC, an efficient factual error correction system based on an entity-retrieval post-editing process. RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary. Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
arXiv Detail & Related papers (2022-04-18T11:35:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.