Robust Benchmarking for Machine Learning of Clinical Entity Extraction
- URL: http://arxiv.org/abs/2007.16127v1
- Date: Fri, 31 Jul 2020 15:14:05 GMT
- Title: Robust Benchmarking for Machine Learning of Clinical Entity Extraction
- Authors: Monica Agrawal, Chloe O'Connell, Yasmin Fatemi, Ariel Levy, David Sontag
- Abstract summary: We audit the performance of and indicate areas of improvement for state-of-the-art systems.
We find that high task accuracies for clinical entity normalization systems on the 2019 n2c2 Shared Task are misleading.
We reformulate the annotation framework for clinical entity extraction to factor in inconsistencies in medical vocabularies.
- Score: 2.9398911304923447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical studies often require understanding elements of a patient's
narrative that exist only in free text clinical notes. To transform notes into
structured data for downstream use, these elements are commonly extracted and
normalized to medical vocabularies. In this work, we audit the performance of
and indicate areas of improvement for state-of-the-art systems. We find that
high task accuracies for clinical entity normalization systems on the 2019 n2c2
Shared Task are misleading, and underlying performance is still brittle.
Normalization accuracy is high for common concepts (95.3%), but much lower for
concepts unseen in training data (69.3%). We demonstrate that current
approaches are hindered in part by inconsistencies in medical vocabularies,
limitations of existing labeling schemas, and narrow evaluation techniques. We
reformulate the annotation framework for clinical entity extraction to factor
in these issues to allow for robust end-to-end system benchmarking. We evaluate
concordance of annotations from our new framework between two annotators and
achieve a Jaccard similarity of 0.73 for entity recognition and an agreement of
0.83 for entity normalization. We propose a path forward to address the
demonstrated need for the creation of a reference standard to spur method
development in entity recognition and normalization.
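The two measurements quoted in the abstract, Jaccard similarity between annotators and normalization accuracy split by whether a concept appeared in training data, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. It assumes that entity spans are compared as exact `(start, end)` offset pairs and that a concept counts as "seen" when its gold identifier occurs in the training set; the paper may define both differently. The function names and the toy data are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Sequence, Set


def jaccard_similarity(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Return |A ∩ B| / |A ∪ B| for two collections of annotations."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty annotation sets count as perfect agreement.
        return 1.0
    return len(set_a & set_b) / len(union)


def stratified_accuracy(
    gold: Sequence[str], pred: Sequence[str], train_concepts: Set[str]
) -> Dict[str, float]:
    """Normalization accuracy, split by whether the gold concept was seen in training."""
    counts = {"seen": [0, 0], "unseen": [0, 0]}  # [correct, total] per bucket
    for g, p in zip(gold, pred):
        bucket = "seen" if g in train_concepts else "unseen"
        counts[bucket][0] += int(g == p)
        counts[bucket][1] += 1
    return {
        name: (correct / total if total else float("nan"))
        for name, (correct, total) in counts.items()
    }


# Toy data (hypothetical): character-offset spans marked by two annotators.
annotator_1 = {(0, 5), (10, 18), (30, 42)}
annotator_2 = {(0, 5), (10, 18), (50, 55)}
print(jaccard_similarity(annotator_1, annotator_2))  # 2 shared / 4 total -> 0.5

# Toy data (hypothetical): gold vs. predicted concept identifiers.
gold = ["C1", "C2", "C3", "C3"]
pred = ["C1", "C2", "C9", "C3"]
print(stratified_accuracy(gold, pred, train_concepts={"C1", "C2"}))
# -> {'seen': 1.0, 'unseen': 0.5}
```

Reporting the two buckets separately, rather than one pooled accuracy, is what exposes the kind of gap the abstract describes between common concepts and concepts unseen in training data.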
Related papers
- Benchmarking Egocentric Clinical Intent Understanding Capability for Medical Multimodal Large Language Models [48.95516224614331]
We introduce MedGaze-Bench, the first benchmark leveraging clinician gaze as a Cognitive Cursor to assess intent understanding across surgery, emergency simulation, and diagnostic interpretation.
Our benchmark addresses three fundamental challenges: visual homogeneity of anatomical structures, strict temporal-causal dependencies in clinical, and implicit adherence to safety protocols.
arXiv Detail & Related papers (2026-01-11T02:20:40Z) - CNSight: Evaluation of Clinical Note Segmentation Tools [3.673249612734457]
We evaluate rule-based baselines, domain-specific transformer models, and large language models for clinical note segmentation using a curated dataset of 1,000 notes from MIMIC-IV.
Our experiments show that large API-based models achieve the best overall performance, with GPT-5-mini reaching a best average F1 of 72.4 across sentence-level and freetext segmentation.
arXiv Detail & Related papers (2025-12-28T05:40:15Z) - Optimizing Resources for On-the-Fly Label Estimation with Multiple Unknown Medical Experts [2.904892426557913]
We propose an adaptive approach for real-time annotation that supports on-the-fly labeling of incoming data.
We evaluate our approach on three multi-annotator classification datasets across different modalities.
arXiv Detail & Related papers (2025-10-04T21:41:26Z) - Ontology-Based Concept Distillation for Radiology Report Retrieval and Labeling [10.504309161945065]
Most existing methods rely on comparing high-dimensional text embeddings from models like CLIP or CXR-BERT.
We propose a novel, ontology-driven alternative for comparing radiology report texts based on clinically grounded concepts from the Unified Medical Language System.
Our method extracts standardised medical entities from free-text reports using an enhanced pipeline built on RadGraph-XL and SapBERT.
arXiv Detail & Related papers (2025-08-27T14:20:50Z) - On the notion of missingness for path attribution explainability methods in medical settings: Guiding the selection of medically meaningful baselines [48.688209040613216]
We argue that a clinically normal but input-close counterfactual represents a more accurate representation of a meaningful absence of features in medical data.
We evaluate the approach on three distinct medical data sets and empirically demonstrate that counterfactual baselines yield more faithful and medically relevant attributions.
arXiv Detail & Related papers (2025-08-20T07:13:41Z) - Automated SNOMED CT Concept Annotation in Clinical Text Using Bi-GRU Neural Networks [0.31457219084519]
This study introduces a neural sequence labeling approach for SNOMED CT concept recognition using a Bidirectional GRU model.
We preprocess text with domain-adapted SpaCy and SciBERT-based tokenization, segmenting sentences into overlapping 19-token chunks enriched with contextual, syntactic, and morphological features.
The Bi-GRU model assigns IOB tags to identify concept spans and achieves strong performance with a 90 percent F1-score on the validation set.
arXiv Detail & Related papers (2025-08-04T16:08:49Z) - Attribute Structuring Improves LLM-Based Evaluation of Clinical Text
Summaries [62.32403630651586]
Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation.
Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process.
AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization.
arXiv Detail & Related papers (2024-03-01T21:59:03Z) - Assertion Detection Large Language Model In-context Learning LoRA
Fine-tuning [2.401755243180179]
We introduce a novel methodology that utilizes Large Language Models (LLMs) pre-trained on a vast array of medical data for assertion detection.
Our approach achieved an F-1 of 0.74, which is 0.31 higher than the previous method.
arXiv Detail & Related papers (2024-01-31T05:11:00Z) - Applying unsupervised keyphrase methods on concepts extracted from
discharge sheets [7.102620843620572]
It is necessary to identify the section in which each content is recorded and also to identify key concepts to extract meaning from clinical texts.
In this study, these challenges have been addressed by using clinical natural language processing techniques.
A set of popular unsupervised key phrase extraction methods has been verified and evaluated.
arXiv Detail & Related papers (2023-03-15T20:55:25Z) - A Meta-Evaluation of Faithfulness Metrics for Long-Form Hospital-Course
Summarization [2.8575516056239576]
Long-form clinical summarization of hospital admissions has real-world significance because of its potential to help both clinicians and patients.
We benchmark faithfulness metrics against fine-grained human annotations for model-generated summaries of a patient's Brief Hospital Course.
arXiv Detail & Related papers (2023-03-07T14:57:06Z) - USE-Evaluator: Performance Metrics for Medical Image Segmentation Models
with Uncertain, Small or Empty Reference Annotations [5.672489398972326]
There is a mismatch between the distributions of cases and difficulty level of segmentation tasks in public data sets compared to clinical practice.
Common metrics fail to measure the impact of this mismatch, especially for clinical data sets.
We study how uncertain, small, or empty reference annotations influence the value of metrics for medical image segmentation.
arXiv Detail & Related papers (2022-09-26T20:40:02Z) - Statistical Dependency Guided Contrastive Learning for Multiple Labeling
in Prenatal Ultrasound [56.631021151764955]
Standard plane recognition plays an important role in prenatal ultrasound (US) screening.
We build a novel multi-label learning scheme to identify multiple standard planes and corresponding anatomical structures simultaneously.
arXiv Detail & Related papers (2021-08-11T06:39:26Z) - Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z) - Federated Semi-supervised Medical Image Classification via Inter-client Relation Matching [58.26619456972598]
Federated learning (FL) has emerged with increasing popularity to collaborate distributed medical institutions for training deep networks.
This paper studies a practical yet challenging FL problem, named Federated Semi-supervised Learning (FSSL).
We present a novel approach for this problem, which improves over traditional consistency regularization mechanism with a new inter-client relation matching scheme.
arXiv Detail & Related papers (2021-06-16T07:58:00Z) - Detecting of a Patient's Condition From Clinical Narratives Using Natural Language Representation [0.3149883354098941]
This paper proposes a joint clinical natural language representation learning and supervised classification framework.
The novel framework jointly discovers distributional syntactic and latent semantic (representation learning) from contextual clinical narrative inputs.
The proposed framework yields an overall classification performance with accuracy, recall, and precision of 89%, 88%, and 89%, respectively.
arXiv Detail & Related papers (2021-04-08T17:16:04Z) - Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z) - Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)