Robust Benchmarking for Machine Learning of Clinical Entity Extraction
- URL: http://arxiv.org/abs/2007.16127v1
- Date: Fri, 31 Jul 2020 15:14:05 GMT
- Title: Robust Benchmarking for Machine Learning of Clinical Entity Extraction
- Authors: Monica Agrawal, Chloe O'Connell, Yasmin Fatemi, Ariel Levy, David
Sontag
- Abstract summary: We audit the performance of and indicate areas of improvement for state-of-the-art systems.
We find that high task accuracies for clinical entity normalization systems on the 2019 n2c2 Shared Task are misleading.
We reformulate the annotation framework for clinical entity extraction to factor in inconsistencies in medical vocabularies.
- Score: 2.9398911304923447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical studies often require understanding elements of a patient's
narrative that exist only in free text clinical notes. To transform notes into
structured data for downstream use, these elements are commonly extracted and
normalized to medical vocabularies. In this work, we audit the performance of
and indicate areas of improvement for state-of-the-art systems. We find that
high task accuracies for clinical entity normalization systems on the 2019 n2c2
Shared Task are misleading, and underlying performance is still brittle.
Normalization accuracy is high for common concepts (95.3%), but much lower for
concepts unseen in training data (69.3%). We demonstrate that current
approaches are hindered in part by inconsistencies in medical vocabularies,
limitations of existing labeling schemas, and narrow evaluation techniques. We
reformulate the annotation framework for clinical entity extraction to factor
in these issues to allow for robust end-to-end system benchmarking. We evaluate
concordance of annotations from our new framework between two annotators and
achieve a Jaccard similarity of 0.73 for entity recognition and an agreement of
0.83 for entity normalization. We propose a path forward to address the
demonstrated need for the creation of a reference standard to spur method
development in entity recognition and normalization.
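The abstract reports inter-annotator concordance as a Jaccard similarity of 0.73 over recognized entity spans. As a minimal illustrative sketch (not the paper's actual evaluation code, and with invented example spans), span-level Jaccard similarity between two annotators can be computed like this:

```python
def jaccard(spans_a, spans_b):
    """Jaccard similarity between two sets of annotated (start, end) spans:
    |intersection| / |union|, with 1.0 for two empty annotation sets."""
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical character spans marked by two annotators on the same note.
annotator_a = [(0, 12), (20, 35), (50, 61)]
annotator_b = [(0, 12), (20, 35), (70, 82)]

print(jaccard(annotator_a, annotator_b))  # 2 shared of 4 distinct spans -> 0.5
```

This treats only exact span matches as agreement; relaxed variants (e.g. token-level overlap) would give different numbers.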
Related papers
- Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries [62.32403630651586]
Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation.
Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process.
AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization.
arXiv Detail & Related papers (2024-03-01T21:59:03Z)
- Assertion Detection Large Language Model In-context Learning LoRA Fine-tuning [2.401755243180179]
We introduce a novel methodology that utilizes Large Language Models (LLMs) pre-trained on a vast array of medical data for assertion detection.
Our approach achieved an F1 score of 0.74, which is 0.31 higher than that of the previous method.
arXiv Detail & Related papers (2024-01-31T05:11:00Z)
- Applying unsupervised keyphrase methods on concepts extracted from discharge sheets [7.102620843620572]
To extract meaning from clinical texts, it is necessary both to identify the section in which each piece of content is recorded and to identify key concepts.
In this study, these challenges have been addressed by using clinical natural language processing techniques.
A set of popular unsupervised key phrase extraction methods has been verified and evaluated.
arXiv Detail & Related papers (2023-03-15T20:55:25Z)
- A Meta-Evaluation of Faithfulness Metrics for Long-Form Hospital-Course Summarization [2.8575516056239576]
Long-form clinical summarization of hospital admissions has real-world significance because of its potential to help both clinicians and patients.
We benchmark faithfulness metrics against fine-grained human annotations for model-generated summaries of a patient's Brief Hospital Course.
arXiv Detail & Related papers (2023-03-07T14:57:06Z)
- USE-Evaluator: Performance Metrics for Medical Image Segmentation Models with Uncertain, Small or Empty Reference Annotations [5.672489398972326]
There is a mismatch between the distributions of cases and difficulty level of segmentation tasks in public data sets compared to clinical practice.
Common metrics fail to measure the impact of this mismatch, especially for clinical data sets.
We study how uncertain, small, or empty reference annotations influence the value of metrics for medical image segmentation.
arXiv Detail & Related papers (2022-09-26T20:40:02Z)
- Statistical Dependency Guided Contrastive Learning for Multiple Labeling in Prenatal Ultrasound [56.631021151764955]
Standard plane recognition plays an important role in prenatal ultrasound (US) screening.
We build a novel multi-label learning scheme to identify multiple standard planes and corresponding anatomical structures simultaneously.
arXiv Detail & Related papers (2021-08-11T06:39:26Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- Federated Semi-supervised Medical Image Classification via Inter-client Relation Matching [58.26619456972598]
Federated learning (FL) has emerged with increasing popularity to collaborate distributed medical institutions for training deep networks.
This paper studies a practical yet challenging FL problem, named Federated Semi-supervised Learning (FSSL).
We present a novel approach for this problem, which improves over traditional consistency regularization mechanism with a new inter-client relation matching scheme.
arXiv Detail & Related papers (2021-06-16T07:58:00Z)
- Detecting of a Patient's Condition From Clinical Narratives Using Natural Language Representation [0.3149883354098941]
This paper proposes a joint clinical natural language representation learning and supervised classification framework.
The novel framework jointly discovers distributional syntactic and latent semantic (representation learning) from contextual clinical narrative inputs.
The proposed framework yields an overall classification accuracy, recall, and precision of 89%, 88%, and 89%, respectively.
arXiv Detail & Related papers (2021-04-08T17:16:04Z)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.