Estimating Redundancy in Clinical Text
- URL: http://arxiv.org/abs/2105.11832v1
- Date: Tue, 25 May 2021 11:01:45 GMT
- Title: Estimating Redundancy in Clinical Text
- Authors: Thomas Searle, Zina Ibrahim, James Teo, Richard JB Dobson
- Abstract summary: Clinicians populate new documents by duplicating existing notes, then updating accordingly.
Quantifying information redundancy can play an essential role in evaluating innovations that operate on clinical narratives.
We present and evaluate two strategies to measure redundancy: an information-theoretic approach and a lexicosyntactic and semantic model.
- Score: 6.245180523143739
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The current mode of use of Electronic Health Records (EHR) elicits text
redundancy. Clinicians often populate new documents by duplicating existing
notes, then updating accordingly. Data duplication can lead to a propagation of
errors, inconsistencies and misreporting of care. Therefore, quantifying
information redundancy can play an essential role in evaluating innovations
that operate on clinical narratives.
This work is a quantitative examination of information redundancy in EHR
notes. We present and evaluate two strategies to measure redundancy: an
information-theoretic approach and a lexicosyntactic and semantic model. We
evaluate the measures by training large Transformer-based language models using
clinical text from a large openly available US-based ICU dataset and a large
multi-site UK-based Trust. By comparing the information-theoretic content of
the trained models with that of open-domain language models, the language models
trained using clinical text are shown to be ~1.5x to ~3x less efficient than
those trained on open-domain corpora. Manual evaluation shows a high correlation with
lexicosyntactic and semantic redundancy, with averages of ~43% to ~65%.
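The two kinds of measure named in the abstract can be illustrated with a small Python sketch. It is not the authors' method: the paper trains Transformer language models, whereas this sketch swaps in a zlib compression ratio as a crude information-theoretic proxy and a token-level alignment from `difflib` as a crude lexical proxy. The function names `compression_redundancy` and `copied_fraction` and the example notes are hypothetical.

```python
import zlib
from difflib import SequenceMatcher


def compression_redundancy(text: str) -> float:
    """Crude information-theoretic proxy (an assumption, not the paper's measure):
    the fraction of bytes that zlib saves when compressing the text, clamped at 0."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    compressed = zlib.compress(raw, 9)
    return max(0.0, 1.0 - len(compressed) / len(raw))


def copied_fraction(previous_note: str, new_note: str) -> float:
    """Crude lexical proxy: the fraction of whitespace-separated tokens in
    new_note that fall inside blocks aligned with previous_note."""
    prev_tokens = previous_note.split()
    new_tokens = new_note.split()
    if not new_tokens:
        return 0.0
    # autojunk=False disables difflib's popularity heuristic on long sequences.
    matcher = SequenceMatcher(None, prev_tokens, new_tokens, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(new_tokens)


if __name__ == "__main__":
    # Hypothetical notes: day 2 is a copy of day 1 with a short update appended.
    note_day1 = "Patient admitted with chest pain. ECG normal. Plan: observe overnight."
    note_day2 = note_day1 + " Day 2: pain resolved, discharge planned."
    print("copied fraction:", copied_fraction(note_day1, note_day2))
    print("compression redundancy:", compression_redundancy(note_day1 + " " + note_day2))
```

Both proxies only capture surface repetition. A trained language model, as used in the paper, can also account for predictable wording that is never copied verbatim.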
Related papers
- Representation Learning of Structured Data for Medical Foundation Models [29.10129199884847]
We introduce the UniStruct architecture to design a multimodal medical foundation model of unstructured text and structured data.
Our approach is validated through model pre-training on both an extensive internal medical database and a public repository of structured medical records.
arXiv Detail & Related papers (2024-10-17T09:02:28Z)
- Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study [2.0884301753594334]
This study performs a comparative analysis of various natural language models for medical text classification.
BERT outperforms Bi-LSTM models by up to 28% and the baseline BERT model by up to 16% for recall of the minority classes.
arXiv Detail & Related papers (2024-08-30T10:28:49Z)
- Clinical information extraction for Low-resource languages with Few-shot learning using Pre-trained language models and Prompting [12.166472806042592]
Automatic extraction of medical information from clinical documents poses several challenges.
Recent advances in domain-adaptation and prompting methods showed promising results with minimal training data.
We demonstrate that a lightweight, domain-adapted pretrained model, prompted with just 20 shots, outperforms a traditional classification model by 30.5% accuracy.
arXiv Detail & Related papers (2024-03-20T08:01:33Z)
- Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)