Related papers: Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study

Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study

URL: http://arxiv.org/abs/2408.17181v1
Date: Fri, 30 Aug 2024 10:28:49 GMT
Title: Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study
Authors: Shubham Agarwal, Thomas Searle, Mart Ratas, Anthony Shek, James Teo, Richard Dobson,
Abstract summary: This study performs a comparative analysis of various natural language models for medical text classification. BERT outperforms Bi-LSTM models by up to 28% and the baseline BERT model by up to 16% for recall of the minority classes.
Score: 2.0884301753594334
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Electronic Health Records are large repositories of valuable clinical data, with a significant portion stored in unstructured text format. This textual data includes clinical events (e.g., disorders, symptoms, findings, medications and procedures) in context that if extracted accurately at scale can unlock valuable downstream applications such as disease prediction. Using an existing Named Entity Recognition and Linking methodology, MedCAT, these identified concepts need to be further classified (contextualised) for their relevance to the patient, and their temporal and negated status for example, to be useful downstream. This study performs a comparative analysis of various natural language models for medical text classification. Extensive experimentation reveals the effectiveness of transformer-based language models, particularly BERT. When combined with class imbalance mitigation techniques, BERT outperforms Bi-LSTM models by up to 28% and the baseline BERT model by up to 16% for recall of the minority classes. The method has been implemented as part of CogStack/MedCAT framework and made available to the community for further research.

Related papers

Clinical NLP with Attention-Based Deep Learning for Multi-Disease Prediction [44.0876796031468]
This paper addresses the challenges posed by the unstructured nature and high-dimensional semantic complexity of electronic health record texts.<n>A deep learning method based on attention mechanisms is proposed to achieve unified modeling for information extraction and multi-label disease prediction.
arXiv Detail & Related papers (2025-07-02T07:45:22Z)
KDH-MLTC: Knowledge Distillation for Healthcare Multi-Label Text Classification [4.8342038441006805]
This research presents Knowledge Distillation for Healthcare Multi-Label Text Classification (KDH-MLTC)<n>The proposed approach addresses conventional healthcare Multi-Label Text Classification challenges by integrating knowledge distillation and sequential fine-tuning.<n>Experiments conducted on three medical literature datasets demonstrate that KDH-MLTC achieves superior performance compared to existing approaches.
arXiv Detail & Related papers (2025-05-12T00:58:25Z)
Transfer Learning with Clinical Concept Embeddings from Large Language Models [4.838020198075334]
Large Language Models (LLMs) show significant potential of capturing the semantic meaning of clinical concepts. This study analyzed electronic health records from two large healthcare systems to assess the impact of semantic embeddings on local, shared, and transfer learning models.
arXiv Detail & Related papers (2024-09-20T20:50:55Z)
Multi-objective Representation for Numbers in Clinical Narratives: A CamemBERT-Bio-Based Alternative to Large-Scale LLMs [0.9208007322096533]
This paper investigates the limitations of Transformer models in understanding numerical values. It aims to categorize numerical values extracted from medical documents into eight specific physiological categories using CamemBERT-bio.
arXiv Detail & Related papers (2024-05-28T01:15:21Z)
SNOBERT: A Benchmark for clinical notes entity linking in the SNOMED CT clinical terminology [43.89160296332471]
We propose a method for linking text spans in clinical notes to specific concepts in the SNOMED CT using BERT-based models. The method consists of two stages: candidate selection and candidate matching. The models were trained on one of the largest publicly available dataset of labeled clinical notes.
arXiv Detail & Related papers (2024-05-25T08:00:44Z)
Knowledge Graph Embeddings for Multi-Lingual Structured Representations of Radiology Reports [40.606143019674654]
We introduce a novel light-weight graph-based embedding method specifically catering radiology reports. It takes into account the structure and composition of the report, while also connecting medical terms in the report. We show the use of this embedding on two tasks namely disease classification of X-ray reports and image classification.
arXiv Detail & Related papers (2023-09-02T11:46:41Z)
Hierarchical Pretraining for Biomedical Term Embeddings [4.69793648771741]
We propose HiPrBERT, a novel biomedical term representation model trained on hierarchical data. We show that HiPrBERT effectively learns the pair-wise distance from hierarchical information, resulting in a substantially more informative embeddings for further biomedical applications.
arXiv Detail & Related papers (2023-07-01T08:16:00Z)
Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain. We annotated a corpus of clinical documents according to 12 types of identifying entities. We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings. We show the few-shot cross-lingual transfer property of LMs for named recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the stroke.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
Estimating Redundancy in Clinical Text [6.245180523143739]
Clinicians populate new documents by duplicating existing notes, then updating accordingly. quantifying information redundancy can play an essential role in evaluating innovations that operate on clinical narratives. We present and evaluate two strategies to measure redundancy: an information-theoretic approach and a lexicosyntactic and semantic model.
arXiv Detail & Related papers (2021-05-25T11:01:45Z)
An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research. Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains. In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z)
Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients. We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks. Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in medical community. We present a modification of Bidirectional Representations from Transformers (BERT) model for classification sequence. We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification. It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations. Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.