RelCAT: Advancing Extraction of Clinical Inter-Entity Relationships from Unstructured Electronic Health Records
- URL: http://arxiv.org/abs/2501.16077v1
- Date: Mon, 27 Jan 2025 14:26:47 GMT
- Title: RelCAT: Advancing Extraction of Clinical Inter-Entity Relationships from Unstructured Electronic Health Records
- Authors: Shubham Agarwal, Vlad Dinu, Thomas Searle, Mart Ratas, Anthony Shek, Dan F. Stein, James Teo, Richard Dobson
- Abstract summary: RelCAT (Relation Concept Annotation Toolkit) is an interactive tool, library, and workflow designed to classify relations between entities extracted from clinical narratives.
The toolkit implements state-of-the-art machine learning models such as BERT and Llama along with proven evaluation and training methods.
We demonstrate a dataset annotation tool (built within MedCATTrainer), model training, and evaluate our methodology on both openly available gold-standard and real-world UK National Health Service (NHS) hospital clinical datasets.
- Score: 1.9065879861609418
- Abstract: This study introduces RelCAT (Relation Concept Annotation Toolkit), an interactive tool, library, and workflow designed to classify relations between entities extracted from clinical narratives. Building upon the CogStack MedCAT framework, RelCAT addresses the challenge of capturing complete clinical relations dispersed within text. The toolkit implements state-of-the-art machine learning models such as BERT and Llama along with proven evaluation and training methods. We demonstrate a dataset annotation tool (built within MedCATTrainer), model training, and evaluate our methodology on both openly available gold-standard and real-world UK National Health Service (NHS) hospital clinical datasets. We perform extensive experimentation and a comparative analysis of the various publicly available models with varied approaches selected for model fine-tuning. Finally, we achieve macro F1-scores of 0.977 on the gold-standard n2c2, surpassing the previous state-of-the-art performance, and achieve performance of >=0.93 F1 on our NHS gathered datasets.
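The abstract reports results as macro F1-scores. As a reminder of what that metric measures, here is a minimal sketch of a macro F1 computation in plain Python. It is not the RelCAT evaluation code; the function name `macro_f1` and the relation labels in the example are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred.

    Illustrative only: this is not taken from the RelCAT codebase.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was missed

    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical relation labels, not drawn from n2c2 or the NHS datasets.
gold = ["treats", "treats", "causes", "none"]
pred = ["treats", "causes", "causes", "none"]
score = macro_f1(gold, pred)
```

Because every class contributes equally to the average, a high macro F1 indicates that rare relation types are classified well, not only the frequent ones.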
Related papers
- CLINICSUM: Utilizing Language Models for Generating Clinical Summaries from Patient-Doctor Conversations [2.77462589810782]
ClinicSum is a framework designed to automatically generate clinical summaries from patient-doctor conversations.
It is evaluated through both automatic metrics (e.g., ROUGE, BERTScore) and expert human assessments.
arXiv Detail & Related papers (2024-12-05T15:34:02Z)
- Named Clinical Entity Recognition Benchmark [2.9332007863461893]
This report introduces a Named Clinical Entity Recognition Benchmark.
It addresses the crucial natural language processing (NLP) task of extracting structured information from clinical narratives.
The leaderboard provides a standardized platform for assessing diverse language models.
arXiv Detail & Related papers (2024-10-07T14:00:18Z)
- Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study [2.0884301753594334]
This study performs a comparative analysis of various natural language models for medical text classification.
BERT outperforms Bi-LSTM models by up to 28% and the baseline BERT model by up to 16% for recall of the minority classes.
arXiv Detail & Related papers (2024-08-30T10:28:49Z)
- Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines [113.08940153125616]
We generate a dataset of whole-body CT scans with 142 voxel-level labels for 533 volumes, providing comprehensive anatomical coverage.
Our proposed procedure does not rely on manual annotation during the label aggregation stage.
We release our trained unified anatomical segmentation model capable of predicting 142 anatomical structures on CT data.
arXiv Detail & Related papers (2023-07-25T09:48:13Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- Enhancing Clinical Information Extraction with Transferred Contextual Embeddings [9.143551270841858]
The Bidirectional Encoder Representations from Transformers (BERT) model has achieved state-of-the-art performance on many natural language processing (NLP) tasks.
We show that BERT-based pre-trained models can be transferred to health-related documents under mild conditions.
arXiv Detail & Related papers (2021-09-15T12:22:57Z)
- MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit [5.49956798378633]
We present the open-source Medical Concept Annotation Toolkit (MedCAT).
It provides a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT.
We show improved performance in extracting UMLS concepts from open datasets.
Further real-world validation demonstrates SNOMED-CT extraction at 3 large London hospitals with self-supervised training over 8.8B words from 17M clinical records.
arXiv Detail & Related papers (2020-10-02T19:01:02Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)