Assessing mortality prediction through different representation models
based on concepts extracted from clinical notes
- URL: http://arxiv.org/abs/2207.10872v1
- Date: Fri, 22 Jul 2022 04:34:33 GMT
- Title: Assessing mortality prediction through different representation models
based on concepts extracted from clinical notes
- Authors: Hoda Memarzadeh, Nasser Ghadiri, Maryam Lotfi Shahreza
- Abstract summary: Embedding learning converts clinical notes into a representation that makes them comparable.
Transformer-based representation models have recently made a great leap forward.
We performed experiments to measure the usefulness of the learned embedding vectors in the task of hospital mortality prediction.
- Score: 2.707154152696381
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent years have seen particular interest in using electronic medical
records (EMRs) for secondary purposes to enhance the quality and safety of
healthcare delivery. EMRs tend to contain large amounts of valuable clinical
notes. Embedding learning converts these notes into a representation that
makes them comparable, and transformer-based representation models have
recently made a great leap forward: they are pre-trained on large online
datasets to understand natural language text effectively. The quality of a
learned embedding is influenced by how clinical notes are fed to the
representation model. A clinical note has several sections with different
levels of information value, and healthcare providers commonly use different
expressions for the same concept. Existing methods use clinical notes
directly, or after only initial preprocessing, as input to representation
models. To learn a better embedding, we instead identified the most
informative clinical note sections, mapped the concepts extracted from the
selected sections to their standard names in the Unified Medical Language
System (UMLS), and used the standard phrases corresponding to the unique
concepts as input to the clinical models. We measured the usefulness of the
learned embedding vectors on the task of hospital mortality prediction using
a subset of the publicly available Medical Information Mart for Intensive
Care (MIMIC-III) dataset. In our experiments, clinical transformer-based
representation models produced better results when their input was generated
from the standard names of the extracted unique concepts than with other
input formats. The best-performing models were BioBERT, PubMedBERT, and
UmlsBERT, in that order.
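The preprocessing the abstract describes, extracting concept mentions from a selected note section and replacing them with their UMLS standard names before feeding a clinical transformer, can be sketched as follows. This is a minimal toy illustration: the synonym dictionary is hypothetical, and a real pipeline would resolve mentions through a UMLS-backed entity linker rather than string matching.

```python
import re

# Hypothetical mapping from surface expressions to UMLS preferred names.
# A real pipeline would obtain these from the UMLS Metathesaurus.
SYNONYM_TO_STANDARD = {
    "heart attack": "Myocardial Infarction",
    "mi": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
    "htn": "Hypertension",
    "sugar disease": "Diabetes Mellitus",
}

def extract_standard_concepts(section_text: str) -> list[str]:
    """Return the unique standard names found in a note section,
    in order of first appearance (word-boundary matching)."""
    text = section_text.lower()
    found = []
    for phrase, standard in SYNONYM_TO_STANDARD.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text) and standard not in found:
            found.append(standard)
    return found

def build_model_input(section_text: str) -> str:
    """Join the standard phrases into the string that would be fed to a
    clinical representation model (e.g. BioBERT) for embedding."""
    return " ".join(extract_standard_concepts(section_text))

note = "Pt with h/o heart attack and high blood pressure, admitted today."
print(build_model_input(note))  # -> Myocardial Infarction Hypertension
```

Different surface forms ("heart attack", "MI") collapse to one standard phrase, which is the normalization effect the paper credits for the improved embeddings.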
Related papers
- SNOBERT: A Benchmark for clinical notes entity linking in the SNOMED CT clinical terminology [43.89160296332471]
We propose a method for linking text spans in clinical notes to specific concepts in SNOMED CT using BERT-based models.
The method consists of two stages: candidate selection and candidate matching. The models were trained on one of the largest publicly available datasets of labeled clinical notes.
arXiv Detail & Related papers (2024-05-25T08:00:44Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Investigating Alternative Feature Extraction Pipelines For Clinical Note Phenotyping [0.0]
Using computational systems for the extraction of medical attributes offers many applications.
BERT-based models can be used to transform clinical notes into a series of representations.
We propose an alternative pipeline utilizing ScispaCy for extraction of common diseases.
arXiv Detail & Related papers (2023-10-05T02:51:51Z)
- Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section [70.37720062263176]
We propose a framework to analyze the sections with high predictive power.
Using MIMIC-III, we show that: 1) predictive power distribution is different between nursing notes and discharge notes and 2) combining different types of notes could improve performance when the context length is large.
arXiv Detail & Related papers (2023-07-13T20:04:05Z)
- Hierarchical Pretraining for Biomedical Term Embeddings [4.69793648771741]
We propose HiPrBERT, a novel biomedical term representation model trained on hierarchical data.
We show that HiPrBERT effectively learns the pairwise distance from hierarchical information, resulting in substantially more informative embeddings for further biomedical applications.
arXiv Detail & Related papers (2023-07-01T08:16:00Z)
- Applying unsupervised keyphrase methods on concepts extracted from discharge sheets [7.102620843620572]
It is necessary to identify the section in which each piece of content is recorded and to identify the key concepts needed to extract meaning from clinical texts.
In this study, these challenges have been addressed by using clinical natural language processing techniques.
A set of popular unsupervised keyphrase extraction methods has been verified and evaluated.
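A minimal example of the unsupervised-keyphrase idea this summary refers to is TF-IDF keyword ranking over a small corpus. The discharge-sheet snippets and stopword list below are invented for illustration; the paper evaluates more sophisticated methods.

```python
import math
from collections import Counter

# Invented discharge-sheet snippets standing in for a real corpus.
docs = [
    "chest pain and shortness of breath on admission",
    "chest xray showed no acute findings",
    "discharged home with follow up for chest pain",
]

STOPWORDS = {"and", "of", "on", "no", "the", "with", "for", "up"}

def tfidf_scores(doc_tokens: list[str], all_docs: list[str]) -> dict[str, float]:
    """Score each token as tf * log(N / df)."""
    tf = Counter(doc_tokens)
    n = len(all_docs)
    return {tok: count * math.log(n / sum(1 for d in all_docs if tok in d.split()))
            for tok, count in tf.items()}

def top_keywords(doc: str, all_docs: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring non-stopword tokens of a document."""
    tokens = [t for t in doc.split() if t not in STOPWORDS]
    scores = tfidf_scores(tokens, all_docs)
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(top_keywords(docs[0], docs))  # -> ['shortness', 'breath']
```

Terms that appear in every note ("chest") score zero, so section-specific terms surface as keywords, which is the intuition behind unsupervised keyphrase extraction from clinical text.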
arXiv Detail & Related papers (2023-03-15T20:55:25Z)
- Cross-Lingual Knowledge Transfer for Clinical Phenotyping [55.92262310716537]
We investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
arXiv Detail & Related papers (2022-08-03T08:33:21Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- Does the Magic of BERT Apply to Medical Code Assignment? A Quantitative Study [2.871614744079523]
It is not clear if pretrained models are useful for medical code prediction without further architecture engineering.
We propose a hierarchical fine-tuning architecture to capture interactions between distant words and adopt label-wise attention to exploit label information.
Contrary to current trends, we demonstrate that a carefully trained classical CNN outperforms attention-based models on a MIMIC-III subset with frequent codes.
arXiv Detail & Related papers (2021-03-11T07:23:45Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem to the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.