TF-IDF vs Word Embeddings for Morbidity Identification in Clinical
Notes: An Initial Study
- URL: http://arxiv.org/abs/2105.09632v1
- Date: Thu, 20 May 2021 09:57:45 GMT
- Title: TF-IDF vs Word Embeddings for Morbidity Identification in Clinical
Notes: An Initial Study
- Authors: Danilo Dessi, Rim Helaoui, Vivek Kumar, Diego Reforgiato Recupero, and
Daniele Riboni
- Abstract summary: We propose the use of Deep Learning and Word Embeddings for identifying sixteen morbidity types within textual descriptions of clinical records.
We have employed pre-trained Word Embeddings, namely GloVe and Word2Vec, and our own Word Embeddings trained on the target domain.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Today, we are seeing an ever-increasing number of clinical notes that contain
clinical results, images, and textual descriptions of patients' health state.
All these data can be analyzed and employed to provide novel services that can
help people and domain experts with their common healthcare tasks. However,
many technologies such as Deep Learning and tools like Word Embeddings have
started to be investigated only recently, and many challenges remain open when
it comes to healthcare domain applications. To address these challenges, we
propose the use of Deep Learning and Word Embeddings for identifying sixteen
morbidity types within textual descriptions of clinical records. For this
purpose, we have used a Deep Learning model based on Bidirectional Long
Short-Term Memory (LSTM) layers, which can exploit state-of-the-art vector
representations of data such as Word Embeddings. We have employed pre-trained
Word Embeddings, namely GloVe and Word2Vec, and our own Word Embeddings trained
on the target domain. Furthermore, we have compared the performance of the
deep learning approaches against traditional tf-idf features classified with
Support Vector Machine and Multilayer Perceptron models (our baselines). The
results suggest that the tf-idf baselines outperform the Deep Learning
approaches with any of the word embeddings. Our preliminary results indicate that there are
specific features that make the dataset biased in favour of traditional machine
learning approaches.
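As a rough illustration of the tf-idf representation the baselines rely on, the sketch below computes textbook tf-idf weights (term frequency times log(N / document frequency)) using only the Python standard library. The function name `tfidf_vectors` and the toy notes are hypothetical and not taken from the paper; the authors' actual preprocessing, weighting variant, and classifiers may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df).

    Textbook formulation for illustration only; libraries often use
    smoothed or normalized variants.
    """
    # Naive whitespace tokenization (an assumption, not the paper's pipeline).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy notes, not drawn from the paper's dataset.
notes = [
    "patient has asthma and obesity",
    "patient denies asthma",
    "patient has hypertension",
]
vectors = tfidf_vectors(notes)
```

A term that occurs in every note (here "patient") gets weight zero, while a term found in a single note (such as "hypertension") gets the highest weight in that note. Sparse vectors of this kind are what a classifier such as a Support Vector Machine would consume.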
Related papers
- Representing visual classification as a linear combination of words [0.0]
We present an explainability strategy that uses a vision-language model to identify language-based descriptors of a visual classification task.
By leveraging a pre-trained joint embedding space between images and text, our approach estimates a new classification task as a linear combination of words.
We find that the resulting descriptors largely align with clinical knowledge despite a lack of domain-specific language training.
arXiv Detail & Related papers (2023-11-18T02:00:20Z)
- Development and validation of a natural language processing algorithm to
pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the results of a deep learning model with manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- A Multi-View Joint Learning Framework for Embedding Clinical Codes and
Text Using Graph Neural Networks [23.06795121693656]
We propose a framework that learns from codes and text to combine the availability and forward-looking nature of text and better performance of ICD codes.
Our approach uses a Graph Neural Network (GNN) to process ICD codes, and Bi-LSTM to process text.
In experiments using planned surgical procedure text, our model outperforms BERT models fine-tuned to clinical data.
arXiv Detail & Related papers (2023-01-27T09:19:03Z)
- Knowledge-augmented Graph Neural Networks with Concept-aware Attention for Adverse Drug Event Detection [9.334701229573739]
Adverse drug events (ADEs) are an important aspect of drug safety.
Various texts contain a wealth of information about ADEs.
Recent studies have applied word embedding and deep learning-based natural language processing to automate ADE detection from text.
We propose a concept-aware attention mechanism which learns features differently for the different types of nodes in the graph.
arXiv Detail & Related papers (2023-01-25T08:01:45Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- Clinical Named Entity Recognition using Contextualized Token
Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- Hierarchical Learning Using Deep Optimum-Path Forest [55.60116686945561]
Bag-of-Visual Words (BoVW) and deep learning techniques have been widely used in several domains, which include computer-assisted medical diagnoses.
In this work, we are interested in developing tools for the automatic identification of Parkinson's disease using machine learning and the concept of BoVW.
arXiv Detail & Related papers (2021-02-18T13:02:40Z)
- Integration of Domain Knowledge using Medical Knowledge Graph Deep
Learning for Cancer Phenotyping [6.077023952306772]
We propose a method to integrate external knowledge from medical terminology into the context captured by word embeddings.
We evaluate the proposed approach using a Multitask Convolutional Neural Network (MT-CNN) to extract six cancer characteristics from a dataset of 900K cancer pathology reports.
arXiv Detail & Related papers (2021-01-05T03:59:43Z)
- Learning Contextualized Document Representations for Healthcare Answer
Retrieval [68.02029435111193]
Contextual Discourse Vectors (CDV) is a distributed document representation for efficient answer retrieval from long documents.
Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse.
We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking.
arXiv Detail & Related papers (2020-02-03T15:47:19Z)