Comparative Analysis of Text Classification Approaches in Electronic
Health Records
- URL: http://arxiv.org/abs/2005.06624v1
- Date: Fri, 8 May 2020 14:04:18 GMT
- Title: Comparative Analysis of Text Classification Approaches in Electronic
Health Records
- Authors: Aurelie Mascio, Zeljko Kraljevic, Daniel Bean, Richard Dobson, Robert
Stewart, Rebecca Bendayan, Angus Roberts
- Abstract summary: We analyse the impact of various word representations, text pre-processing and classification algorithms on the performance of four different text classification tasks.
Results show that traditional approaches, when tailored to the specific language and structure of the text inherent to the classification task, can achieve or exceed the performance of more recent ones.
- Score: 0.6229951975208341
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text classification tasks which aim at harvesting and/or organizing
information from electronic health records are pivotal to support clinical and
translational research. However, these present specific challenges compared to
other classification tasks, notably due to the particular nature of the medical
lexicon and language used in clinical records. Recent advances in embedding
methods have shown promising results for several clinical tasks, yet there is
no exhaustive comparison of such approaches with other commonly used word
representations and classification models. In this work, we analyse the impact
of various word representations, text pre-processing and classification
algorithms on the performance of four different text classification tasks. The
results show that traditional approaches, when tailored to the specific
language and structure of the text inherent to the classification task, can
achieve or exceed the performance of more recent ones based on contextual
embeddings such as BERT.
Related papers
- Text Classification using Graph Convolutional Networks: A Comprehensive Survey [11.1080224302799]
Graph convolution network (GCN)-based approaches have gained a lot of traction in this domain over the last decade.
This work aims to summarize and categorize various GCN-based Text Classification approaches with regard to the architecture and mode of supervision.
arXiv Detail & Related papers (2024-10-12T07:03:42Z)
- Explainability of machine learning approaches in forensic linguistics: a case study in geolinguistic authorship profiling [46.58131072375399]
We explore the explainability of machine learning approaches considering the forensic context.
We focus on variety classification as a means of geolinguistic profiling of unknown texts based on social media data from the German-speaking area.
We find that the extracted lexical features are indeed representative of their respective varieties and note that the trained models also rely on place names for classifications.
arXiv Detail & Related papers (2024-04-29T08:52:52Z)
- Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification [4.498100922387482]
Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient.
Previous results demonstrated that these methods can even improve performance on some classification tasks.
This paper investigates how these techniques influence the classification performance and computation costs compared to full fine-tuning.
arXiv Detail & Related papers (2023-08-14T17:12:43Z)
- Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section [70.37720062263176]
We propose a framework to analyze the sections with high predictive power.
Using MIMIC-III, we show that: 1) predictive power distribution is different between nursing notes and discharge notes and 2) combining different types of notes could improve performance when the context length is large.
arXiv Detail & Related papers (2023-07-13T20:04:05Z)
- Evaluating Unsupervised Text Classification: Zero-shot and Similarity-based Approaches [0.6767885381740952]
Similarity-based approaches attempt to classify instances based on similarities between text document representations and class description representations.
Zero-shot text classification approaches aim to generalize knowledge gained from a training task by assigning appropriate labels of unknown classes to text documents.
This paper conducts a systematic evaluation of different similarity-based and zero-shot approaches for text classification of unseen classes.
arXiv Detail & Related papers (2022-11-29T15:14:47Z)
- Cross-Lingual Knowledge Transfer for Clinical Phenotyping [55.92262310716537]
We investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
arXiv Detail & Related papers (2022-08-03T08:33:21Z)
- Clinical Named Entity Recognition using Contextualized Token Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
- Detect and Classify -- Joint Span Detection and Classification for Health Outcomes [15.496885113949252]
We propose a method that uses both word-level and sentence-level information to simultaneously perform outcome span detection and outcome type classification.
Experimental results on several benchmark datasets for health outcome detection show that our model consistently outperforms decoupled methods.
arXiv Detail & Related papers (2021-04-15T21:47:15Z)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)
- Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)
- Seeing The Whole Patient: Using Multi-Label Medical Text Classification Techniques to Enhance Predictions of Medical Codes [2.158285012874102]
We present results of multi-label medical text classification problems with 18, 50 and 155 labels.
For imbalanced data, we show that labels which occur infrequently benefit the most from additional features incorporated in embeddings.
High dimensional embeddings from this research are made available for public use.
arXiv Detail & Related papers (2020-03-29T02:19:30Z)