Token Classification for Disambiguating Medical Abbreviations
- URL: http://arxiv.org/abs/2210.02487v1
- Date: Wed, 5 Oct 2022 18:06:49 GMT
- Title: Token Classification for Disambiguating Medical Abbreviations
- Authors: Mucahit Cevik, Sanaz Mohammad Jafari, Mitchell Myers, Savas Yildirim
- Abstract summary: Abbreviations are unavoidable yet critical parts of medical text.
The lack of a standardized mapping system makes disambiguating abbreviations a difficult and time-consuming task.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Abbreviations are unavoidable yet critical parts of medical text. Using
abbreviations, especially in clinical patient notes, can save time and space,
protect sensitive information, and help avoid repetitions. However, most
abbreviations might have multiple senses, and the lack of a standardized
mapping system makes disambiguating abbreviations a difficult and
time-consuming task. The main objective of this study is to examine the
feasibility of token classification methods for medical abbreviation
disambiguation. Specifically, we explore the capability of token classification
methods to deal with multiple unique abbreviations in a single text. We use two
public datasets to compare and contrast the performance of several transformer
models pre-trained on different scientific and medical corpora. Our proposed
token classification approach outperforms the more commonly used text
classification models for the abbreviation disambiguation task. In particular,
the SciBERT model shows a strong performance for both token and text
classification tasks over the two considered datasets. Furthermore, we find
that abbreviation disambiguation performance for the text classification models
becomes comparable to that of token classification only when postprocessing is
applied to their predictions, which involves filtering possible labels for an
abbreviation based on the training data.
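To make the abstract's framing concrete, here is a minimal sketch of abbreviation disambiguation as token classification, plus the label-filtering postprocessing described above. The SciBERT checkpoint name is the public one the paper evaluates; the toy sense inventory, example note, and filtering helper are illustrative assumptions, not the authors' code.

```python
# Sketch: abbreviation disambiguation as token classification. The SciBERT
# checkpoint is public; the sense inventory, note, and filter below are toy
# assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Each abbreviation sense is a token-level label; non-abbreviation tokens
# get "O". Multiple abbreviations can then be resolved in one pass.
SENSES = ["O",
          "RA=rheumatoid arthritis", "RA=right atrium",
          "PT=physical therapy", "PT=prothrombin time"]
label2id = {s: i for i, s in enumerate(SENSES)}

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "allenai/scibert_scivocab_uncased",
    num_labels=len(SENSES),
    id2label={i: s for s, i in label2id.items()},
)

note = "Patient with RA was referred to PT after discharge."
enc = tokenizer(note, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits[0]  # (seq_len, num_senses)

# Postprocessing that makes text classifiers competitive (per the abstract):
# for an abbreviation, mask out senses never seen for it in the training data.
allowed = {"RA": {"RA=rheumatoid arthritis", "RA=right atrium"},
           "PT": {"PT=physical therapy", "PT=prothrombin time"}}

def filtered_sense(token_logits, abbrev):
    keep = [label2id[s] for s in allowed[abbrev]]
    masked = torch.full_like(token_logits, float("-inf"))
    masked[keep] = token_logits[keep]
    return SENSES[int(masked.argmax())]
```

A text classifier predicts one sense per input, so a note containing both RA and PT needs two passes; the token formulation resolves every abbreviation in a single forward pass, which is exactly the capability the study examines.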
Related papers
- Blueprinting the Future: Automatic Item Categorization using Hierarchical Zero-Shot and Few-Shot Classifiers [6.907552533477328]
This study unveils a novel approach employing the zero-shot and few-shot Generative Pretrained Transformer (GPT) for hierarchical item categorization.
The hierarchical nature of examination blueprints is navigated seamlessly, allowing for a tiered classification of items across multiple levels.
An initial simulation with artificial data demonstrates the efficacy of this method, achieving an average F1 score of 92.91%.
arXiv Detail & Related papers (2023-12-06T15:51:49Z)
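The entry above describes tiered classification over a hierarchy of blueprint categories. A minimal sketch of one way to cascade zero-shot predictions through a two-level taxonomy, using Hugging Face's zero-shot pipeline as a stand-in for the GPT prompting in the paper; the taxonomy and example item are invented.

```python
# Sketch: hierarchical zero-shot item categorization. The transformers
# zero-shot pipeline stands in for the paper's GPT prompting; the
# two-level taxonomy below is an invented example blueprint.
from transformers import pipeline

TAXONOMY = {  # level-1 category -> level-2 subcategories (hypothetical)
    "pharmacology": ["dosage calculation", "adverse effects"],
    "anatomy": ["cardiovascular system", "nervous system"],
}

clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def categorize(item_text: str) -> tuple[str, str]:
    # Tier 1: pick the top-level category over all level-1 labels.
    top = clf(item_text, candidate_labels=list(TAXONOMY))["labels"][0]
    # Tier 2: re-classify, restricted to that category's subcategories.
    sub = clf(item_text, candidate_labels=TAXONOMY[top])["labels"][0]
    return top, sub

print(categorize("A patient develops a rash after starting penicillin."))
```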
- CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class Classification [57.62886091828512]
We propose a brand-new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix) for many-class classification.
An instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, complements the language verbalizers in many-class classification.
arXiv Detail & Related papers (2022-11-11T03:45:59Z) - Classifying Unstructured Clinical Notes via Automatic Weak Supervision [17.45660355026785]
We introduce a general weakly-supervised text classification framework that learns from class-label descriptions only.
We leverage the linguistic domain knowledge stored within pre-trained language models and the data programming framework to assign code labels to texts.
arXiv Detail & Related papers (2022-06-24T05:55:49Z) - Label Semantic Aware Pre-training for Few-shot Text Classification [53.80908620663974]
- Label Semantic Aware Pre-training for Few-shot Text Classification [53.80908620663974]
We propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems.
LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains.
arXiv Detail & Related papers (2022-04-14T17:33:34Z)
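A sketch of what secondary pre-training with label semantics can look like for T5, per the entry above: each labeled sentence becomes a text-to-text pair whose target is the label's name, so the generative model learns what labels mean. The prompt format and example pairs are assumptions, not the paper's exact recipe.

```python
# Sketch: casting labeled sentences as text-to-text pairs so a generative
# model learns to emit label *names* (label semantics), roughly in the
# spirit of LSAP. Prompt format and examples are assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Hypothetical labeled sentences from mixed domains; targets are label names.
pairs = [("book a table for two at seven", "restaurant reservation"),
         ("play some jazz in the kitchen", "play music")]

for sentence, label_name in pairs:
    batch = tok(f"classify: {sentence}", return_tensors="pt")
    labels = tok(label_name, return_tensors="pt").input_ids
    loss = model(**batch, labels=labels).loss  # one secondary pre-training step
    loss.backward()
    optim.step()
    optim.zero_grad()
```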
- Clinical Named Entity Recognition using Contextualized Token Representations [49.036805795072645]
This paper introduces contextualized word embeddings to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
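The C-ELMo and C-Flair checkpoints are the authors' own, but the mechanics of contextual string embeddings can be sketched with flair's generic pretrained character LMs standing in for the clinical ones.

```python
# Sketch: contextual string embeddings via flair. The generic news-*
# character LMs stand in for the paper's clinically pre-trained C-Flair;
# the C-ELMo/C-Flair checkpoints themselves are the authors' own.
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

embeddings = StackedEmbeddings([
    FlairEmbeddings("news-forward"),   # left-to-right character LM
    FlairEmbeddings("news-backward"),  # right-to-left character LM
])

sentence = Sentence("Patient denies chest pain or dyspnea.")
embeddings.embed(sentence)

for token in sentence:
    # The same surface form gets different vectors in different contexts.
    print(token.text, token.embedding.shape)
```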
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and make low-confidence predictions without enough context.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
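A sketch of MASKER's two regularizers as described above, under a hypothetical encoder/head interface: reconstruct masked keywords from their context, and push class predictions toward uniform when only keywords remain. Loss weights and the keyword-selection step are omitted.

```python
# Sketch of MASKER's two regularizers. The interfaces are hypothetical:
# `encoder` maps token ids to hidden states, `mlm_head` to vocab logits,
# and `cls_head` maps the [CLS] state to class logits.
import torch
import torch.nn.functional as F

def masker_regularizers(encoder, mlm_head, cls_head,
                        ids_context_only,   # keywords replaced by [MASK]
                        ids_keywords_only,  # context replaced by [MASK]
                        keyword_targets):   # keyword ids; -100 elsewhere
    # (1) Masked keyword reconstruction: recover keywords from context
    # alone, so the model cannot ignore the surrounding words.
    hidden = encoder(ids_context_only)
    recon_loss = F.cross_entropy(
        mlm_head(hidden).transpose(1, 2), keyword_targets, ignore_index=-100)
    # (2) Low-confidence regularization: with context gone, predictions
    # should approach the uniform distribution over classes.
    log_probs = F.log_softmax(cls_head(encoder(ids_keywords_only)[:, 0]), dim=-1)
    ent_loss = -log_probs.mean(dim=-1).mean()  # cross-entropy to uniform
    return recon_loss, ent_loss
```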
- Learning Image Labels On-the-fly for Training Robust Classification Models [13.669654965671604]
We show how noisy annotations (e.g., from different algorithm-based labelers) can be utilized together and mutually benefit the learning of classification tasks.
A meta-training based label-sampling module is designed to attend to the labels that benefit model learning the most through additional back-propagation processes.
arXiv Detail & Related papers (2020-09-22T05:38:44Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
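The prediction-consistency idea in the entry above can be sketched in a few lines: two random perturbations of the same unlabeled input should yield matching predictions. Here `model` and `augment` are hypothetical stand-ins, and the paper's relation-driven variant additionally enforces consistency of sample relations, which is not shown.

```python
# Sketch of the core consistency idea: for unlabeled inputs, predictions
# under two random perturbations of the same input should agree.
# `model` and `augment` are hypothetical stand-ins.
import torch
import torch.nn.functional as F

def consistency_loss(model, unlabeled, augment):
    view1, view2 = augment(unlabeled), augment(unlabeled)
    p1 = model(view1).softmax(dim=-1)
    with torch.no_grad():  # treat one branch as the fixed target
        p2 = model(view2).softmax(dim=-1)
    return F.mse_loss(p1, p2)

# Total objective: supervised CE on labeled data + weighted consistency, e.g.
# loss = F.cross_entropy(model(x_l), y_l) + w * consistency_loss(model, x_u, augment)
```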
- Exemplar Auditing for Multi-Label Biomedical Text Classification [0.4873362301533824]
We generalize a recently proposed zero-shot sequence labeling method, "supervised labeling via a convolutional decomposition".
The approach yields classification with "introspection", relating the fine-grained features of an inference-time prediction to their nearest neighbors.
Our proposed approach yields both a competitively effective classification model and an interrogation mechanism to aid healthcare workers in understanding the salient features that drive the model's predictions.
arXiv Detail & Related papers (2020-04-07T02:54:20Z)
- Seeing The Whole Patient: Using Multi-Label Medical Text Classification Techniques to Enhance Predictions of Medical Codes [2.158285012874102]
We present results of multi-label medical text classification problems with 18, 50 and 155 labels.
For imbalanced data, we show that labels which occur infrequently benefit the most from additional features incorporated in embeddings.
High dimensional embeddings from this research are made available for public use.
arXiv Detail & Related papers (2020-03-29T02:19:30Z)
- Structured Prediction with Partial Labelling through the Infimum Loss [85.4940853372503]
The goal of weak supervision is to enable models to learn using only forms of labelling which are cheaper to collect.
Partial labelling is a type of incomplete annotation where, for each datapoint, supervision is cast as a set of labels containing the real one.
This paper provides a unified framework based on structured prediction and on the concept of infimum loss to deal with partial labelling.
arXiv Detail & Related papers (2020-03-02T13:59:41Z)
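The infimum-loss idea admits a compact sketch for the classification special case: each partially labelled example is charged only for the best-scoring label in its candidate set. The paper treats arbitrary structured losses; this cross-entropy instance is an illustrative assumption.

```python
# Sketch: the infimum (minimum) loss over a candidate label set. For each
# partially labelled example, only the best-fitting candidate incurs loss,
# i.e., the infimum over labels consistent with the annotation.
import torch
import torch.nn.functional as F

def infimum_loss(logits, candidate_sets):
    # logits: (batch, num_classes); candidate_sets: list of lists of class ids.
    losses = []
    for row, candidates in zip(logits, candidate_sets):
        per_label = torch.stack(
            [F.cross_entropy(row.unsqueeze(0), torch.tensor([c]))
             for c in candidates])
        losses.append(per_label.min())  # infimum over the candidate set
    return torch.stack(losses).mean()

logits = torch.randn(2, 5, requires_grad=True)
loss = infimum_loss(logits, [[0, 3], [1, 2, 4]])
loss.backward()
```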
This list is automatically generated from the titles and abstracts of the papers on this site.