Classifying Unstructured Clinical Notes via Automatic Weak Supervision
- URL: http://arxiv.org/abs/2206.12088v1
- Date: Fri, 24 Jun 2022 05:55:49 GMT
- Title: Classifying Unstructured Clinical Notes via Automatic Weak Supervision
- Authors: Chufan Gao, Mononito Goswami, Jieshi Chen, and Artur Dubrawski
- Abstract summary: We introduce a general weakly-supervised text classification framework that learns from class-label descriptions only.
We leverage the linguistic domain knowledge stored within pre-trained language models and the data programming framework to assign code labels to texts.
- Score: 17.45660355026785
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Healthcare providers usually record detailed notes of the clinical care
delivered to each patient for clinical, research, and billing purposes. Due to
the unstructured nature of these narratives, providers employ dedicated staff
to assign diagnostic codes to patients' diagnoses using the International
Classification of Diseases (ICD) coding system. This manual process is not only
time-consuming but also costly and error-prone. Prior work demonstrated the
potential utility of Machine Learning (ML) methodology in automating this
process, but it has relied on large quantities of manually labeled data to
train the models. Additionally, diagnostic coding systems evolve with time,
which makes traditional supervised learning strategies unable to generalize
beyond local applications. In this work, we introduce a general
weakly-supervised text classification framework that learns from class-label
descriptions only, without the need to use any human-labeled documents. It
leverages the linguistic domain knowledge stored within pre-trained language
models and the data programming framework to assign code labels to individual
texts. We demonstrate the efficacy and flexibility of our method by comparing
it to state-of-the-art weak text classifiers across four real-world text
classification datasets, in addition to assigning ICD codes to medical notes in
the publicly available MIMIC-III database.
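The idea of learning from class-label descriptions alone can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes plain keyword matching in place of a pre-trained language model, and a simple majority vote in place of a learned data programming label model. The names `make_labeling_functions` and `majority_vote`, and the two example class descriptions, are hypothetical.

```python
import re
from collections import Counter

ABSTAIN = -1  # value a labeling function returns when it has no opinion


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def make_labeling_functions(class_descriptions,
                            stopwords=frozenset({"of", "the", "and", "or", "in"})):
    """Build one keyword labeling function per non-stopword in each description.

    Assumption: keywords come straight from the description text. The paper
    instead draws on a pre-trained language model for this step.
    """
    lfs = []
    for label, description in enumerate(class_descriptions):
        for keyword in set(tokenize(description)) - stopwords:
            # Bind keyword and label as defaults so each closure keeps its own values.
            def lf(text, keyword=keyword, label=label):
                return label if keyword in tokenize(text) else ABSTAIN
            lfs.append(lf)
    return lfs


def majority_vote(text, lfs):
    """Aggregate labeling-function votes; a stand-in for a learned label model."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


# Hypothetical class descriptions, loosely in the style of ICD code titles.
descriptions = ["heart failure", "diabetes mellitus"]
lfs = make_labeling_functions(descriptions)
pred_heart = majority_vote("Patient admitted with congestive heart failure.", lfs)
pred_diabetes = majority_vote("Type 2 diabetes mellitus, managed with insulin.", lfs)
pred_none = majority_vote("Routine follow-up visit.", lfs)
```

In a fuller pipeline, the weak labels produced this way would train a downstream classifier, and texts on which every labeling function abstains would be left unlabeled.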
Related papers
- UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell).
It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains.
In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting to various datasets.
arXiv Detail & Related papers (2024-02-20T11:50:27Z)
- Automatic Coding at Scale: Design and Deployment of a Nationwide System for Normalizing Referrals in the Chilean Public Healthcare System [0.0]
We propose a two-step system for automatically coding diseases in referrals from the Chilean public healthcare system.
Specifically, our model uses a state-of-the-art NER model for recognizing disease mentions and a search engine system for assigning the most relevant codes associated with these disease mentions.
Our system obtained a MAP score of 0.63 for the subcategory level and 0.83 for the category level, close to the best-performing models in the literature.
arXiv Detail & Related papers (2023-07-09T16:19:35Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the results of a deep learning model with manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- A Systematic Literature Review of Automated ICD Coding and Classification Systems using Discharge Summaries [5.156484100374058]
Codification of free-text clinical narratives has long been recognised to be beneficial for secondary uses such as funding, insurance claim processing and research.
Code assignment is currently a manual process that is very expensive, time-consuming, and error-prone.
This systematic literature review provides a comprehensive overview of automated clinical coding systems.
arXiv Detail & Related papers (2021-07-12T03:55:17Z)
- Active learning for medical code assignment [55.99831806138029]
We demonstrate the effectiveness of Active Learning (AL) in multi-label text classification in the clinical domain.
We apply a set of well-known AL methods to help automatically assign ICD-9 codes on the MIMIC-III dataset.
Our results show that the selection of informative instances provides satisfactory classification with a significantly reduced training set.
arXiv Detail & Related papers (2021-04-12T18:11:17Z)
- Does the Magic of BERT Apply to Medical Code Assignment? A Quantitative Study [2.871614744079523]
It is not clear if pretrained models are useful for medical code prediction without further architecture engineering.
We propose a hierarchical fine-tuning architecture to capture interactions between distant words and adopt label-wise attention to exploit label information.
Contrary to current trends, we demonstrate that a carefully trained classical CNN outperforms attention-based models on a MIMIC-III subset with frequent codes.
arXiv Detail & Related papers (2021-03-11T07:23:45Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto codes used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health [22.196642357767338]
Many domains of medical concepts lack well-developed terminologies that can support effective coding of medical text.
We present a framework for developing natural language processing (NLP) technologies for automated coding of under-studied types of medical information.
arXiv Detail & Related papers (2020-11-27T20:02:59Z)