EHRKit: A Python Natural Language Processing Toolkit for Electronic
Health Record Texts
- URL: http://arxiv.org/abs/2204.06604v5
- Date: Wed, 28 Jun 2023 03:03:26 GMT
- Authors: Irene Li, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin
Rosand, Jeremy Goldwasser, Dragomir Radev
- Abstract summary: We create a Python library for clinical texts, EHRKit.
This library contains two main parts: MIMIC-III-specific functions and task-specific functions.
The first part introduces a list of interfaces for accessing MIMIC-III NOTEEVENTS data, including basic search, information retrieval, and information extraction.
The second part integrates many third-party libraries for up to 12 off-the-shelf NLP tasks such as named entity recognition, summarization, and machine translation.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The Electronic Health Record (EHR) is an essential part of the modern medical
system and impacts healthcare delivery, operations, and research. Although EHRs
also contain structured information, their unstructured text is attracting much
attention and has become an exciting research field. The success of recent neural
Natural Language Processing (NLP) methods has led to a new direction for
processing unstructured clinical notes. In this work, we create a Python
library for clinical texts, EHRKit. This library contains two main parts:
MIMIC-III-specific functions and task-specific functions. The first part
introduces a list of interfaces for accessing MIMIC-III NOTEEVENTS data,
including basic search, information retrieval, and information extraction. The
second part integrates many third-party libraries for up to 12 off-the-shelf NLP
tasks such as named entity recognition, summarization, and machine translation.
Related papers
- Facilitating phenotyping from clinical texts: the medkit library [1.7924255866089314]
Phenotyping consists of applying algorithms to identify individuals associated with a specific, potentially complex, trait or condition.
Because much of the clinical information in EHRs resides in text, phenotyping from text plays an important role in studies that rely on the secondary use of EHRs.
We developed an open-source Python library named medkit to facilitate the development, evaluation, and reproducibility of phenotyping pipelines.
Date: 2024-08-30T16:54:06Z
- CorpusBrain++: A Continual Generative Pre-Training Framework for Knowledge-Intensive Language Tasks [111.13988772503511]
Knowledge-intensive language tasks (KILTs) typically require retrieving relevant documents from trustworthy corpora, e.g., Wikipedia, to produce specific answers.
Very recently, a pre-trained generative retrieval model for KILTs, named CorpusBrain, was proposed and reached new state-of-the-art retrieval performance.
Date: 2024-02-26T17:35:44Z
- Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation [30.883733024137506]
Ascle is a pioneering natural language processing (NLP) toolkit designed for medical text generation.
Ascle is tailored for biomedical researchers and healthcare professionals with an easy-to-use, all-in-one solution.
Date: 2023-11-28T08:13:29Z
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the results of a deep learning model with those of manual rules.
Date: 2023-03-23T17:17:46Z
- A Marker-based Neural Network System for Extracting Social Determinants of Health [12.6970199179668]
The impact of social determinants of health (SDoH) on patients' healthcare quality and on disparities is well known.
Many SDoH items are not coded in structured forms in electronic health records.
We explore a multi-stage pipeline involving named entity recognition (NER), relation classification (RC), and text classification methods to extract SDoH information from clinical notes automatically.
Date: 2022-12-24T18:40:23Z
- Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
Date: 2022-10-04T00:49:20Z
- Effidit: Your AI Writing Assistant [60.588370965898534]
Effidit is a digital writing assistant that helps users write higher-quality text more efficiently by using artificial intelligence (AI) technologies.
In Effidit, we significantly expand the capacities of a writing assistant by providing functions in five categories: text completion, error checking, text polishing, keywords to sentences (K2S), and cloud input methods (cloud IME).
Date: 2022-08-03T02:24:45Z
- Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding [4.5939673461957335]
This work introduces a hierarchical annotation schema with three stages to address clinical text understanding, clinical reasoning, and summarization.
We created an annotated corpus based on an extensive collection of publicly available daily progress notes.
We also define a new suite of tasks, Progress Note Understanding, with three tasks utilizing the three annotation stages.
Date: 2022-04-06T18:38:08Z
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
Date: 2021-08-02T10:42:52Z
- Neural Natural Language Processing for Unstructured Data in Electronic Health Records: a Review [4.454501609622817]
Well over half of the information stored within EHRs is in the form of unstructured text.
Deep learning approaches to Natural Language Processing have made considerable advances.
We focus on a broad scope of tasks, namely, classification and prediction, word embeddings, extraction, generation, and other topics.
Date: 2021-07-07T01:50:02Z
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
Date: 2020-10-22T19:52:49Z
This list is automatically generated from the titles and abstracts of the papers on this site.