Neural Natural Language Processing for Unstructured Data in Electronic
Health Records: a Review
- URL: http://arxiv.org/abs/2107.02975v1
- Date: Wed, 7 Jul 2021 01:50:02 GMT
- Title: Neural Natural Language Processing for Unstructured Data in Electronic
Health Records: a Review
- Authors: Irene Li, Jessica Pan, Jeremy Goldwasser, Neha Verma, Wai Pan Wong,
Muhammed Yavuz Nuzumlalı, Benjamin Rosand, Yixin Li, Matthew Zhang, David
Chang, R. Andrew Taylor, Harlan M. Krumholz and Dragomir Radev
- Abstract summary: Well over half of the information stored within EHRs is in the form of unstructured text.
Deep learning approaches to Natural Language Processing have made considerable advances.
We focus on a broad scope of tasks, namely, classification and prediction, word embeddings, extraction, generation, and other topics.
- Score: 4.454501609622817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electronic health records (EHRs), digital collections of patient healthcare
events and observations, are ubiquitous in medicine and critical to healthcare
delivery, operations, and research. Despite this central role, EHRs are
notoriously difficult to process automatically. Well over half of the
information stored within EHRs is in the form of unstructured text (e.g.
provider notes, operation reports) and remains largely untapped for secondary
use. Recently, however, newer neural network and deep learning approaches to
Natural Language Processing (NLP) have made considerable advances,
outperforming traditional statistical and rule-based systems on a variety of
tasks. In this survey paper, we summarize current neural NLP methods for EHR
applications. We focus on a broad scope of tasks, namely, classification and
prediction, word embeddings, extraction, generation, and other topics such as
question answering, phenotyping, knowledge graphs, medical dialogue,
multilinguality, interpretability, etc.
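To make the "classification and prediction" task family concrete, below is a minimal, self-contained sketch of classifying short clinical note snippets. The notes, labels, and the bag-of-words perceptron are invented toy stand-ins, not methods or data from the survey; real EHR pipelines use neural models (e.g., transformer encoders) trained on de-identified clinical text.

```python
# Toy sketch: bag-of-words perceptron labeling synthetic clinical snippets
# as cardiac (1) vs. respiratory (0). Illustrative only; all text is invented.
from collections import Counter

notes = [
    ("chest pain and elevated troponin suggest myocardial infarction", 1),
    ("ecg shows st elevation patient admitted to cardiology", 1),
    ("productive cough wheezing and shortness of breath", 0),
    ("chest x ray consistent with pneumonia started antibiotics", 0),
]

# Vocabulary over all training notes; features are raw word counts.
vocab = sorted({w for text, _ in notes for w in text.split()})

def featurize(text):
    counts = Counter(text.split())
    return [counts.get(w, 0) for w in vocab]

# Perceptron training: a deliberately simple stand-in for the neural
# classifiers the survey covers.
weights = [0.0] * len(vocab)
bias = 0.0
for _ in range(10):
    for text, label in notes:
        x = featurize(text)
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        pred = 1 if score > 0 else 0
        if pred != label:  # mistake-driven update
            for i, xi in enumerate(x):
                weights[i] += (label - pred) * xi
            bias += (label - pred)

def predict(text):
    x = featurize(text)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

print(predict("troponin elevated with chest pain"))  # → 1 (cardiac class)
```

The survey's point is that neural models replaced exactly this kind of sparse, lexical feature engineering with learned representations that generalize across the linguistic variation typical of provider notes.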
Related papers
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z) - Advancing Italian Biomedical Information Extraction with
Transformers-based Models: Methodological Insights and Multicenter Practical
Application [0.27027468002793437]
Information Extraction can help clinical practitioners overcome this limitation through automated text-mining pipelines.
We created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Transformers-based model.
The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a "low-resource" approach.
arXiv Detail & Related papers (2023-06-08T16:15:46Z) - Mitigating Data Scarcity for Large Language Models [7.259279261659759]
In recent years, pretrained neural language models (PNLMs) have taken the field of natural language processing by storm.
Data scarcity is common in specialized domains, such as medicine, and in low-resource languages that are underexplored by AI research.
In this dissertation, we focus on mitigating data scarcity using data augmentation and neural ensemble learning techniques.
arXiv Detail & Related papers (2023-02-03T15:17:53Z) - Toward a Neural Semantic Parsing System for EHR Question Answering [7.784753717089568]
Clinical semantic parsing (SP) is an important step toward identifying the exact information need from a natural language query.
Recent advancements in neural SP show promise for building a robust and flexible semantic lexicon without much human effort.
arXiv Detail & Related papers (2022-11-08T21:36:22Z) - Towards Structuring Real-World Data at Scale: Deep Learning for
Extracting Key Oncology Information from Clinical Text with Patient-Level
Supervision [10.929271646369887]
The majority of detailed patient information in real-world data (RWD) is only consistently available in free-text clinical documents.
Traditional rule-based systems are vulnerable to the prevalent linguistic variations and ambiguities in clinical text.
We propose leveraging patient-level supervision from medical registries, which are often readily available and capture key patient information.
arXiv Detail & Related papers (2022-03-20T03:42:03Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results with 11 current pre-trained Chinese models; the experiments show that state-of-the-art neural models still perform far below the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - Multilingual Medical Question Answering and Information Retrieval for
Rural Health Intelligence Access [1.0499611180329804]
In rural regions of several developing countries, access to quality healthcare, medical infrastructure, and professional diagnosis is largely unavailable.
Many deaths resulting from this lack of medical access, the absence of patients' previous health records, and the supplanting of information in indigenous languages could easily be prevented.
We describe an approach that leverages recent progress in Machine Learning and Natural Language Processing (NLP) to design a low-resource, multilingual model that serves as a preliminary first-point-of-contact medical assistant.
arXiv Detail & Related papers (2021-06-02T16:05:24Z) - BiteNet: Bidirectional Temporal Encoder Network to Predict Medical
Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z) - Domain-Specific Language Model Pretraining for Biomedical Natural
Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.