Leveraging deep active learning to identify low-resource mobility functioning information in public clinical notes
- URL: http://arxiv.org/abs/2311.15946v1
- Date: Mon, 27 Nov 2023 15:53:11 GMT
- Title: Leveraging deep active learning to identify low-resource mobility functioning information in public clinical notes
- Authors: Tuan-Dung Le, Zhuqi Miao, Samuel Alvarado, Brittany Smith, William Paiva and Thanh Thieu
- Abstract summary: First public annotated dataset specifically on the Mobility domain of the International Classification of Functioning, Disability and Health (ICF).
We utilize the National NLP Clinical Challenges (n2c2) research dataset to construct a pool of candidate sentences using keyword expansion.
Our final dataset consists of 4,265 sentences with a total of 11,784 entities, including 5,511 Action entities, 5,328 Mobility entities, 306 Assistance entities, and 639 Quantification entities.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Function is increasingly recognized as an important indicator of whole-person
health, although it receives little attention in clinical natural language
processing research. We introduce the first public annotated dataset
specifically on the Mobility domain of the International Classification of
Functioning, Disability and Health (ICF), aiming to facilitate automatic
extraction and analysis of functioning information from free-text clinical
notes. We utilize the National NLP Clinical Challenges (n2c2) research dataset
to construct a pool of candidate sentences using keyword expansion. Our active
learning approach, using query-by-committee sampling weighted by density
representativeness, selects informative sentences for human annotation. We
train BERT and CRF models, and use predictions from these models to guide the
selection of new sentences for subsequent annotation iterations. Our final
dataset consists of 4,265 sentences with a total of 11,784 entities, including
5,511 Action entities, 5,328 Mobility entities, 306 Assistance entities, and
639 Quantification entities. The inter-annotator agreement (IAA), averaged over
all entity types, is 0.72 for exact matching and 0.91 for partial matching. We
also train and evaluate common BERT models and state-of-the-art Nested NER
models. The best F1 scores are 0.84 for Action, 0.7 for Mobility, 0.62 for
Assistance, and 0.71 for Quantification. Empirical results demonstrate the
promising potential of NER models to accurately extract mobility functioning
information from clinical text. The public availability of our annotated
dataset will facilitate further research to comprehensively capture functioning
information in electronic health records (EHRs).
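The abstract names query-by-committee (QBC) sampling weighted by density representativeness as the selection strategy. The sketch below illustrates the general idea under stated assumptions: disagreement is measured as mean per-token vote entropy across committee members (for example, a BERT tagger and a CRF), density is the mean cosine similarity of a sentence embedding to the rest of the pool, and the two are combined as disagreement × density^β in the style of information-density sampling. The function names, the β parameter, the embeddings, and the toy tag sequences are all hypothetical; the paper's exact scoring may differ.

```python
import math
from collections import Counter


def vote_entropy(committee_tags):
    """Mean per-token vote entropy for one sentence.

    committee_tags holds one predicted tag sequence per committee member,
    all of the same length (one tag per token).
    """
    n_members = len(committee_tags)
    n_tokens = len(committee_tags[0])
    if n_tokens == 0:
        return 0.0
    total = 0.0
    for t in range(n_tokens):
        votes = Counter(tags[t] for tags in committee_tags)
        for count in votes.values():
            p = count / n_members
            total -= p * math.log(p)
    return total / n_tokens


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def density(i, embeddings):
    """Mean cosine similarity of sentence i to the rest of the pool, clipped at 0."""
    others = [e for j, e in enumerate(embeddings) if j != i]
    if not others:
        return 0.0
    sim = sum(cosine(embeddings[i], e) for e in others) / len(others)
    return max(0.0, sim)


def select_batch(pool_predictions, embeddings, k, beta=1.0):
    """Return indices of the k sentences with the highest disagreement * density**beta."""
    scores = [
        vote_entropy(preds) * density(i, embeddings) ** beta
        for i, preds in enumerate(pool_predictions)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical pool: two committee members tag three candidate sentences.
pool_predictions = [
    [["B-Action", "O"], ["B-Action", "O"]],                  # full agreement
    [["B-Mobility", "I-Mobility"], ["O", "I-Mobility"]],     # 1 of 2 tokens disputed
    [["O", "O", "B-Assistance"], ["O", "B-Action", "O"]],    # 2 of 3 tokens disputed
]
similar = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]   # all sentences alike
outlier = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # sentence 2 is atypical

print(select_batch(pool_predictions, similar, k=1))  # [2]
print(select_batch(pool_predictions, outlier, k=1))  # [1]
```

With uniform density, the most-disputed sentence (index 2) wins; once it is an outlier in embedding space, the density weight suppresses it and the next most informative, more representative sentence is chosen instead. This is the motivation for weighting QBC by density: avoid spending annotation effort on unrepresentative sentences.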
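The abstract reports inter-annotator agreement of 0.72 under exact matching and 0.91 under partial matching, without spelling out the matching rules. A common convention, assumed here rather than taken from the paper, is pairwise F1 between two annotators: exact matching requires identical boundaries and entity type, while partial matching only requires the same type and overlapping character ranges. The span offsets and type assignments in the example are hypothetical and only illustrate why partial agreement is never lower than exact agreement.

```python
def exact_match(a, b):
    """Spans agree exactly: same start, end, and entity type."""
    return a == b


def partial_match(a, b):
    """Spans agree partially: same entity type and overlapping [start, end) ranges."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def matched_fraction(source, target, match):
    """Fraction of spans in source that match at least one span in target."""
    if not source:
        return 0.0
    hits = sum(1 for s in source if any(match(s, t) for t in target))
    return hits / len(source)


def agreement_f1(spans_a, spans_b, match):
    """Pairwise F1, treating annotator A as the reference and B as the prediction."""
    if not spans_a and not spans_b:
        return 1.0
    recall = matched_fraction(spans_a, spans_b, match)
    precision = matched_fraction(spans_b, spans_a, match)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations of "Patient ambulates with a walker." as (start, end, type).
annotator_a = [(8, 17, "Action"), (18, 31, "Assistance")]  # "ambulates", "with a walker"
annotator_b = [(8, 17, "Action"), (25, 31, "Assistance")]  # "ambulates", "walker"

print(agreement_f1(annotator_a, annotator_b, exact_match))    # 0.5
print(agreement_f1(annotator_a, annotator_b, partial_match))  # 1.0
```

To average over entity types as the abstract describes, one would filter both span lists to a single type, compute the F1 per type, and take the mean; that averaging step is likewise an assumption about the paper's procedure.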
Related papers
- Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries
Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation.
Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process.
AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization.
(arXiv, 2024-03-01)
- Low-resource classification of mobility functioning information in clinical sentences using large language models
This study evaluates the ability of publicly available large language models (LLMs) to accurately identify the presence of functioning information from clinical notes.
We collect a balanced binary classification dataset of 1000 sentences from the Mobility NER dataset, which was curated from n2c2 clinical notes.
(arXiv, 2023-12-15)
- Self-Verification Improves Few-Shot Clinical Information Extraction
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
(arXiv, 2023-05-30)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the outputs of a deep learning model with manual rules.
(arXiv, 2023-03-23)
- Natural Language Processing Methods to Identify Oncology Patients at High Risk for Acute Care with Clinical Notes
This paper evaluates how natural language processing can be used to identify the risk of acute care use (ACU) in oncology patients.
Risk prediction using structured health data (SHD) is now standard, but predictions using free-text formats are complex.
(arXiv, 2022-09-28)
- A Multimodal Transformer: Fusing Clinical Notes with Structured EHR Data for Interpretable In-Hospital Mortality Prediction
We provide a novel multimodal transformer to fuse clinical notes and structured EHR data for better prediction of in-hospital mortality.
To improve interpretability, we propose an integrated gradients (IG) method to select important words in clinical notes.
We also investigate the significance of domain-adaptive pretraining and task-adaptive fine-tuning on Clinical BERT.
(arXiv, 2022-08-09)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
(arXiv, 2022-04-10)
- A Unified Framework of Medical Information Annotation and Extraction for Chinese Clinical Text
Current state-of-the-art (SOTA) NLP models are highly integrated with deep learning techniques.
This study presents an engineering framework of medical entity recognition, relation extraction and attribute extraction.
(arXiv, 2022-03-08)
- CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification.
We report empirical results with 11 current pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform far worse than the human ceiling.
(arXiv, 2021-06-15)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
(arXiv, 2020-12-04)
- Med7: a transferable clinical natural language processing model for electronic health records
We introduce a named-entity recognition model for clinical natural language processing.
The model is trained to recognise seven categories: drug names, route, frequency, dosage, strength, form, duration.
We evaluate the model's transferability from US Intensive Care Unit data to UK secondary care mental health records (CRIS).
(arXiv, 2020-03-03)