An Open Natural Language Processing Development Framework for EHR-based
Clinical Research: A case demonstration using the National COVID Cohort
Collaborative (N3C)
- URL: http://arxiv.org/abs/2110.10780v1
- Date: Wed, 20 Oct 2021 21:09:41 GMT
- Authors: Sijia Liu, Andrew Wen, Liwei Wang, Huan He, Sunyang Fu, Robert Miller,
Andrew Williams, Daniel Harris, Ramakanth Kavuluru, Mei Liu, Noor Abu-el-rub,
Rui Zhang, John D. Osborne, Masoud Rouhizadeh, Yongqun He, Emily Pfaff,
Christopher G. Chute, Tim Duong, Melissa A. Haendel, Rafael Fuentes, Peter
Szolovits, Hua Xu, Hongfang Liu (N3C Natural Language Processing (NLP)
Subgroup)
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite recent advances in clinical natural language processing
(NLP), the clinical and translational research community has shown some
resistance to adopting NLP models because of their limited transparency,
interpretability, and usability. Building on our previous work, in this study
we proposed an open NLP development framework and evaluated it through the
implementation of NLP algorithms for the National COVID Cohort Collaborative
(N3C). Motivated by interest in extracting information from COVID-19-related
clinical notes, our work includes 1) an open data annotation process using
COVID-19 signs and symptoms as the use case, 2) a community-driven ruleset
composing platform, and 3) a synthetic text data generation workflow that
produces texts for information extraction tasks without involving human
subjects. The generated corpora, derived from texts and gold-standard
annotations contributed by multiple institutions, were tested against a single
institution's ruleset and achieved F1 scores of 0.876, 0.706, and 0.694,
respectively. This study, a consortium effort of the N3C NLP subgroup,
demonstrates the feasibility of creating a federated NLP algorithm development
and benchmarking platform to enhance multi-institution clinical NLP studies.
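The evaluation above scores a rule-based extractor against gold-standard annotations with F1. As a minimal sketch of that idea, assuming a ruleset is simply a mapping from symptom concepts to regular expressions and that a match counts as correct only when its span and concept exactly equal a gold annotation, the code below shows how extracted spans can be compared with annotations. The names `RULESET`, `extract`, and `f1` and the example note are hypothetical illustrations; this is not the ruleset format or scoring code used by the N3C platform.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical ruleset: symptom concept -> regex patterns (illustrative only).
RULESET: Dict[str, List[str]] = {
    "fever": [r"\bfevers?\b", r"\bfebrile\b"],
    "cough": [r"\bcough(?:ing)?\b"],
    "dyspnea": [r"\bshortness of breath\b", r"\bdyspnea\b", r"\bSOB\b"],
}

# An annotation is a character span plus the concept it mentions.
Span = Tuple[int, int, str]


def extract(text: str, ruleset: Dict[str, List[str]]) -> Set[Span]:
    """Return every span matched by any rule, labeled with that rule's concept."""
    found: Set[Span] = set()
    for concept, patterns in ruleset.items():
        for pattern in patterns:
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.add((m.start(), m.end(), concept))
    return found


def f1(predicted: Set[Span], gold: Set[Span]) -> float:
    """Exact-match F1: harmonic mean of precision and recall over spans."""
    tp = len(predicted & gold)  # true positives: spans present in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


note = "Patient reports fever and dry cough; denies shortness of breath."
predicted = extract(note, RULESET)
# Gold annotations mark only the affirmed symptoms (fever, cough).
gold = {(16, 21, "fever"), (30, 35, "cough")}
print(round(f1(predicted, gold), 3))  # 0.8
```

Here the pattern for dyspnea also fires on the negated mention "denies shortness of breath", so precision drops to 2/3 while recall stays at 1, giving an F1 of 0.8. Practical clinical rulesets typically add context handling such as negation detection to avoid this kind of false positive.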