A Question Answering Based Pipeline for Comprehensive Chinese EHR
Information Extraction
- URL: http://arxiv.org/abs/2402.11177v1
- Date: Sat, 17 Feb 2024 02:55:35 GMT
- Title: A Question Answering Based Pipeline for Comprehensive Chinese EHR
Information Extraction
- Authors: Huaiyuan Ying, Sheng Yu
- Abstract summary: We propose a novel approach that automatically generates training data for transfer learning of question answering models.
Our pipeline incorporates a preprocessing module to handle challenges posed by extraction types.
The obtained QA model exhibits excellent performance on subtasks of information extraction in EHRs.
- Score: 3.411065529290054
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Electronic health records (EHRs) hold significant value for research and
applications. As a new way of information extraction, question answering (QA)
can extract more flexible information than conventional methods and is more
accessible to clinical researchers, but its progress is impeded by the scarcity
of annotated data. In this paper, we propose a novel approach that
automatically generates training data for transfer learning of QA models. Our
pipeline incorporates a preprocessing module to handle challenges posed by
extraction types that are not readily compatible with extractive QA frameworks,
including cases with discontinuous answers and many-to-one relationships. The
obtained QA model exhibits excellent performance on subtasks of information
extraction in EHRs, and it can effectively handle few-shot or zero-shot
settings involving yes-no questions. Case studies and ablation studies
demonstrate the necessity of each component in our design, and the resulting
model is deemed suitable for practical use.
Related papers
- Fine-tuning -- a Transfer Learning approach [0.22344294014777952]
The use of Electronic Health Records (EHRs) is often hampered by the abundance of missing data in this valuable resource.
Existing deep imputation methods rely on end-to-end pipelines that incorporate both imputation and downstream analyses.
This paper explores the development of a modular, deep learning-based imputation and classification pipeline.
arXiv Detail & Related papers (2024-11-06T14:18:23Z)
- Maximizing Relation Extraction Potential: A Data-Centric Study to Unveil Challenges and Opportunities [3.8087810875611896]
This paper investigates the possible data-centric characteristics that impede neural relation extraction.
It emphasizes pivotal issues, such as contextual ambiguity, correlating relations, long-tail data, and fine-grained relation distributions.
It sets a marker for future directions to alleviate these issues, thereby proving to be a critical resource for novice and advanced researchers.
arXiv Detail & Related papers (2024-09-07T23:40:47Z)
- Using Weak Supervision and Data Augmentation in Question Answering [0.12499537119440242]
The onset of the COVID-19 pandemic accentuated the need for access to biomedical literature to answer timely and disease-specific questions.
We explore the roles weak supervision and data augmentation play in training deep neural network QA models.
We evaluate our methods in the context of QA models at the core of a system to answer questions about COVID-19.
arXiv Detail & Related papers (2023-09-28T05:16:51Z)
- Fine-tuning and aligning question answering models for complex information extraction tasks [0.8392546351624164]
Extractive language models, such as question answering (QA) or passage retrieval models, guarantee that query results are found within the boundaries of the corresponding context document.
We show that fine-tuning existing German QA models boosts performance for tailored extraction tasks of complex linguistic features.
We deduce a combined metric from Levenshtein distance, F1-Score, Exact Match and ROUGE-L to mimic the assessment criteria from human experts.
arXiv Detail & Related papers (2023-09-26T10:02:21Z)
- QontSum: On Contrasting Salient Content for Query-focused Summarization [22.738731393540633]
Query-focused summarization (QFS) is a challenging task in natural language processing that generates summaries to address specific queries.
This paper highlights the role of QFS in Grounded Answer Generation (GAR).
We propose QontSum, a novel approach for QFS that leverages contrastive learning to help the model attend to the most relevant regions of the input document.
arXiv Detail & Related papers (2023-07-14T19:25:35Z)
- Questions Are All You Need to Train a Dense Passage Retriever [123.13872383489172]
ART is a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question.
arXiv Detail & Related papers (2022-06-21T18:16:31Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- Abstractive Query Focused Summarization with Query-Free Resources [60.468323530248945]
In this work, we consider the problem of leveraging only generic summarization resources to build an abstractive QFS system.
We propose Marge, a Masked ROUGE Regression framework composed of a novel unified representation for summaries and queries.
Despite learning from minimal supervision, our system achieves state-of-the-art results in the distantly supervised setting.
arXiv Detail & Related papers (2020-12-29T14:39:35Z)
- Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering [89.76059961309453]
The HeadQA dataset contains multiple-choice questions authorized for the public healthcare specialization exam.
These questions are the most challenging for current QA systems.
We present a Multi-step reasoning with Knowledge extraction framework (MurKe).
We are striving to make full use of off-the-shelf pre-trained models.
arXiv Detail & Related papers (2020-08-06T02:47:46Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.