Question Answering for Complex Electronic Health Records Database using
Unified Encoder-Decoder Architecture
- URL: http://arxiv.org/abs/2111.14703v1
- Date: Sun, 14 Nov 2021 05:01:38 GMT
- Title: Question Answering for Complex Electronic Health Records Database using
Unified Encoder-Decoder Architecture
- Authors: Seongsu Bae, Daeyoung Kim, Jiho Kim, Edward Choi
- Abstract summary: We design UniQA, a unified encoder-decoder architecture for EHR-QA where natural language questions are converted to queries such as SQL or SPARQL.
We also propose input masking (IM), a simple and effective method to cope with complex medical terms and various typos and better learn the SQL/SPARQL syntax.
UniQA demonstrated a significant performance improvement against the previous state-of-the-art model for MIMICSQL* (a 14.2% gain), the most complex NLQ2SQL dataset in the EHR domain, and its typo-ridden versions.
- Score: 8.656936724622145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An intelligent machine that can answer human questions based on electronic
health records (EHR-QA) has great practical value, such as supporting
clinical decisions, managing hospital administration, and powering medical chatbots.
Previous table-based QA studies, which focus on translating natural language questions
into table queries (NLQ2SQL), however, struggle with the unique nature of EHR data:
complex and specialized medical terminology increases decoding
difficulty. In this paper, we design UniQA, a unified encoder-decoder
architecture for EHR-QA where natural language questions are converted to
queries such as SQL or SPARQL. We also propose input masking (IM), a simple and
effective method to cope with complex medical terms and various typos and
better learn the SQL/SPARQL syntax. Combining the unified architecture with an
effective auxiliary training objective, UniQA demonstrated a significant
performance improvement against the previous state-of-the-art model for
MIMICSQL* (14.2% gain), the most complex NLQ2SQL dataset in the EHR domain, and
its typo-ridden versions (approximately 28.8% gain). In addition, we confirmed
consistent results for the graph-based EHR-QA dataset, MIMICSPARQL*.
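For intuition, below is a minimal sketch of how an input-masking (IM) auxiliary signal can be combined with a unified encoder-decoder for NLQ2SQL. This is not UniQA's implementation: the T5 backbone, the word-level masking scheme, the task prefix, and the toy question/SQL pair are illustrative assumptions only.

```python
# Minimal NLQ2SQL training sketch with an input-masking (IM) style corruption.
# Assumptions (not from the paper): T5 backbone, word-level masking with the
# tokenizer's <unk> token, the "translate question to SQL:" prefix, and the
# toy (question, SQL) pair below.
import random
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def mask_question(question: str, mask_prob: float = 0.15) -> str:
    """Randomly replace question words with a mask token (assumed IM variant)."""
    return " ".join(
        tokenizer.unk_token if random.random() < mask_prob else w
        for w in question.split()
    )

# Toy pair in the spirit of MIMICSQL*-style questions; not real dataset content.
question = "how many patients were diagnosed with acute kidney failure"
sql = ("SELECT COUNT(DISTINCT demographic.subject_id) FROM demographic "
       "WHERE demographic.diagnosis = 'ACUTE KIDNEY FAILURE'")

model.train()
for step in range(3):  # a few illustrative optimization steps
    masked = mask_question(question)
    enc = tokenizer("translate question to SQL: " + masked, return_tensors="pt")
    labels = tokenizer(sql, return_tensors="pt").input_ids
    # Standard seq2seq cross-entropy on the unchanged target query; corrupting
    # only the question pushes the decoder to rely on learned SQL structure
    # rather than on exact surface forms of medical terms or typos.
    loss = model(input_ids=enc.input_ids,
                 attention_mask=enc.attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The design intuition in this sketch is that the target query stays fixed while the question is corrupted, so the model cannot depend on any single (possibly misspelled) medical term to produce correct SQL/SPARQL; how the paper actually schedules or scopes the masking may differ.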
Related papers
- Effective Instruction Parsing Plugin for Complex Logical Query Answering on Knowledge Graphs [51.33342412699939]
Knowledge Graph Query Embedding (KGQE) aims to embed First-Order Logic (FOL) queries in a low-dimensional KG space for complex reasoning over incomplete KGs.
Recent studies integrate various external information (such as entity types and relation context) to better capture the logical semantics of FOL queries.
We propose an effective Query Instruction Parsing Plugin (QIPP) that captures latent query patterns from code-like query instructions.
arXiv Detail & Related papers (2024-10-27T03:18:52Z) - LG AI Research & KAIST at EHRSQL 2024: Self-Training Large Language Models with Pseudo-Labeled Unanswerable Questions for a Reliable Text-to-SQL System on EHRs [58.59113843970975]
Text-to-SQL models are pivotal for making Electronic Health Records accessible to healthcare professionals without SQL knowledge.
We present a self-training strategy using pseudo-labeled unanswerable questions to enhance the reliability of text-to-SQL models for EHRs.
arXiv Detail & Related papers (2024-05-18T03:25:44Z) - KET-QA: A Dataset for Knowledge Enhanced Table Question Answering [63.56707527868466]
We propose to use a knowledge base (KB) as the external knowledge source for TableQA.
Answering each question requires integrating information from both the table and the sub-graph.
We design a retriever-reasoner structured pipeline model to extract pertinent information from the vast knowledge sub-graph.
arXiv Detail & Related papers (2024-05-13T18:26:32Z) - Retrieval augmented text-to-SQL generation for epidemiological question answering using electronic health records [0.6138671548064356]
We introduce an end-to-end methodology that combines text-to-SQL generation with retrieval augmented generation (RAG) to answer epidemiological questions.
RAG offers a promising direction for improving the capabilities of such systems, as shown in a realistic industry setting.
arXiv Detail & Related papers (2024-03-14T09:45:05Z) - EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records [36.213730355895805]
The utterances were collected from 222 hospital staff members, including physicians, nurses, and insurance review and health records teams.
We manually linked these questions to two open-source EHR databases, MIMIC-III and eICU, and included various time expressions and held-out unanswerable questions in the dataset.
arXiv Detail & Related papers (2023-01-16T05:10:20Z) - DrugEHRQA: A Question Answering Dataset on Structured and Unstructured
Electronic Health Records For Medicine Related Queries [7.507210439502174]
This paper develops the first question answering dataset (DrugEHRQA) containing question-answer pairs from both structured tables and unstructured notes from an EHR.
Our dataset covers medication-related queries and contains over 70,000 question-answer pairs.
arXiv Detail & Related papers (2022-05-03T03:50:50Z) - Uncertainty-Aware Text-to-Program for Question Answering on Structured
Electronic Health Records [8.272573489245717]
We design a program-based model (NLQ2Program) for EHR-QA as a first step in this direction.
We tackle MIMICSPARQL*, the graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner.
For a reliable EHR-QA model, we apply the uncertainty decomposition method to measure the ambiguity in the input question.
arXiv Detail & Related papers (2022-03-14T08:12:16Z) - Knowledge Graph-based Question Answering with Electronic Health Records [30.901617020638124]
Question Answering (QA) is a widely-used framework for developing and evaluating an intelligent machine.
This paper proposes a graph-based EHR QA approach where natural language queries are converted to SPARQL.
All datasets are open-sourced to encourage further EHR QA research in both the table-based and graph-based directions.
arXiv Detail & Related papers (2020-10-19T11:31:20Z) - Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex
Healthcare Question Answering [89.76059961309453]
The HeadQA dataset contains multiple-choice questions authorized for the public healthcare specialization exam.
These questions are among the most challenging for current QA systems.
We present a Multi-step reasoning with Knowledge extraction framework (MurKe)
that strives to make full use of off-the-shelf pre-trained models.
arXiv Detail & Related papers (2020-08-06T02:47:46Z) - Generating Diverse and Consistent QA pairs from Contexts with
Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of the data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z) - Template-Based Question Generation from Retrieved Sentences for Improved
Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance (a toy sketch of this template idea follows this list).
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
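As a toy sketch of the template idea referenced above (the retriever, the single wh-word template, and the example sentences are assumptions, not the paper's actual pipeline):

```python
# Toy template-based pseudo question generation from a retrieved sentence.
# Everything here (corpus, retriever, template) is illustrative only.
corpus = [
    "Aspirin was administered to the patient after surgery.",
    "The patient was discharged on 2012-05-01.",
]

def retrieve(context: str) -> str:
    """Pick the corpus sentence sharing the most words with the context (toy retriever)."""
    ctx_words = set(context.lower().split())
    return max(corpus, key=lambda s: len(ctx_words & set(s.lower().split())))

def make_question(sentence: str, answer: str, wh: str = "what") -> str:
    """Template: replace the answer span with a wh-word and add a question mark."""
    return sentence.replace(answer, wh).rstrip(".") + "?"

context = "The patient received Aspirin following the operation."
answer = "Aspirin"
retrieved = retrieve(context)
if answer in retrieved:
    print(make_question(retrieved, answer))
    # prints: what was administered to the patient after surgery?
```

The template itself is deliberately crude; the point of the sketch is only the pipeline shape described in the summary: retrieve a related sentence, then apply a template to it instead of to the original context sentence.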