Knowledge Graph-based Question Answering with Electronic Health Records
- URL: http://arxiv.org/abs/2010.09394v2
- Date: Mon, 2 Aug 2021 07:20:56 GMT
- Title: Knowledge Graph-based Question Answering with Electronic Health Records
- Authors: Junwoo Park, Youngwoo Cho, Haneol Lee, Jaegul Choo, Edward Choi
- Abstract summary: Question Answering (QA) is a widely-used framework for developing and evaluating an intelligent machine.
This paper proposes a graph-based EHR QA where natural language queries are converted to SPARQL.
All datasets are open-sourced to encourage further EHR QA research in both directions.
- Score: 30.901617020638124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Question Answering (QA) is a widely-used framework for developing and
evaluating an intelligent machine. In this light, QA on Electronic Health
Records (EHR), namely EHR QA, can work as a crucial milestone towards
developing an intelligent agent in healthcare. EHR data are typically stored in
a relational database, which can also be converted to a directed acyclic graph,
allowing two approaches for EHR QA: Table-based QA and Knowledge Graph-based
QA. We hypothesize that the graph-based approach is more suitable for EHR QA as
graphs can represent relations between entities and values more naturally
compared to tables, which essentially require JOIN operations. In this paper,
we propose a graph-based EHR QA where natural language queries are converted to
SPARQL instead of SQL. To validate our hypothesis, we create four EHR QA
datasets (graph-based VS table-based, and simplified database schema VS
original database schema), based on a table-based dataset MIMICSQL. We test
both a simple Seq2Seq model and a state-of-the-art EHR QA model on all datasets
where the graph-based datasets facilitated up to 34% higher accuracy than the
table-based dataset without any modification to the model architectures.
Finally, all datasets are open-sourced to encourage further EHR QA research in
both directions.
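To make the abstract's table-vs-graph contrast concrete, below is a minimal sketch in Python using rdflib. The toy schema, predicate names, and example question are invented for illustration and are not taken from MIMICSQL or the paper's released datasets; the point is only that the table-based form of the question needs a JOIN across tables, while the graph-based form is a short SPARQL triple pattern.

```python
# A minimal sketch of the table-vs-graph contrast described in the abstract.
# The toy schema, predicate names, and question are illustrative only; they
# are NOT taken from MIMICSQL or the paper's released datasets.
from rdflib import Graph  # pip install rdflib

# Table-based form: answering the question requires a JOIN between a
# patients table and a diagnoses table.
SQL_QUERY = """
SELECT d.diagnosis_name
FROM patients p
JOIN diagnoses d ON p.patient_id = d.patient_id
WHERE p.name = 'Jane Doe';
"""

# Graph-based form: the same relation is a direct edge, so the generated
# SPARQL is a short triple pattern with no explicit JOIN.
TOY_EHR_TURTLE = """
@prefix ehr: <http://example.org/ehr/> .

ehr:patient_123 ehr:hasName       "Jane Doe" ;
                ehr:hasDiagnosis  ehr:diag_4019 .
ehr:diag_4019   ehr:diagnosisName "Hypertension" .
"""

SPARQL_QUERY = """
PREFIX ehr: <http://example.org/ehr/>
SELECT ?name WHERE {
  ?p ehr:hasName "Jane Doe" ;
     ehr:hasDiagnosis ?d .
  ?d ehr:diagnosisName ?name .
}
"""

g = Graph()
g.parse(data=TOY_EHR_TURTLE, format="turtle")
for row in g.query(SPARQL_QUERY):
    print(row.name)  # -> "Hypertension"
```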
Related papers
- SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA [25.09488366689108]
Text-to-SQL parsing and end-to-end question answering (E2E TQA) are the two main approaches to the Table-based Question Answering task.
Despite success on multiple benchmarks, they have yet to be compared and their synergy remains unexplored.
We identify different strengths and weaknesses through evaluating state-of-the-art models on benchmark datasets.
arXiv Detail & Related papers (2024-09-25T07:18:45Z)
- KET-QA: A Dataset for Knowledge Enhanced Table Question Answering [63.56707527868466]
We propose to use a knowledge base (KB) as the external knowledge source for TableQA.
Every question requires the integration of information from both the table and the sub-graph to be answered.
We design a retriever-reasoner structured pipeline model to extract pertinent information from the vast knowledge sub-graph.
arXiv Detail & Related papers (2024-05-13T18:26:32Z)
- Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA [9.659820850719413]
We leverage Large Language Models (LLMs), which have been shown to have strong reasoning ability, as an automatic data annotator.
The key innovation in our method lies in the Synthesize Step-by-Step strategy.
We significantly enhance the chart VQA models, achieving the state-of-the-art accuracy on the ChartQA and PlotQA datasets.
arXiv Detail & Related papers (2024-03-25T03:02:27Z)
- Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases [63.96793270418793]
Complex logical query answering (CLQA) is a recently emerged task of graph machine learning.
We introduce the concept of Neural Graph Databases (NGDBs).
NGDB consists of a Neural Graph Storage and a Neural Graph Engine.
arXiv Detail & Related papers (2023-03-26T04:03:37Z)
- OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering [106.73213656603453]
We develop a simple table-based QA model with minimal annotation effort.
We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
arXiv Detail & Related papers (2022-07-08T01:23:45Z)
- Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records [8.272573489245717]
We design a program-based model (NLQ2Program) for EHR-QA as a first step in this direction.
We tackle MIMICSPARQL*, the graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner.
For a reliable EHR-QA model, we apply an uncertainty decomposition method to measure the ambiguity of the input question.
arXiv Detail & Related papers (2022-03-14T08:12:16Z)
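The entry above mentions uncertainty decomposition only in passing. As a generic illustration (not the NLQ2Program implementation), one common decomposition splits the predictive entropy from stochastic forward passes, e.g. MC dropout, into an expected-entropy (data/aleatoric) term and a mutual-information (model/epistemic) term:

```python
# Generic illustration of uncertainty decomposition over stochastic forward
# passes (e.g., MC dropout). This is NOT the paper's implementation, just the
# standard total = data + model split of predictive entropy.
import numpy as np

def decompose_uncertainty(probs: np.ndarray, eps: float = 1e-12):
    """probs: (n_samples, n_classes) predicted distributions for one question."""
    mean_p = probs.mean(axis=0)
    total = -np.sum(mean_p * np.log(mean_p + eps))                      # predictive entropy
    aleatoric = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))   # expected entropy
    epistemic = total - aleatoric                                       # mutual information
    return total, aleatoric, epistemic

# Toy usage: 5 stochastic passes over a 3-way decision.
samples = np.array([[0.70, 0.20, 0.10],
                    [0.60, 0.30, 0.10],
                    [0.65, 0.25, 0.10],
                    [0.10, 0.80, 0.10],
                    [0.20, 0.70, 0.10]])
print(decompose_uncertainty(samples))
```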
- Question-Answer Sentence Graph for Joint Modeling Answer Selection [122.29142965960138]
We train and integrate state-of-the-art (SOTA) models for computing scores between question-question, question-answer, and answer-answer pairs.
Online inference is then performed to solve the AS2 task on unseen queries.
arXiv Detail & Related papers (2022-02-16T05:59:53Z)
- Question Answering for Complex Electronic Health Records Database using Unified Encoder-Decoder Architecture [8.656936724622145]
We design UniQA, a unified encoder-decoder architecture for EHR-QA where natural language questions are converted to queries such as SPARQL.
We also propose input masking (IM), a simple and effective method to cope with complex medical terms and various typos and to better learn the SPARQL syntax.
UniQA demonstrated a significant performance improvement over the previous state-of-the-art model on MIMICSPARQL* (14.2% gain), the most complex NLQ2SPARQL dataset in the EHR domain, and its typo-ridden versions.
arXiv Detail & Related papers (2021-11-14T05:01:38Z)
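The UniQA entry above describes input masking (IM) only at a high level. The toy sketch below shows one plausible reading of the idea, replacing out-of-vocabulary tokens (rare medical terms, typos, record IDs) with placeholder tokens before query generation; the vocabulary, mask token, and masking rule are assumptions made for illustration, not the paper's specification.

```python
# Toy illustration of input masking for a natural-language EHR question.
# The vocabulary, mask token, and masking rule below are assumptions for
# illustration; they are not taken from the UniQA paper.
KNOWN_VOCAB = {"what", "is", "the", "diagnosis", "of", "patient", "with", "id"}
MASK_TOKEN = "<mask>"

def mask_question(question: str) -> tuple[str, dict[str, str]]:
    """Replace tokens outside a known vocabulary with indexed mask tokens,
    keeping a mapping so the generated query can be restored afterwards."""
    masked, mapping = [], {}
    for token in question.lower().split():
        if token in KNOWN_VOCAB:
            masked.append(token)
        else:
            slot = f"{MASK_TOKEN}{len(mapping)}"
            mapping[slot] = token
            masked.append(slot)
    return " ".join(masked), mapping

print(mask_question("what is the diagnosis of patient 10059 with hyperlipidemia"))
# ('what is the diagnosis of patient <mask>0 with <mask>1',
#  {'<mask>0': '10059', '<mask>1': 'hyperlipidemia'})
```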
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
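The last entry describes building pseudo-training QA pairs by applying a simple template to a retrieved sentence. A minimal sketch of that idea follows; the template, the answer-picking heuristic, and the example sentence are illustrative assumptions, not the templates used in the paper.

```python
# Minimal sketch of template-based pseudo-QA generation from a retrieved
# sentence. The cloze template and answer-picking heuristic are illustrative
# assumptions, not the templates from the paper.
import re

def make_qa_pair(retrieved_sentence: str) -> tuple[str, str] | None:
    """Pick a capitalized span as a pseudo-answer and turn the rest of the
    sentence into a cloze-style question."""
    match = re.search(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*)\b", retrieved_sentence)
    if match is None:
        return None
    answer = match.group(1)
    cloze = retrieved_sentence.replace(answer, "what", 1).rstrip(".")
    question = cloze[0].upper() + cloze[1:] + "?"
    return question, answer

sentence = "Aspirin is commonly prescribed to reduce the risk of heart attack."
print(make_qa_pair(sentence))
# ('What is commonly prescribed to reduce the risk of heart attack?', 'Aspirin')
```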