DrugEHRQA: A Question Answering Dataset on Structured and Unstructured
Electronic Health Records For Medicine Related Queries
- URL: http://arxiv.org/abs/2205.01290v1
- Date: Tue, 3 May 2022 03:50:50 GMT
- Title: DrugEHRQA: A Question Answering Dataset on Structured and Unstructured
Electronic Health Records For Medicine Related Queries
- Authors: Jayetri Bardhan, Anthony Colas, Kirk Roberts, Daisy Zhe Wang
- Abstract summary: This paper develops the first question answering dataset (DrugEHRQA) containing question-answer pairs from both structured tables and unstructured notes from an EHR.
Our dataset has medication-related queries, containing over 70,000 question-answer pairs.
- Score: 7.507210439502174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper develops the first question answering dataset (DrugEHRQA)
containing question-answer pairs from both structured tables and unstructured
notes from a publicly available Electronic Health Record (EHR). EHRs contain
patient records, stored in structured tables and unstructured clinical notes.
The information in structured and unstructured EHRs is not strictly disjoint:
information may be duplicated, contradictory, or provide additional context
between these sources. Our dataset has medication-related queries, containing
over 70,000 question-answer pairs. To provide a baseline model and help analyze
the dataset, we have used a simple model (MultimodalEHRQA) which uses the
predictions of a modality selection network to choose between EHR tables and
clinical notes to answer the questions. This is used to direct the questions to
the table-based or text-based state-of-the-art QA model. In order to address
the problem arising from complex, nested queries, this is the first time
Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers (RAT-SQL)
has been used to test the structure of query templates in EHR data. Our goal is
to provide a benchmark dataset for multi-modal QA systems, and to open up new
avenues of research in improving question answering over EHR structured data by
using context from unstructured clinical data.
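The routing idea described in the abstract (a modality selection step that sends each question to either a table-based or a text-based QA model) can be sketched as follows. This is only an illustrative sketch, not the authors' MultimodalEHRQA implementation: the class name `MultimodalRouter`, the function `keyword_selector`, and the cue words are hypothetical, and the keyword rule is a toy stand-in for the learned modality selection network.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A QA model is modeled here as any function from a question to an answer string.
QAModel = Callable[[str], str]


@dataclass
class MultimodalRouter:
    """Hypothetical router: picks a modality, then delegates to that QA model."""

    select_modality: Callable[[str], str]  # returns "table" or "text"
    table_qa: QAModel                      # e.g. a text-to-SQL model over EHR tables
    text_qa: QAModel                       # e.g. a reading-comprehension model over notes

    def answer(self, question: str) -> Tuple[str, str]:
        modality = self.select_modality(question)
        model = self.table_qa if modality == "table" else self.text_qa
        return modality, model(question)


def keyword_selector(question: str) -> str:
    """Toy stand-in for a learned selector; the cue words are assumptions."""
    table_cues = ("dosage", "dose", "route")
    lowered = question.lower()
    return "table" if any(cue in lowered for cue in table_cues) else "text"


if __name__ == "__main__":
    router = MultimodalRouter(
        select_modality=keyword_selector,
        table_qa=lambda q: "answer from structured tables",
        text_qa=lambda q: "answer from clinical notes",
    )
    print(router.answer("What is the dosage of aspirin prescribed to the patient?"))
    print(router.answer("Why was the patient given aspirin?"))
```

In a real system the selector would be a trained classifier and the two QA callables would wrap the table-based and text-based models; keeping them as plain callables makes each component easy to swap out.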
Related papers
- EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records [14.69982800306006]
EHRs are integral for storing comprehensive patient medical records, combining structured data (e.g., medications) with detailed clinical notes (e.g., physician notes).
These elements are essential for straightforward data retrieval and provide deep, contextual insights into patient care.
However, they often suffer from discrepancies due to unintuitive EHR system designs and human errors.
We developed EHRCon, a new dataset and task designed to ensure data consistency between structured tables and unstructured notes in EHRs.
(2024-06-24T06:26:50Z)
- KET-QA: A Dataset for Knowledge Enhanced Table Question Answering [63.56707527868466]
We propose to use a knowledge base (KB) as the external knowledge source for TableQA.
Every question requires the integration of information from both the table and the sub-graph to be answered.
We design a retriever-reasoner structured pipeline model to extract pertinent information from the vast knowledge sub-graph.
(2024-05-13T18:26:32Z)
- Enhancing Open-Domain Table Question Answering via Syntax- and Structure-aware Dense Retrieval [21.585255812861632]
Open-domain table question answering aims to provide answers to a question by retrieving and extracting information from a large collection of tables.
Existing studies of open-domain table QA either directly adopt text retrieval methods or consider the table structure only in the encoding layer for table retrieval.
We propose a syntax- and structure-aware retrieval method for the open-domain table QA task.
(2023-09-19T10:40:09Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
(2023-05-23T17:43:51Z)
- QUADRo: Dataset and Models for QUestion-Answer Database Retrieval [97.84448420852854]
Given a database (DB) of question/answer (q/a) pairs, it is possible to answer a target question by scanning the DB for similar questions.
We build a large scale DB of 6.3M q/a pairs, using public questions, and design a new system based on neural IR and a q/a pair reranker.
We show that our DB-based approach is competitive with Web-based methods, i.e., a QA system built on top of the Bing search engine.
(2023-03-30T00:42:07Z)
- EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records [36.213730355895805]
The utterances were collected from 222 hospital staff members, including physicians, nurses, and insurance review and health records teams.
We manually linked these questions to two open-source EHR databases, MIMIC-III and eICU, and included various time expressions and held-out unanswerable questions in the dataset.
(2023-01-16T05:10:20Z)
- Discourse Analysis via Questions and Answers: Parsing Dependency Structures of Questions Under Discussion [57.43781399856913]
This work adopts the linguistic framework of Questions Under Discussion (QUD) for discourse analysis.
We characterize relationships between sentences as free-form questions, in contrast to exhaustive fine-grained questions.
We develop a first-of-its-kind QUD parser that derives a dependency structure of questions over full documents.
(2022-10-12T03:53:12Z)
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis on a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
(2022-05-19T20:35:23Z)
- Question Answering for Complex Electronic Health Records Database using Unified Encoder-Decoder Architecture [8.656936724622145]
We design UniQA, a unified encoder-decoder architecture for EHR-QA where natural language questions are converted to queries such as SPARQL.
We also propose input masking (IM), a simple and effective method to cope with complex medical terms and various typos and better learn the SPARQL syntax.
UniQA demonstrated a significant performance improvement over the previous state-of-the-art model on MIMICSQL* (14.2% gain), the most complex NLQ2SQL dataset in the EHR domain, and on its typo-ridden versions.
(2021-11-14T05:01:38Z)
- FeTaQA: Free-form Table Question Answering [33.018256483762386]
We introduce FeTaQA, a new dataset with 10K Wikipedia-based {table, question, free-form answer, supporting table cells} pairs.
FeTaQA yields a more challenging table question answering setting because it requires generating free-form text answers after retrieval, inference, and integration of multiple discontinuous facts from a structured knowledge source.
(2021-04-01T09:59:40Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
(2020-10-14T04:01:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.