EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records
- URL: http://arxiv.org/abs/2301.07695v5
- Date: Mon, 25 Dec 2023 07:12:53 GMT
- Title: EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records
- Authors: Gyubok Lee, Hyeonji Hwang, Seongsu Bae, Yeonsu Kwon, Woncheol Shin,
Seongjun Yang, Minjoon Seo, Jong-Yeup Kim, Edward Choi
- Abstract summary: The utterances were collected from 222 hospital staff members, including physicians, nurses, and insurance review and health records teams.
We manually linked these questions to two open-source EHR databases, MIMIC-III and eICU, and included various time expressions and held-out unanswerable questions in the dataset.
- Score: 36.213730355895805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a new text-to-SQL dataset for electronic health records (EHRs).
The utterances were collected from 222 hospital staff members, including
physicians, nurses, and insurance review and health records teams. To construct
the QA dataset on structured EHR data, we conducted a poll at a university
hospital and used the responses to create seed questions. We then manually
linked these questions to two open-source EHR databases, MIMIC-III and eICU,
and included various time expressions and held-out unanswerable questions in
the dataset, which were also collected from the poll. Our dataset poses a
unique set of challenges: the model needs to 1) generate SQL queries that
reflect a wide range of needs in the hospital, including simple retrieval and
complex operations such as calculating survival rate, 2) understand various
time expressions to answer time-sensitive questions in healthcare, and 3)
distinguish whether a given question is answerable or unanswerable. We believe
our dataset, EHRSQL, can serve as a practical benchmark for developing and
assessing QA models on structured EHR data and take a step further towards
bridging the gap between text-to-SQL research and its real-life deployment in
healthcare. EHRSQL is available at https://github.com/glee4810/EHRSQL.
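
The three challenges named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `admissions` table, the fixed `NOW` timestamp, the example questions, and the `answer` helper are hypothetical and are not taken from EHRSQL, MIMIC-III, or eICU. The sketch pairs questions with SQL that resolves a relative time expression ("this year"), computes a simple survival rate, and abstains by returning `None` when a question cannot be answered from the schema.

```python
import sqlite3

# Hypothetical toy schema for illustration; not the MIMIC-III or eICU layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admissions (
    patient_id INTEGER,
    admit_time TEXT,    -- timestamp stored as 'YYYY-MM-DD HH:MM:SS'
    died       INTEGER  -- 1 if the patient died during the stay, else 0
);
INSERT INTO admissions VALUES
    (1, '2023-03-02 10:00:00', 0),
    (2, '2023-11-20 08:30:00', 1),
    (3, '2022-06-15 14:45:00', 0),
    (4, '2023-12-01 09:00:00', 0);
""")

# Assumed fixed "current time" so that relative expressions are reproducible.
NOW = "2023-12-31 23:59:00"

# Hypothetical question-to-SQL pairs; None marks an unanswerable question.
EXAMPLES = {
    # Simple retrieval with a relative time expression ("this year").
    "How many patients were admitted this year?":
        "SELECT COUNT(DISTINCT patient_id) FROM admissions "
        "WHERE strftime('%Y', admit_time) = strftime('%Y', :now)",
    # A more complex operation: survival rate over the same time window.
    "What is the survival rate of patients admitted this year?":
        "SELECT 1.0 - AVG(died) FROM admissions "
        "WHERE strftime('%Y', admit_time) = strftime('%Y', :now)",
    # The schema holds no such information, so the system should abstain.
    "What is the patient's favorite food?": None,
}


def answer(question):
    """Return the query result, or None when the question is unanswerable."""
    sql = EXAMPLES.get(question)
    if sql is None:
        return None  # abstain instead of guessing
    return conn.execute(sql, {"now": NOW}).fetchone()[0]


for q in EXAMPLES:
    print(q, "->", answer(q))
```

In a real system the lookup table would be replaced by a text-to-SQL model, and the abstention decision would have to be learned or estimated rather than hard-coded.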
Related papers
- Text2SQL is Not Enough: Unifying AI and Databases with TAG [47.45480855418987]
Table-Augmented Generation (TAG) is a paradigm for answering natural language questions over databases.
We develop benchmarks to study the TAG problem and find that standard methods answer no more than 20% of queries correctly.
arXiv Detail & Related papers (2024-08-27T00:50:14Z)
- EHR-SeqSQL: A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records [11.78795632771211]
We introduce EHR-SeqSQL, a novel sequential text-to-SQL dataset for EHR databases.
EHR-SeqSQL is the first medical text-to-SQL dataset benchmark to include sequential and contextual questions.
Our experiments demonstrate the superiority of a multi-turn approach over a single-turn approach in compositionality.
arXiv Detail & Related papers (2024-05-23T07:14:21Z)
- LG AI Research & KAIST at EHRSQL 2024: Self-Training Large Language Models with Pseudo-Labeled Unanswerable Questions for a Reliable Text-to-SQL System on EHRs [58.59113843970975]
Text-to-SQL models are pivotal for making Electronic Health Records accessible to healthcare professionals without SQL knowledge.
We present a self-training strategy using pseudo-labeled unanswerable questions to enhance the reliability of text-to-SQL models for EHRs.
arXiv Detail & Related papers (2024-05-18T03:25:44Z)
- Overview of the EHRSQL 2024 Shared Task on Reliable Text-to-SQL Modeling on Electronic Health Records [12.692089512684955]
One strategy is to build a question-answering system, possibly leveraging text-to-SQL models.
The EHRSQL 2024 shared task aims to advance and promote research in developing a question-answering system for EHRs.
Among more than 100 participants who applied to the shared task, eight teams were formed and completed the entire shared task requirement.
arXiv Detail & Related papers (2024-05-04T04:12:18Z)
- Retrieval augmented text-to-SQL generation for epidemiological question answering using electronic health records [0.6138671548064356]
We introduce an end-to-end methodology that combines text-to-SQL generation with retrieval augmented generation (RAG) to answer epidemiological questions.
RAG offers a promising direction for improving their capabilities, as shown in a realistic industry setting.
arXiv Detail & Related papers (2024-03-14T09:45:05Z)
- A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidence provided by databases.
Deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)
- DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries [7.507210439502174]
This paper develops the first question answering dataset (DrugEHRQA) containing question-answer pairs from both structured tables and unstructured notes from an EHR.
Our dataset has medication-related queries, containing over 70,000 question-answer pairs.
arXiv Detail & Related papers (2022-05-03T03:50:50Z)
- Weakly Supervised Text-to-SQL Parsing through Question Decomposition [53.22128541030441]
We take advantage of the recently proposed question meaning representation called QDMR.
Given questions, their QDMR structures (annotated by non-experts or automatically predicted) and the answers, we are able to automatically synthesize SQL queries.
Our results show that the weakly supervised models perform competitively with those trained on NL-SQL benchmark data.
arXiv Detail & Related papers (2021-12-12T20:02:42Z)
- Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open Domain Question Answering [78.9863753810787]
A large amount of the world's knowledge is stored in structured databases.
Query languages can answer questions that require complex reasoning, as well as offering full explainability.
arXiv Detail & Related papers (2021-08-05T22:04:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.