Retrieval augmented text-to-SQL generation for epidemiological question answering using electronic health records
- URL: http://arxiv.org/abs/2403.09226v2
- Date: Thu, 16 May 2024 13:00:56 GMT
- Title: Retrieval augmented text-to-SQL generation for epidemiological question answering using electronic health records
- Authors: Angelo Ziletti, Leonardo D'Ambrosi
- Abstract summary: We introduce an end-to-end methodology that combines text-to-SQL generation with retrieval augmented generation (RAG) to answer epidemiological questions.
RAG offers a promising direction for improving the capabilities of current language models, as shown in a realistic industry setting.
- Score: 0.6138671548064356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Electronic health records (EHR) and claims data are rich sources of real-world data that reflect patient health status and healthcare utilization. Querying these databases to answer epidemiological questions is challenging due to the intricacy of medical terminology and the need for complex SQL queries. Here, we introduce an end-to-end methodology that combines text-to-SQL generation with retrieval augmented generation (RAG) to answer epidemiological questions using EHR and claims data. We show that our approach, which integrates a medical coding step into the text-to-SQL process, significantly improves the performance over simple prompting. Our findings indicate that although current language models are not yet sufficiently accurate for unsupervised use, RAG offers a promising direction for improving their capabilities, as shown in a realistic industry setting.
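The abstract describes the approach only at a high level, so the following is a minimal sketch of what a retrieval-augmented text-to-SQL prompt with a medical coding step could look like, not the authors' implementation. Everything in it is an assumption made for illustration: the example store, the table and column names, the tiny term-to-code dictionary, the token-overlap retriever (a real system would more likely use embedding similarity), and the prompt layout. No language model is called; the sketch stops at building the prompt.

```python
import re

# Hypothetical store of (question, SQL) pairs. Table and column names are invented.
EXAMPLES = [
    ("How many patients were diagnosed with hypertension in 2020?",
     "SELECT COUNT(DISTINCT patient_id) FROM diagnoses "
     "WHERE code IN ('I10') AND year = 2020;"),
    ("What is the average age of patients prescribed metformin?",
     "SELECT AVG(p.age) FROM patients p "
     "JOIN prescriptions r ON p.patient_id = r.patient_id "
     "WHERE r.drug_name = 'metformin';"),
]

# Hypothetical medical coding dictionary (illustrative ICD-10 codes only).
MEDICAL_CODES = {
    "hypertension": ["I10"],
    "type 2 diabetes": ["E11"],
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity between two strings."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(question, k=1):
    """Return the k stored examples whose questions are most similar."""
    ranked = sorted(EXAMPLES, key=lambda ex: jaccard(question, ex[0]), reverse=True)
    return ranked[:k]


def lookup_codes(question):
    """Medical coding step: map terms found in the question to codes."""
    q = question.lower()
    return {term: codes for term, codes in MEDICAL_CODES.items() if term in q}


def build_prompt(question, k=1):
    """Assemble a few-shot prompt from retrieved examples and looked-up codes."""
    lines = ["Translate the question into SQL.", ""]
    for ex_q, ex_sql in retrieve(question, k):
        lines += [f"Q: {ex_q}", f"SQL: {ex_sql}", ""]
    for term, codes in lookup_codes(question).items():
        lines.append(f"Codes for '{term}': {', '.join(codes)}")
    lines += [f"Q: {question}", "SQL:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("How many patients had hypertension in 2021?"))
```

The resulting prompt would then be sent to a language model, and, as the abstract notes, the generated SQL would still need human review before use.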
Related papers
- GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval [56.610806615527885]
This paper introduces a novel data-centric approach, Generalized Query Expansion (GQE), to address the inherent information imbalance between text and video.
By adaptively segmenting videos into short clips and employing zero-shot captioning, GQE enriches the training dataset with comprehensive scene descriptions.
GQE achieves state-of-the-art performance on several benchmarks, including MSR-VTT, MSVD, LSMDC, and VATEX.
arXiv Detail & Related papers (2024-08-14T01:24:09Z) - The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation [1.2839205715237014]
Large Language Models (LLMs) have the potential to significantly improve personal health management for chronic conditions.
LLMs generate responses based on patterns learned from diverse internet data.
Retrieval Augmented Generation (RAG) can help mitigate hallucinations and inaccuracies in LLM responses.
arXiv Detail & Related papers (2024-07-25T13:47:01Z) - LG AI Research & KAIST at EHRSQL 2024: Self-Training Large Language Models with Pseudo-Labeled Unanswerable Questions for a Reliable Text-to-SQL System on EHRs [58.59113843970975]
Text-to-SQL models are pivotal for making Electronic Health Records accessible to healthcare professionals without SQL knowledge.
We present a self-training strategy using pseudo-labeled unanswerable questions to enhance the reliability of text-to-SQL models for EHRs.
arXiv Detail & Related papers (2024-05-18T03:25:44Z) - Overview of the EHRSQL 2024 Shared Task on Reliable Text-to-SQL Modeling on Electronic Health Records [12.692089512684955]
One strategy is to build a question-answering system, possibly leveraging text-to-SQL models.
The EHRSQL 2024 shared task aims to advance and promote research in developing a question-answering system for EHRs.
Among more than 100 participants who applied to the shared task, eight teams were formed and completed the entire shared task requirement.
arXiv Detail & Related papers (2024-05-04T04:12:18Z) - Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - Towards Understanding the Generalization of Medical Text-to-SQL Models and Datasets [46.12592636378064]
We show that there is still a long way to go before solving text-to-SQL generation in the medical domain.
We evaluate state-of-the-art language models and find substantial drops in performance, with accuracy falling from as high as 92% to 28%.
We introduce a novel data augmentation approach to improve the generalizability of the language models.
arXiv Detail & Related papers (2023-03-22T20:26:30Z) - EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records [36.213730355895805]
The utterances were collected from 222 hospital staff members, including physicians, nurses, and insurance review and health records teams.
We manually linked these questions to two open-source EHR databases, MIMIC-III and eICU, and included various time expressions and held-out unanswerable questions in the dataset.
arXiv Detail & Related papers (2023-01-16T05:10:20Z) - Learning Contextualized Document Representations for Healthcare Answer Retrieval [68.02029435111193]
Contextual Discourse Vectors (CDV) is a distributed document representation for efficient answer retrieval from long documents.
Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse.
We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking.
arXiv Detail & Related papers (2020-02-03T15:47:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.