Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis
- URL: http://arxiv.org/abs/2507.01053v2
- Date: Wed, 30 Jul 2025 13:27:00 GMT
- Title: Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis
- Authors: Rafi Al Attrach, Pedro Moreira, Rajna Fani, Renato Umeton, Leo Anthony Celi,
- Abstract summary: Medical Information Mart for Intensive Care (MIMIC-IV) is the world's largest open-source EHR database. M3 lets researchers converse with the database in plain English.
- Score: 1.3984139865709486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As ever-larger clinical datasets become available, they have the potential to unlock unprecedented opportunities for medical research. Foremost among them is Medical Information Mart for Intensive Care (MIMIC-IV), the world's largest open-source EHR database. However, the inherent complexity of these datasets, particularly the need for sophisticated querying skills and the need to understand the underlying clinical settings, often presents a significant barrier to their effective use. M3 lowers the technical barrier to understanding and querying MIMIC-IV data. With a single command it retrieves MIMIC-IV from PhysioNet, launches a local SQLite instance (or hooks into the hosted BigQuery), and, via the Model Context Protocol (MCP), lets researchers converse with the database in plain English. Ask a clinical question in natural language; M3 uses a language model to translate it into SQL, executes the query against the MIMIC-IV dataset, and returns structured results alongside the underlying query for verifiability and reproducibility. Demonstrations show that minutes of dialogue with M3 yield the kind of nuanced cohort analyses that once demanded hours of handcrafted SQL and relied on understanding the complexities of clinical workflows. By simplifying access, M3 invites the broader research community to mine clinical critical-care data and accelerates the translation of raw records into actionable insight.
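The abstract's loop (question in, SQL plus results out) can be sketched in a few lines. This is an illustrative mock, not M3's actual implementation: the table, column names, and hard-coded SQL stand in for the real MIMIC-IV schema and for the LLM translation step that M3 performs.

```python
import sqlite3

def run_query(conn, sql):
    """Execute a SQL query and return the rows together with the query
    text itself, mirroring M3's verifiability goal: results are always
    paired with the SQL that produced them."""
    rows = conn.execute(sql).fetchall()
    return {"sql": sql, "rows": rows}

# Hypothetical mini-table standing in for a MIMIC-IV admissions extract;
# real M3 targets the full PhysioNet schema (local SQLite or BigQuery).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (subject_id INTEGER, admission_type TEXT)")
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?)",
    [(1, "EMERGENCY"), (2, "ELECTIVE"), (3, "EMERGENCY")],
)

# In M3 a language model would translate the question below into SQL;
# here the translation is hard-coded for illustration.
question = "How many emergency admissions are there?"
result = run_query(
    conn,
    "SELECT COUNT(*) FROM admissions WHERE admission_type = 'EMERGENCY'",
)
print(result["rows"][0][0])  # → 2
print(result["sql"])         # the query is returned for reproducibility
```

Returning the generated SQL alongside the rows is what lets a researcher audit and re-run the exact query behind any reported cohort number.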
Related papers
- Harnessing Large Language Models for Precision Querying and Retrieval-Augmented Knowledge Extraction in Clinical Data Science [3.4325249294405555]
This study applies Large Language Models (LLMs) to two foundational Electronic Health Record (EHR) data science tasks. We test the ability of LLMs to interact accurately with large structured datasets for analytics. We present a flexible evaluation framework that automatically generates synthetic question and answer pairs tailored to the characteristics of each dataset or task.
arXiv Detail & Related papers (2026-01-28T14:57:36Z) - Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL [63.578576078216976]
CLIN is a benchmark of 633 expert-annotated tasks on MIMIC-IV v3.1. We evaluate 22 proprietary and open-source models under Chain-of-Thought self-refinement. Despite recent advances, performance remains far from clinical reliability.
arXiv Detail & Related papers (2026-01-14T21:12:06Z) - Unlocking Electronic Health Records: A Hybrid Graph RAG Approach to Safe Clinical AI for Patient QA [1.9615061725959186]
Large Language Models offer transformative potential for data processing, but face limitations in clinical settings. Current solutions typically isolate retrieval methods focusing on structured data (Text2Cypher) or unstructured semantic search but fail to integrate both simultaneously. This work presents MediGRAF, a novel hybrid Graph RAG system that bridges this gap.
arXiv Detail & Related papers (2025-11-27T16:08:22Z) - Enhancing the Medical Context-Awareness Ability of LLMs via Multifaceted Self-Refinement Learning [49.559151128219725]
Large language models (LLMs) have shown great promise in the medical domain, achieving strong performance on several benchmarks. However, they continue to underperform in real-world medical scenarios, which often demand stronger context-awareness. We propose Multifaceted Self-Refinement (MuSeR), a data-driven approach that enhances LLMs' context-awareness along three key facets.
arXiv Detail & Related papers (2025-11-13T08:13:23Z) - Reliable Curation of EHR Dataset via Large Language Models under Environmental Constraints [11.502074619844125]
CELEC is a large language model (LLM)-powered framework for automated EHR data extraction and analytics. On a subset of the EHR benchmark, CELEC achieves strong execution accuracy while maintaining low latency, cost efficiency, and strict privacy.
arXiv Detail & Related papers (2025-11-02T02:45:54Z) - EHR-MCP: Real-world Evaluation of Clinical Information Retrieval by Large Language Models via Model Context Protocol [0.0]
Large language models (LLMs) show promise in medicine, but their deployment in hospitals is limited by restricted access to electronic health record (EHR) systems. The Model Context Protocol (MCP) enables integration between LLMs and external tools. We developed EHR-MCP, a framework of custom MCP tools integrated with the hospital EHR database, and used GPT-4.1 through a LangGraph ReAct agent to interact with it.
arXiv Detail & Related papers (2025-09-19T13:17:16Z) - A Dataset for Addressing Patient's Information Needs related to Clinical Course of Hospitalization [15.837772594006038]
ArchEHR-QA is an expert-annotated dataset based on real-world patient cases from intensive care unit and emergency department settings. Cases comprise questions posed by patients to public health forums, clinician-interpreted counterparts, relevant clinical note excerpts with sentence-level relevance annotations, and clinician-authored answers. The answer-first prompting approach consistently performed best, with Llama 4 achieving the highest scores.
arXiv Detail & Related papers (2025-06-04T16:55:08Z) - RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering [11.214912072391108]
Real-world datasets often feature a vast array of attributes and complex values.
Traditional methods cannot fully convey the dataset's size and complexity to Large Language Models.
We propose a novel framework that leverages Full-Text Search (FTS) on the input table.
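The idea of indexing a table's values with Full-Text Search can be sketched with SQLite's FTS5 extension. This is an illustrative mock under assumed names, not the paper's implementation: the `cells` table, its columns, and the sample values are all hypothetical.

```python
import sqlite3

# Index each (column, value) pair of a table with FTS so that fuzzy
# user phrases can be resolved to exact attribute values before any
# query is generated against the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE cells USING fts5(column_name, value)")
conn.executemany(
    "INSERT INTO cells VALUES (?, ?)",
    [
        ("diagnosis", "acute myocardial infarction"),
        ("diagnosis", "chronic kidney disease"),
        ("unit", "medical intensive care unit"),
    ],
)

# A free-form user phrase is matched against the indexed cell values,
# recovering the exact value and the column it lives in.
hits = conn.execute(
    "SELECT column_name, value FROM cells WHERE cells MATCH ?",
    ("kidney",),
).fetchall()
print(hits)  # → [('diagnosis', 'chronic kidney disease')]
```

Resolving "kidney" to the exact stored string `chronic kidney disease` is what lets a downstream query generator emit a correct equality predicate instead of guessing at value spellings.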
arXiv Detail & Related papers (2024-08-22T13:13:06Z) - UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z) - Retrieval augmented text-to-SQL generation for epidemiological question answering using electronic health records [0.6138671548064356]
We introduce an end-to-end methodology that combines text-to-SQL generation with retrieval augmented generation (RAG) to answer epidemiological questions.
RAG offers a promising direction for improving their capabilities, as shown in a realistic industry setting.
arXiv Detail & Related papers (2024-03-14T09:45:05Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a Doctor (as the player) and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z) - Prompting Large Language Models for Zero-Shot Clinical Prediction with Structured Longitudinal Electronic Health Record Data [7.815738943706123]
Large Language Models (LLMs) are traditionally tailored for natural language processing.
This research investigates the adaptability of LLMs, like GPT-4, to EHR data.
In response to the longitudinal, sparse, and knowledge-infused nature of EHR data, our prompting approach involves taking into account specific characteristics.
arXiv Detail & Related papers (2024-01-25T20:14:50Z) - Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval [56.65147231836708]
We develop SWIM-IR, a synthetic retrieval training dataset containing 33 languages for fine-tuning multilingual dense retrievers.
SAP assists the large language model (LLM) in generating informative queries in the target language.
Our models, called SWIM-X, are competitive with human-supervised dense retrieval models.
arXiv Detail & Related papers (2023-11-10T00:17:10Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z) - Question Answering for Complex Electronic Health Records Database using Unified Encoder-Decoder Architecture [8.656936724622145]
We design UniQA, a unified encoder-decoder architecture for EHR-QA where natural language questions are converted to queries such as SPARQL.
We also propose input masking (IM), a simple and effective method to cope with complex medical terms and various typos and better learn the SPARQL syntax.
UniQA demonstrated a significant performance improvement against the previous state-of-the-art model for MIMICSPARQL* (14.2% gain), the most complex NLQ-to-SPARQL dataset in the EHR domain, and its typo-ridden versions.
arXiv Detail & Related papers (2021-11-14T05:01:38Z) - VBridge: Connecting the Dots Between Features, Explanations, and Data for Healthcare Models [85.4333256782337]
VBridge is a visual analytics tool that seamlessly incorporates machine learning explanations into clinicians' decision-making workflow.
We identified three key challenges, including clinicians' unfamiliarity with ML features, lack of contextual information, and the need for cohort-level evidence.
We demonstrated the effectiveness of VBridge through two case studies and expert interviews with four clinicians.
arXiv Detail & Related papers (2021-08-04T17:34:13Z) - Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.