Give me Some Hard Questions: Synthetic Data Generation for Clinical QA
- URL: http://arxiv.org/abs/2412.04573v1
- Date: Thu, 05 Dec 2024 19:35:41 GMT
- Title: Give me Some Hard Questions: Synthetic Data Generation for Clinical QA
- Authors: Fan Bai, Keith Harrigian, Joel Stremmel, Hamid Hassanzadeh, Ardavan Saeedi, Mark Dredze
- Abstract summary: This paper explores generating Clinical QA data using large language models (LLMs) in a zero-shot setting.
We find that naive prompting often results in easy questions that do not reflect the complexity of clinical scenarios.
Experiments on two Clinical QA datasets demonstrate that our method generates more challenging questions, significantly improving fine-tuning performance over baselines.
- Score: 13.436187152293515
- License:
- Abstract: Clinical Question Answering (QA) systems enable doctors to quickly access patient information from electronic health records (EHRs). However, training these systems requires significant annotated data, which is limited due to the expertise needed and the privacy concerns associated with clinical data. This paper explores generating Clinical QA data using large language models (LLMs) in a zero-shot setting. We find that naive prompting often results in easy questions that do not reflect the complexity of clinical scenarios. To address this, we propose two prompting strategies: 1) instructing the model to generate questions that do not overlap with the input context, and 2) summarizing the input record using a predefined schema to scaffold question generation. Experiments on two Clinical QA datasets demonstrate that our method generates more challenging questions, significantly improving fine-tuning performance over baselines. We compare synthetic and gold data and find a gap between their training efficacy resulting from the quality of synthetically generated answers.
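The two prompting strategies described in the abstract can be sketched as prompt-construction code. This is a minimal illustrative sketch only: the prompt wording, the schema fields, and the `summarize` callback are assumptions, not the authors' actual prompts or schema.

```python
# Hypothetical sketch of the two prompting strategies from the abstract.
# The exact prompt text and schema fields below are illustrative assumptions.

# Assumed predefined schema for strategy 2 (not the paper's actual schema).
SCHEMA_FIELDS = [
    "chief complaint",
    "history of present illness",
    "medications",
    "lab results",
    "assessment and plan",
]


def non_overlap_prompt(record: str) -> str:
    """Strategy 1: instruct the model to generate questions whose wording
    does not overlap with the input context, discouraging easy questions
    that can be answered by simple string matching."""
    return (
        "You are a physician reviewing an EHR note.\n"
        f"Note:\n{record}\n\n"
        "Generate clinically meaningful questions about this patient. "
        "Do NOT reuse phrases that appear verbatim in the note, so that "
        "answers cannot be found by surface-level matching."
    )


def schema_scaffold_prompt(record: str, summarize) -> str:
    """Strategy 2: first summarize the record into a predefined schema,
    then use the structured summary to scaffold question generation.
    `summarize(record, field)` is a caller-supplied function (e.g. an
    LLM call) that fills in one schema field."""
    summary = "\n".join(
        f"- {field}: {summarize(record, field)}" for field in SCHEMA_FIELDS
    )
    return (
        f"Structured summary of the note:\n{summary}\n\n"
        "Using this summary, write hard questions that require reasoning "
        "across multiple fields rather than copying a single value."
    )
```

In practice both prompts would be sent to an LLM and the generated QA pairs used as synthetic fine-tuning data; the sketch only shows how the two strategies shape the input.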
Related papers
- RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions [3.182594503527438]
We present RealMedQA, a dataset of realistic clinical questions generated by humans and an LLM.
We show that the LLM is more cost-efficient for generating "ideal" QA pairs.
arXiv Detail & Related papers (2024-08-16T09:32:43Z)
- TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [57.067409211231244]
This paper presents meticulously curated AI-ready datasets covering multi-modal data (e.g., drug molecule, disease code, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design.
We provide basic validation methods for each task to ensure the datasets' usability and reliability.
We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z)
- Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions [19.436999992810797]
We construct two new datasets: JAMA Clinical Challenge and Medbullets.
JAMA Clinical Challenge consists of questions based on challenging clinical cases, while Medbullets comprises simulated clinical questions.
We evaluate seven LLMs on the two datasets using various prompts.
arXiv Detail & Related papers (2024-02-28T05:44:41Z)
- K-QA: A Real-World Medical Q&A Benchmark [12.636564634626422]
We construct K-QA, a dataset containing 1,212 patient questions originating from real-world conversations held on K Health.
We employ a panel of in-house physicians to answer and manually decompose a subset of K-QA into self-contained statements.
We evaluate several state-of-the-art models, as well as the effect of in-context learning and medically-oriented augmented retrieval schemes.
arXiv Detail & Related papers (2024-01-25T20:11:04Z)
- Clairvoyance: A Pipeline Toolkit for Medical Time Series [95.22483029602921]
Time-series learning is the bread and butter of data-driven clinical decision support.
Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a software toolkit.
Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.
arXiv Detail & Related papers (2023-10-28T12:08:03Z)
- Medical Question Summarization with Entity-driven Contrastive Learning [12.008269098530386]
This paper proposes a novel medical question summarization framework using entity-driven contrastive learning (ECL).
ECL employs medical entities in frequently asked questions (FAQs) as focuses and devises an effective mechanism to generate hard negative samples.
We find that some MQA datasets suffer from serious data leakage problems, such as the iCliniq dataset's 33% duplicate rate.
arXiv Detail & Related papers (2023-04-15T00:19:03Z)
- SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
- Learning to Ask Like a Physician [24.15961995052862]
We present Discharge Summary Clinical Questions (DiSCQ), a newly curated question dataset composed of 2,000+ questions.
The questions are generated by medical experts from 100+ MIMIC-III discharge summaries.
We analyze this dataset to characterize the types of information sought by medical experts.
arXiv Detail & Related papers (2022-06-06T15:50:54Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- Where's the Question? A Multi-channel Deep Convolutional Neural Network for Question Identification in Textual Data [83.89578557287658]
We propose a novel multi-channel deep convolutional neural network architecture, namely Quest-CNN, for the purpose of separating real questions.
We conducted a comprehensive performance comparison analysis of the proposed network against other deep neural networks.
The proposed Quest-CNN achieved the best F1 score both on a dataset of data entry-review dialogue in a dialysis care setting, and on a general domain dataset.
arXiv Detail & Related papers (2020-10-15T15:11:22Z)
- DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction [67.91606509226132]
Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment.
DeepEnroll is a cross-modal inference learning model that jointly encodes enrollment criteria and patient records (tabular data) into a shared latent space for matching inference.
arXiv Detail & Related papers (2020-01-22T17:51:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.