Question-Driven Summarization of Answers to Consumer Health Questions
- URL: http://arxiv.org/abs/2005.09067v2
- Date: Wed, 20 May 2020 14:18:05 GMT
- Title: Question-Driven Summarization of Answers to Consumer Health Questions
- Authors: Max Savery, Asma Ben Abacha, Soumya Gayen, Dina Demner-Fushman
- Abstract summary: We present the MEDIQA Answer Summarization dataset.
This dataset is the first summarization collection containing question-driven summaries of answers to consumer health questions.
We include results of baseline and state-of-the-art deep learning summarization models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic summarization of natural language is a widely studied area in
computer science, one that is broadly applicable to anyone who routinely needs
to understand large quantities of information. For example, in the medical
domain, recent developments in deep learning approaches to automatic
summarization have the potential to make health information more easily
accessible to patients and consumers. However, to evaluate the quality of
automatically generated summaries of health information, gold-standard, human
generated summaries are required. Using answers provided by the National
Library of Medicine's consumer health question answering system, we present the
MEDIQA Answer Summarization dataset, the first summarization collection
containing question-driven summaries of answers to consumer health questions.
This dataset can be used to evaluate single or multi-document summaries
generated by algorithms using extractive or abstractive approaches. In order to
benchmark the dataset, we include results of baseline and state-of-the-art deep
learning summarization models, demonstrating that this dataset can be used to
effectively evaluate question-driven machine-generated summaries and promote
further machine learning research in medical question answering.
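Benchmarking summarizers against gold-standard references of this kind is typically done with ROUGE-style overlap metrics (the papers below report ROUGE-L). As a minimal sketch of how such scoring works, here is a self-contained ROUGE-L F1 implementation based on the longest common subsequence; the example sentences are illustrative placeholders, not drawn from the MEDIQA dataset.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Illustrative example: a system summary scored against a human reference.
reference = "vitamin d deficiency can cause bone pain and muscle weakness"
candidate = "vitamin d deficiency can cause muscle weakness"
print(f"ROUGE-L F1: {rouge_l_f1(candidate, reference):.3f}")
```

In practice, published results use the official ROUGE toolkit or the `rouge-score` package, which add stemming and bootstrap confidence intervals; this sketch only captures the core LCS computation.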
Related papers
- Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
- Aspect-oriented Consumer Health Answer Summarization [2.298110639419913]
Community Question-Answering (CQA) forums have revolutionized how people seek information, especially those related to their healthcare needs.
There can be several answers in response to a single query, which makes it hard to grasp the key information related to the specific health concern.
Our research focuses on aspect-based summarization of health answers to address this limitation.
arXiv Detail & Related papers (2024-05-10T07:52:43Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- MedInsight: A Multi-Source Context Augmentation Framework for Generating Patient-Centric Medical Responses using Large Language Models [3.0874677990361246]
Large Language Models (LLMs) have shown impressive capabilities in generating human-like responses.
We propose MedInsight: a novel retrieval framework that augments LLM inputs with relevant background information.
Experiments on the MTSamples dataset validate MedInsight's effectiveness in generating contextually appropriate medical responses.
arXiv Detail & Related papers (2024-03-13T15:20:30Z)
- Medical Question Summarization with Entity-driven Contrastive Learning [12.008269098530386]
This paper proposes a novel medical question summarization framework using entity-driven contrastive learning (ECL).
ECL employs medical entities in frequently asked questions (FAQs) as focuses and devises an effective mechanism to generate hard negative samples.
We find that some MQA datasets suffer from serious data leakage problems, such as the iCliniq dataset's 33% duplicate rate.
arXiv Detail & Related papers (2023-04-15T00:19:03Z)
- MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization [55.60306377044225]
State-of-the-art summarization systems can generate highly fluent summaries.
These summaries, however, may contain factual inconsistencies and/or information not present in the source.
We introduce an alternative scheme based on standard information-theoretic measures in which the information present in the source and summary is directly compared.
arXiv Detail & Related papers (2023-01-28T23:08:25Z)
- CHQ-Summ: A Dataset for Consumer Healthcare Question Summarization [21.331145794496774]
We introduce a new dataset, CHQ-Summ, that contains 1507 domain-expert annotated consumer health questions and corresponding summaries.
The dataset is derived from the community question-answering forum.
We benchmark the dataset on multiple state-of-the-art summarization models to show the effectiveness of the dataset.
arXiv Detail & Related papers (2022-06-14T03:49:03Z)
- AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization [73.91543616777064]
Community Question Answering (CQA) fora such as Stack Overflow and Yahoo! Answers contain a rich resource of answers to a wide range of community-based questions.
One goal of answer summarization is to produce a summary that reflects the range of answer perspectives.
This work introduces a novel dataset of 4,631 CQA threads for answer summarization, curated by professional linguists.
arXiv Detail & Related papers (2021-11-11T21:48:02Z)
- The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation [81.72197368690031]
We present a new benchmarking suite designed specifically for medical sequential decision making.
The Medkit-Learn(ing) Environment is a publicly available Python package providing simple and easy access to high-fidelity synthetic medical data.
arXiv Detail & Related papers (2021-06-08T10:38:09Z)
- Question-aware Transformer Models for Consumer Health Question Summarization [20.342580435464072]
We develop an abstractive question summarization model that leverages the semantic interpretation of a question via recognition of medical entities.
When evaluated on the MeQSum benchmark corpus, our framework outperformed the state-of-the-art method by 10.2 ROUGE-L points.
arXiv Detail & Related papers (2021-06-01T04:21:31Z)
- A Revised Generative Evaluation of Visual Dialogue [80.17353102854405]
We propose a revised evaluation scheme for the VisDial dataset.
We measure consensus between answers generated by the model and a set of relevant answers.
We release these sets and code for the revised evaluation scheme as DenseVisDial.
arXiv Detail & Related papers (2020-04-20T13:26:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.