SummQA at MEDIQA-Chat 2023: In-Context Learning with GPT-4 for Medical
Summarization
- URL: http://arxiv.org/abs/2306.17384v1
- Date: Fri, 30 Jun 2023 03:14:04 GMT
- Title: SummQA at MEDIQA-Chat 2023: In-Context Learning with GPT-4 for Medical
Summarization
- Authors: Yash Mathur, Sanketh Rangreji, Raghav Kapoor, Medha Palavalli, Amanda
Bertsch, Matthew R. Gormley
- Abstract summary: We present a novel system for the Dialogue2Note Medical Summarization tasks in the MEDIQA 2023 Shared Task.
Our approach for section-wise summarization (Task A) is a two-stage process of selecting semantically similar dialogues and using the top-k similar dialogues as in-context examples for GPT-4.
For full-note summarization (Task B), we use a similar solution with k=1.
- Score: 5.92318236682798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical dialogue summarization is challenging due to the unstructured nature
of medical conversations, the use of medical terminology in gold summaries, and
the need to identify key information across multiple symptom sets. We present a
novel system for the Dialogue2Note Medical Summarization tasks in the MEDIQA
2023 Shared Task. Our approach for section-wise summarization (Task A) is a
two-stage process of selecting semantically similar dialogues and using the
top-k similar dialogues as in-context examples for GPT-4. For full-note
summarization (Task B), we use a similar solution with k=1. We achieved 3rd
place in Task A (2nd among all teams), 4th place in Task B Division Wise
Summarization (2nd among all teams), 15th place in Task A Section Header
Classification (9th among all teams), and 8th place among all teams in Task B.
Our results highlight the effectiveness of few-shot prompting for this task,
though we also identify several weaknesses of prompting-based approaches. We
compare GPT-4 performance with several finetuned baselines. We find that GPT-4
summaries are more abstractive and shorter. We make our code publicly
available.
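The two-stage idea in the abstract (retrieve the most similar training dialogues, then use the top-k as in-context examples) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the bag-of-words `embed` function is a toy stand-in for whatever semantic similarity model the authors used, the prompt layout is hypothetical, the example dialogues are invented, and no call to GPT-4 is made.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_examples(query, train_pairs, k):
    """Stage 1: rank (dialogue, summary) pairs by similarity to the query dialogue."""
    q = embed(query)
    ranked = sorted(train_pairs, key=lambda pair: cosine(q, embed(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, examples):
    """Stage 2: place the retrieved pairs before the query as in-context examples."""
    parts = [f"Dialogue:\n{d}\nSummary:\n{s}\n" for d, s in examples]
    parts.append(f"Dialogue:\n{query}\nSummary:\n")
    return "\n".join(parts)


# Invented training pairs, for illustration only.
TRAIN = [
    ("Doctor: Any chest pain? Patient: Yes, chest pain when I climb stairs.",
     "Exertional chest pain."),
    ("Doctor: How is your sleep? Patient: I wake up often at night.",
     "Fragmented sleep."),
    ("Doctor: Any cough? Patient: A dry cough for two weeks.",
     "Dry cough for two weeks."),
]

query = "Doctor: Do you have chest pain? Patient: Some chest pain after walking."
examples = top_k_examples(query, TRAIN, k=1)  # k=1 mirrors the Task B setting
prompt = build_prompt(query, examples)
```

The resulting `prompt` string would be sent to the language model; with a larger `k`, more retrieved pairs precede the query dialogue.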
Related papers
- Two eyes, Two views, and finally, One summary! Towards Multi-modal Multi-tasking Knowledge-Infused Medical Dialogue Summarization [12.953002469651938]
We investigate the effectiveness of a multi-faceted approach that simultaneously produces summaries of medical concerns, doctor impressions, and an overall view.
We introduce a multi-modal, multi-tasking, knowledge-infused medical dialogue summary generation model (MMK-Summation).
The model, MMK-Summation, takes dialogues as input, extracts pertinent external knowledge based on the context, integrates the knowledge and visual cues from the dialogues into the textual content, and ultimately generates concise summaries.
arXiv Detail & Related papers (2024-07-21T18:00:10Z)
- Generating medically-accurate summaries of patient-provider dialogue: A
multi-stage approach using large language models [6.252236971703546]
An effective summary is required to be coherent and accurately capture all the medically relevant information in the dialogue.
This paper tackles the problem of medical conversation summarization by discretizing the task into several smaller dialogue-understanding tasks.
arXiv Detail & Related papers (2023-05-10T08:48:53Z)
- GersteinLab at MEDIQA-Chat 2023: Clinical Note Summarization from
Doctor-Patient Conversations through Fine-tuning and In-context Learning [4.2570830892708225]
This paper presents our contribution to the MEDIQA-2023 Dialogue2Note shared task, encompassing both subtask A and subtask B.
We approach the task as a dialogue summarization problem and implement two distinct pipelines: (a) a fine-tuning of a pre-trained dialogue summarization model and GPT-3, and (b) few-shot in-context learning (ICL) using a large language model, GPT-4.
Both methods achieve excellent results in terms of ROUGE-1 F1, BERTScore F1 (deberta-xlarge-mnli), and BLEURT.
arXiv Detail & Related papers (2023-05-08T19:16:26Z)
- DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System
for Multilingual Named Entity Recognition [94.90258603217008]
The MultiCoNER II shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios.
Previous top systems in MultiCoNER I incorporate either knowledge bases or gazetteers.
We propose a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER.
arXiv Detail & Related papers (2023-05-05T16:59:26Z)
- WangLab at MEDIQA-Chat 2023: Clinical Note Generation from
Doctor-Patient Conversations using Large Language Models [2.3608256778747565]
We submit to the MEDIQA-Chat 2023 shared task for automatic clinical note generation from doctor-patient conversations.
We report results for two approaches: the first fine-tunes a pre-trained language model (PLM) on the shared task data, and the second uses few-shot in-context learning (ICL) with a large language model (LLM).
Expert human scrutiny indicates that notes generated via the ICL-based approach with GPT-4 are preferred about as often as human-written notes.
arXiv Detail & Related papers (2023-05-03T15:58:28Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- Findings of the WMT 2022 Shared Task on Translation Suggestion [63.457874930232926]
We report the result of the first edition of the WMT shared task on Translation Suggestion.
The task aims to provide alternatives for specific words or phrases given the entire documents generated by machine translation (MT).
It consists of two sub-tasks, namely, the naive translation suggestion and translation suggestion with hints.
arXiv Detail & Related papers (2022-11-30T03:48:36Z)
- CREATIVESUMM: Shared Task on Automatic Summarization for Creative
Writing [90.58269243992318]
This paper introduces the shared task of summarizing documents in several creative domains, namely literary texts, movie scripts, and television scripts.
We introduce four sub-tasks and their corresponding datasets, focusing on summarizing books, movie scripts, primetime television scripts, and daytime soap opera scripts.
As part of the CREATIVESUMM workshop at COLING 2022, the shared task attracted 18 submissions in total.
arXiv Detail & Related papers (2022-11-10T21:31:03Z)
- Instruction Tuning for Few-Shot Aspect-Based Sentiment Analysis [72.9124467710526]
Generative approaches have been proposed to extract all four elements as (one or more) quadruplets from text as a single task.
We propose a unified framework for solving ABSA and the associated sub-tasks to improve performance in few-shot scenarios.
arXiv Detail & Related papers (2022-10-12T23:38:57Z)
- A Benchmark for Automatic Medical Consultation System: Frameworks, Tasks
and Datasets [70.32630628211803]
We propose two frameworks to support automatic medical consultation, namely doctor-patient dialogue understanding and task-oriented interaction.
A new large medical dialogue dataset with multi-level fine-grained annotations is introduced.
We report a set of benchmark results for each task, which shows the usability of the dataset and sets a baseline for future studies.
arXiv Detail & Related papers (2022-04-19T16:43:21Z)
- Towards an Automated SOAP Note: Classifying Utterances from Medical
Conversations [0.6875312133832078]
We bridge the gap for classifying utterances from medical conversations according to (i) the SOAP section and (ii) the speaker role.
We present a systematic analysis in which we adapt an existing deep learning architecture to the two aforementioned tasks.
The results suggest that modelling context in a hierarchical manner, which captures both word and utterance level context, yields substantial improvements on both classification tasks.
arXiv Detail & Related papers (2020-07-17T04:19:30Z)