GersteinLab at MEDIQA-Chat 2023: Clinical Note Summarization from
Doctor-Patient Conversations through Fine-tuning and In-context Learning
- URL: http://arxiv.org/abs/2305.05001v1
- Date: Mon, 8 May 2023 19:16:26 GMT
- Title: GersteinLab at MEDIQA-Chat 2023: Clinical Note Summarization from
Doctor-Patient Conversations through Fine-tuning and In-context Learning
- Authors: Xiangru Tang, Andrew Tran, Jeffrey Tan, Mark Gerstein
- Abstract summary: This paper presents our contribution to the MEDIQA-2023 Dialogue2Note shared task, encompassing both subtask A and subtask B.
We approach the task as a dialogue summarization problem and implement two distinct pipelines: (a) fine-tuning a pre-trained dialogue summarization model and GPT-3, and (b) few-shot in-context learning (ICL) using a large language model, GPT-4.
Both methods achieve excellent results in terms of ROUGE-1 F1, BERTScore F1 (deberta-xlarge-mnli), and BLEURT.
- Score: 4.2570830892708225
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents our contribution to the MEDIQA-2023 Dialogue2Note shared
task, encompassing both subtask A and subtask B. We approach the task as a
dialogue summarization problem and implement two distinct pipelines: (a)
fine-tuning a pre-trained dialogue summarization model and GPT-3, and (b)
few-shot in-context learning (ICL) using a large language model, GPT-4. Both
methods achieve excellent results in terms of ROUGE-1 F1, BERTScore F1
(deberta-xlarge-mnli), and BLEURT, with scores of 0.4011, 0.7058, and 0.5421,
respectively. Additionally, we predict the associated section headers using
RoBERTa- and SciBERT-based classification models. Our team ranked fourth among
all teams, with each team allowed to submit three runs as part of its
submission. We also use expert annotations to demonstrate that the notes
generated through ICL with GPT-4 are better than all other baselines. The code
for our submission is available.
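As an illustration of pipeline (b), the sketch below shows how few-shot in-context learning with GPT-4 might be set up for dialogue-to-note generation. The prompt wording, the `build_prompt` helper, and the use of the OpenAI chat-completions client are assumptions for illustration, not the authors' exact implementation.

```python
# Hypothetical sketch of few-shot ICL for clinical note generation (pipeline b).
# Prompt template and example formatting are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def build_prompt(examples, dialogue):
    """Assemble a few-shot prompt: k (dialogue, note) pairs, then the new dialogue."""
    parts = ["Summarize each doctor-patient conversation into a clinical note.\n"]
    for ex_dialogue, ex_note in examples:
        parts.append(f"Conversation:\n{ex_dialogue}\nClinical note:\n{ex_note}\n")
    parts.append(f"Conversation:\n{dialogue}\nClinical note:")
    return "\n".join(parts)


def generate_note(examples, dialogue, model="gpt-4"):
    """Generate a clinical note for `dialogue` using in-context examples."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(examples, dialogue)}],
        temperature=0.0,  # deterministic decoding for more reproducible notes
    )
    return response.choices[0].message.content


# Usage: `few_shot_examples` would be (conversation, reference_note) pairs drawn
# from the training split; `dialogue` is the conversation to summarize.
# note = generate_note(few_shot_examples, dialogue)
```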
Related papers
- SummQA at MEDIQA-Chat 2023: In-Context Learning with GPT-4 for Medical Summarization [5.92318236682798]
We present a novel system for the Dialogue2Note Medical Summarization tasks in the MEDIQA 2023 Shared Task.
Our approach for section-wise summarization (Task A) is a two-stage process of selecting semantically similar dialogues and using the top-k similar dialogues as in-context examples for GPT-4.
For full-note summarization (Task B), we use a similar solution with k=1.
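The dialogue-selection step described above could be implemented with sentence-embedding similarity; the sketch below is an assumed illustration using the sentence-transformers library, not the SummQA authors' code, and the model name is only an example.

```python
# Illustrative sketch of selecting the top-k most similar training dialogues
# to use as in-context examples (assumed implementation, not the authors' code).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works


def top_k_examples(query_dialogue, train_dialogues, k=3):
    """Return indices of the k training dialogues most similar to the query."""
    query_emb = model.encode(query_dialogue, convert_to_tensor=True)
    corpus_emb = model.encode(train_dialogues, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, corpus_emb)[0]  # cosine similarities
    return scores.topk(k).indices.tolist()           # indices of top-k dialogues


# The selected dialogues (with their reference notes) are then placed in the
# prompt as in-context examples for GPT-4; full-note summarization uses k=1.
```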
arXiv Detail & Related papers (2023-06-30T03:14:04Z) - DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System
for Multilingual Named Entity Recognition [94.90258603217008]
The MultiCoNER II shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios.
Previous top systems in MultiCoNER I incorporate either knowledge bases or gazetteers.
We propose a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER.
arXiv Detail & Related papers (2023-05-05T16:59:26Z) - WangLab at MEDIQA-Chat 2023: Clinical Note Generation from
Doctor-Patient Conversations using Large Language Models [2.3608256778747565]
We submit to the MEDIQA-Chat 2023 shared task for automatic clinical note generation from doctor-patient conversations.
We report results for two approaches: the first fine-tunes a pre-trained language model (PLM) on the shared task data, and the second uses few-shot in-context learning (ICL) with a large language model (LLM).
Expert human scrutiny indicates that notes generated via the ICL-based approach with GPT-4 are preferred about as often as human-written notes.
arXiv Detail & Related papers (2023-05-03T15:58:28Z) - OPT-IML: Scaling Language Model Instruction Meta Learning through the
Lens of Generalization [101.37439352091612]
We describe the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes.
We present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT.
arXiv Detail & Related papers (2022-12-22T19:56:09Z) - CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog
Evaluation [75.60156479374416]
CGoDial is a new, challenging, and comprehensive Chinese benchmark for goal-oriented dialog evaluation.
It contains 96,763 dialog sessions and 574,949 dialog turns in total, covering three datasets with different knowledge sources.
To bridge the gap between academic benchmarks and spoken dialog scenarios, we either collect data from real conversations or add spoken features to existing datasets via crowd-sourcing.
arXiv Detail & Related papers (2022-11-21T16:21:41Z) - Instruction Tuning for Few-Shot Aspect-Based Sentiment Analysis [72.9124467710526]
Generative approaches have been proposed to extract all four elements as (one or more) quadruplets from text in a single task.
We propose a unified framework for solving ABSA and its associated sub-tasks to improve performance in few-shot scenarios.
arXiv Detail & Related papers (2022-10-12T23:38:57Z) - SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for
Task-Oriented Dialog Understanding [68.94808536012371]
We propose a tree-structured pre-trained conversation model, which learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora.
Our method can achieve new state-of-the-art results on the DialoGLUE benchmark consisting of seven datasets and four popular dialog understanding tasks.
arXiv Detail & Related papers (2022-09-14T13:42:50Z) - Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results in in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z) - UIUC_BioNLP at SemEval-2021 Task 11: A Cascade of Neural Models for
Structuring Scholarly NLP Contributions [1.5942130010323128]
We propose a cascade of neural models that performs sentence classification, phrase recognition, and triple extraction.
A BERT-CRF model was used to recognize and characterize relevant phrases in contribution sentences.
Our system was officially ranked second in Phase 1 evaluation and first in both parts of Phase 2 evaluation.
arXiv Detail & Related papers (2021-05-12T05:24:35Z) - KnowGraph@IITK at SemEval-2021 Task 11: Building KnowledgeGraph for NLP
Research [2.1012672709024294]
We develop a system for building a knowledge graph focused on research-paper contributions over Natural Language Processing literature.
The proposed system is agnostic to the subject domain and can be applied for building a knowledge graph for any area.
Our system achieved F1 scores of 0.38, 0.63, and 0.76 in end-to-end pipeline testing, phrase extraction testing, and triplet extraction testing, respectively.
arXiv Detail & Related papers (2021-04-04T14:33:21Z) - BUT-FIT at SemEval-2020 Task 4: Multilingual commonsense [1.433758865948252]
This paper describes the work of the BUT-FIT team at SemEval-2020 Task 4 - Commonsense Validation and Explanation.
In subtasks A and B, our submissions are based on pretrained language representation models (namely ALBERT) and data augmentation.
We experimented with solving the task for another language, Czech, by means of multilingual models and a machine-translated dataset.
We show that with a strong machine translation system, our system can be used in another language with a small accuracy loss.
arXiv Detail & Related papers (2020-08-17T12:45:39Z)