Related papers: WisPerMed at BioLaySumm: Adapting Autoregressive Large Language Models for Lay Summarization of Scientific Articles

WisPerMed at BioLaySumm: Adapting Autoregressive Large Language Models for Lay Summarization of Scientific Articles

URL: http://arxiv.org/abs/2405.11950v1
Date: Mon, 20 May 2024 10:54:47 GMT
Title: WisPerMed at BioLaySumm: Adapting Autoregressive Large Language Models for Lay Summarization of Scientific Articles
Authors: Tabea M. G. Pakull, Hendrik Damm, Ahmad Idrissi-Yaghir, Henning Schäfer, Peter A. Horn, Christoph M. Friedrich,
Abstract summary: This paper details the efforts of the WisPerMed team in the BioLaySumm2024 Shared Task on automatic lay summarization. Large language models (LLMs), specifically the BioMistral and Llama3 models, were fine-tuned and employed to create lay summaries. Experiments demonstrated that fine-tuning generally led to the best performance across most evaluated metrics.
Score: 0.41716369948557463
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper details the efforts of the WisPerMed team in the BioLaySumm2024 Shared Task on automatic lay summarization in the biomedical domain, aimed at making scientific publications accessible to non-specialists. Large language models (LLMs), specifically the BioMistral and Llama3 models, were fine-tuned and employed to create lay summaries from complex scientific texts. The summarization performance was enhanced through various approaches, including instruction tuning, few-shot learning, and prompt variations tailored to incorporate specific context information. The experiments demonstrated that fine-tuning generally led to the best performance across most evaluated metrics. Few-shot learning notably improved the models' ability to generate relevant and factually accurate texts, particularly when using a well-crafted prompt. Additionally, a Dynamic Expert Selection (DES) mechanism to optimize the selection of text outputs based on readability and factuality metrics was developed. Out of 54 participants, the WisPerMed team reached the 4th place, measured by readability, factuality, and relevance. Determined by the overall score, our approach improved upon the baseline by approx. 5.5 percentage points and was only approx 1.5 percentage points behind the first place.

Related papers

Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track [14.011972819032328]
We hosted the Plain Language Adaptation of Biomedical Abstracts track at the 2023 and 2024 Text Retrieval Conference.<n>Tasks included complete, sentence-level, rewriting of abstracts and identifying and replacing difficult terms.<n>Twelve teams spanning twelve countries participated in the track, with models from multilayer perceptrons to large pretrained transformers.
arXiv Detail & Related papers (2025-07-18T17:23:52Z)
Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content [0.10241134756773229]
We introduce Biomed-Enriched, a biomedical text dataset constructed from PubMed via a two-stage annotation process.<n>In the first stage, a large language model annotates 400K paragraphs from PubMed scientific articles, assigning scores for their type (review, study, clinical case, other), domain (clinical, biomedical, other), and educational quality.<n>The resulting metadata allows us to extract refined subsets, including 2M clinical case paragraphs with over 450K high-quality ones from articles with commercial-use licenses.
arXiv Detail & Related papers (2025-06-25T11:30:25Z)
An Empirical Comparison of Text Summarization: A Multi-Dimensional Evaluation of Large Language Models [2.1945750784330067]
This research evaluates summarization performance across 17 large language models (OpenAI, Google, Anthropic, open-source) We assessed models on seven diverse datasets using metrics for factual consistency, semantic similarity, lexical overlap, and human-like quality.
arXiv Detail & Related papers (2025-04-06T16:24:22Z)
Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored. We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches. We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
Pre-training data selection for biomedical domain adaptation using journal impact metrics [0.0]
We employ two straightforward journal impact metrics and conduct experiments by continually pre-training BERT on various subsets of the complete PubMed training set. Our results show that pruning using journal impact metrics is not efficient. But we also show that pre-training using fewer abstracts (but with the same number of training steps) does not necessarily decrease the resulting model's performance.
arXiv Detail & Related papers (2024-09-04T13:59:48Z)
Towards Evaluating and Building Versatile Large Language Models for Medicine [57.49547766838095]
We present MedS-Bench, a benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts. MedS-Bench spans 11 high-level clinical tasks, including clinical report summarization, treatment recommendations, diagnosis, named entity recognition, and medical concept explanation. MedS-Ins comprises 58 medically oriented language corpora, totaling 13.5 million samples across 122 tasks.
arXiv Detail & Related papers (2024-08-22T17:01:34Z)
Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback. Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% Rouge-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks. SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
WisPerMed at "Discharge Me!": Advancing Text Generation in Healthcare with Large Language Models, Dynamic Expert Selection, and Priming Techniques on MIMIC-IV [0.38084074204911494]
This study aims to leverage state of the art language models to automate generating the "Brief Hospital Course" and "Discharge Instructions" sections of Discharge Summaries. We investigate how automation can improve documentation accuracy, alleviate clinician burnout, and enhance operational efficacy in healthcare facilities.
arXiv Detail & Related papers (2024-05-18T10:56:45Z)
Information-Theoretic Distillation for Reference-less Summarization [67.51150817011617]
We present a novel framework to distill a powerful summarizer based on the information-theoretic objective for summarization. We start off from Pythia-2.8B as the teacher model, which is not yet capable of summarization. We arrive at a compact but powerful summarizer with only 568M parameters that performs competitively against ChatGPT.
arXiv Detail & Related papers (2024-03-20T17:42:08Z)
Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models. We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT. We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
Large Language Models and Prompt Engineering for Biomedical Query Focused Multi-Document Summarisation [0.565658124285176]
This paper reports on the use of prompt engineering and GPT-3.5 for biomedical query-focused multi-document summarisation. Using GPT-3.5 and appropriate prompts, our system top ROUGE-F1 results in the task of obtaining short-paragraph-sized answers to biomedical questions.
arXiv Detail & Related papers (2023-11-09T06:45:04Z)
Neural Summarization of Electronic Health Records [8.784162652042957]
We studied the viability of the automatic generation of various sections of a discharge summary using four state-of-the-art neural network summarization models. Fine-tuning language models that were previously instruction fine-tuned showed better performance in summarizing entire reports.
arXiv Detail & Related papers (2023-05-24T15:05:53Z)
A systematic evaluation of large language models for biomedical natural language processing: benchmarks, baselines, and recommendations [22.668383945059762]
We present a systematic evaluation of four representative Large Language Models (LLMs) across 12 BioNLP datasets. The evaluation is conducted under four settings: zero-shot, static few-shot, dynamic K-nearest few-shot, and fine-tuning. We compare these models against state-of-the-art (SOTA) approaches that fine-tune (domain-specific) BERT or BART models.
arXiv Detail & Related papers (2023-05-10T13:40:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.