The Lay Person's Guide to Biomedicine: Orchestrating Large Language
Models
- URL: http://arxiv.org/abs/2402.13498v1
- Date: Wed, 21 Feb 2024 03:21:14 GMT
- Title: The Lay Person's Guide to Biomedicine: Orchestrating Large Language
Models
- Authors: Zheheng Luo, Qianqian Xie, Sophia Ananiadou
- Abstract summary: Large language models (LLMs) have demonstrated a remarkable capacity for text simplification, background information generation, and text evaluation.
We propose a novel Explain-then-Summarise LS framework, which leverages LLMs to generate high-quality background knowledge.
We also propose two novel LS evaluation metrics, which assess layness from multiple perspectives.
- Score: 38.8292168447796
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Automated lay summarisation (LS) aims to simplify complex technical documents
into a format that is more accessible to non-experts. Existing approaches using
pre-trained language models, possibly augmented with external background
knowledge, tend to struggle with effective simplification and explanation.
Moreover, automated methods that can effectively assess the 'layness' of
generated summaries are lacking. Recently, large language models (LLMs) have
demonstrated a remarkable capacity for text simplification, background
information generation, and text evaluation. This has motivated our systematic
exploration into using LLMs to generate and evaluate lay summaries of
biomedical articles. We propose a novel Explain-then-Summarise LS
framework, which leverages LLMs to generate high-quality background knowledge
to improve supervised LS. We also evaluate the performance of LLMs for
zero-shot LS and propose two novel LLM-based LS evaluation metrics, which
assess layness from multiple perspectives. Finally, we conduct a human
assessment of generated lay summaries. Our experiments reveal that
LLM-generated background information can support improved supervised LS.
Furthermore, our novel zero-shot LS evaluation metric demonstrates a high
degree of alignment with human preferences. We conclude that LLMs have an
important part to play in improving both the performance and evaluation of LS
methods.
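The two-stage idea described in the abstract can be sketched as a minimal pipeline. This is an illustrative reconstruction, not the authors' implementation: the `call_llm` function is a hypothetical stand-in for a real LLM API, and the prompts are placeholders (the paper uses the generated background to augment a supervised LS model rather than to prompt the same LLM twice).

```python
# Hedged sketch of an Explain-then-Summarise lay-summarisation pipeline.
# NOTE: call_llm and both prompts are hypothetical placeholders; they do
# not reproduce the paper's actual prompts or models.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real API client in practice."""
    # Stubbed response so the sketch runs end-to-end without a model.
    return f"[LLM response to: {prompt[:40]}...]"

def explain_then_summarise(article: str) -> str:
    # Stage 1: generate background knowledge explaining the article's
    # technical concepts for a lay reader.
    background = call_llm(
        "Explain the key technical terms in this biomedical article "
        f"for a lay reader:\n{article}"
    )
    # Stage 2: produce the lay summary conditioned on the article plus
    # the generated background explanation.
    return call_llm(
        "Write a lay summary of the article using the background "
        f"explanation.\nBackground: {background}\nArticle: {article}"
    )

if __name__ == "__main__":
    summary = explain_then_summarise("Gene X upregulates protein Y ...")
    print(summary.startswith("[LLM response"))
```

With a real model behind `call_llm`, the stage-1 output would serve as the external background knowledge that the paper feeds into supervised LS training.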
Related papers
- Your Weak LLM is Secretly a Strong Teacher for Alignment [19.33906256866585]
Existing alignment frameworks present constraints either in the form of expensive human effort or high computational costs.
This paper explores a promising middle ground, where we employ a weak LLM that is significantly less resource-intensive than top-tier models.
We show that weak LLMs can provide feedback that rivals or even exceeds that of fully human-annotated data.
arXiv Detail & Related papers (2024-09-13T13:24:52Z)
- Exploring the landscape of large language models: Foundations, techniques, and challenges [8.042562891309414]
The article sheds light on the mechanics of in-context learning and a spectrum of fine-tuning approaches.
It explores how LLMs can be more closely aligned with human preferences through innovative reinforcement learning frameworks.
The ethical dimensions of LLM deployment are discussed, underscoring the need for mindful and responsible application.
arXiv Detail & Related papers (2024-04-18T08:01:20Z)
- Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation [128.01050030936028]
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% relative points.
arXiv Detail & Related papers (2024-02-28T08:24:38Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- Evaluating Large Language Models at Evaluating Instruction Following [54.49567482594617]
We introduce a challenging meta-evaluation benchmark, LLMBar, designed to test the ability of an LLM evaluator in discerning instruction-following outputs.
We discover that different evaluators exhibit distinct performance on LLMBar and even the highest-scoring ones have substantial room for improvement.
arXiv Detail & Related papers (2023-10-11T16:38:11Z)
- Summarization is (Almost) Dead [49.360752383801305]
We develop new datasets and conduct human evaluation experiments to evaluate the zero-shot generation capability of large language models (LLMs).
Our findings indicate a clear preference among human evaluators for LLM-generated summaries over human-written summaries and summaries generated by fine-tuned models.
arXiv Detail & Related papers (2023-09-18T08:13:01Z)
- Aligning Large Language Models with Human: A Survey [53.6014921995006]
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks.
Despite their notable performance, these models are prone to certain limitations, such as misunderstanding human instructions, generating potentially biased content, or producing factually incorrect information.
This survey presents a comprehensive overview of these alignment technologies.
arXiv Detail & Related papers (2023-07-24T17:44:58Z)
- On Learning to Summarize with Large Language Models as References [101.79795027550959]
Summaries generated by large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets.
We study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved.
arXiv Detail & Related papers (2023-05-23T16:56:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.