Prompting LLMs to Compose Meta-Review Drafts from Peer-Review Narratives of Scholarly Manuscripts
- URL: http://arxiv.org/abs/2402.15589v1
- Date: Fri, 23 Feb 2024 20:14:16 GMT
- Title: Prompting LLMs to Compose Meta-Review Drafts from Peer-Review Narratives of Scholarly Manuscripts
- Authors: Shubhra Kanti Karmaker Santu, Sanjeev Kumar Sinha, Naman Bansal, Alex
Knipper, Souvika Sarkar, John Salvador, Yash Mahajan, Sri Guttikonda, Mousumi
Akter, Matthew Freestone, Matthew C. Williams Jr
- Abstract summary: Large Language Models (LLMs) can generate meta-reviews based on peer-review narratives from multiple experts.
In this paper, we perform a case study with three popular LLMs to automatically generate meta-reviews.
- Score: 6.2701471990853594
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the most important yet onerous tasks in the academic peer-reviewing
process is composing meta-reviews, which involves understanding the core
contributions, strengths, and weaknesses of a scholarly manuscript based on
peer-review narratives from multiple experts and then summarizing those
multiple experts' perspectives into a concise holistic overview. Given the
latest major developments in generative AI, especially Large Language Models
(LLMs), it is very compelling to rigorously study the utility of LLMs in
generating such meta-reviews in an academic peer-review setting. In this paper,
we perform a case study with three popular LLMs, i.e., GPT-3.5, LLaMA2, and
PaLM2, to automatically generate meta-reviews by prompting them with different
types/levels of prompts based on the recently proposed TELeR taxonomy. Finally,
we perform a detailed qualitative study of the meta-reviews generated by the
LLMs and summarize our findings and recommendations for prompting LLMs for this
complex task.
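As a rough illustration of the kind of setup the paper describes, the minimal sketch below prompts GPT-3.5 (one of the three LLMs studied) to draft a meta-review from reviewer narratives at two different levels of prompt detail. The prompt wording, the two illustrative detail levels, and the helper draft_meta_review are assumptions made for this sketch and are not the authors' actual TELeR-taxonomy prompts; it only assumes the OpenAI Python SDK and an API key in the environment.

# A minimal sketch (not the authors' actual prompts) of prompting an LLM to
# draft a meta-review from peer-review narratives at two TELeR-style levels
# of prompt detail. All prompt text and level names here are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative templates, loosely following TELeR's idea of increasing
# directive detail; exact level definitions and wording are hypothetical.
PROMPT_TEMPLATES = {
    "low_detail": (
        "Write a meta-review summarizing the following peer reviews.\n\n{reviews}"
    ),
    "high_detail": (
        "You are an area chair writing a meta-review.\n"
        "Given the peer reviews below, produce a concise meta-review that:\n"
        "1. States the paper's core contributions.\n"
        "2. Summarizes the main strengths noted by reviewers.\n"
        "3. Summarizes the main weaknesses and points of disagreement.\n"
        "4. Ends with a short overall assessment.\n\n"
        "Peer reviews:\n{reviews}"
    ),
}

def draft_meta_review(reviews: list[str], level: str = "high_detail") -> str:
    """Prompt an LLM to draft a meta-review from multiple reviewer narratives."""
    prompt = PROMPT_TEMPLATES[level].format(reviews="\n\n---\n\n".join(reviews))
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # one of the three LLM families studied
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,        # illustrative choice for more stable drafts
    )
    return response.choices[0].message.content

# Example usage with placeholder review texts:
# print(draft_meta_review(["Review 1 text ...", "Review 2 text ...", "Review 3 text ..."]))

In the paper's case study, prompts of this kind are varied across the TELeR taxonomy's types/levels and issued to GPT-3.5, LLaMA2, and PaLM2; the sketch above fixes one model and two ad hoc levels purely for illustration.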
Related papers
- From Test-Taking to Test-Making: Examining LLM Authoring of Commonsense Assessment Items [0.18416014644193068]
We consider LLMs as authors of commonsense assessment items.
We prompt LLMs to generate items in the style of a prominent benchmark for commonsense reasoning.
We find that LLMs that succeed in answering the original COPA benchmark are also more successful in authoring their own items.
arXiv Detail & Related papers (2024-10-18T22:42:23Z)
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
- Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions [62.0123588983514]
Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields.
We reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers.
We construct a comprehensive dataset containing over 26,841 papers with 92,017 reviews collected from multiple sources.
arXiv Detail & Related papers (2024-06-09T08:24:17Z)
- A Sentiment Consolidation Framework for Meta-Review Generation [40.879419691373826]
We focus on meta-review generation, a form of sentiment summarisation for the scientific domain.
We propose novel prompting methods for Large Language Models to generate meta-reviews.
arXiv Detail & Related papers (2024-02-28T02:40:09Z)
- Benchmarking LLMs on the Semantic Overlap Summarization Task [9.656095701778975]
This paper comprehensively evaluates Large Language Models (LLMs) on the Semantic Overlap Summarization (SOS) task.
We report well-established metrics like ROUGE, BERTscore, and SEM-F1 on two different datasets of alternative narratives.
arXiv Detail & Related papers (2024-02-26T20:33:50Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training models with billions of parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- PRE: A Peer Review Based Large Language Model Evaluator [14.585292530642603]
Existing paradigms rely on either human annotators or model-based evaluators to evaluate the performance of LLMs.
We propose a novel framework that can automatically evaluate LLMs through a peer-review process.
arXiv Detail & Related papers (2024-01-28T12:33:14Z)
- Evaluating Large Language Models at Evaluating Instruction Following [54.49567482594617]
We introduce a challenging meta-evaluation benchmark, LLMBar, designed to test the ability of an LLM evaluator in discerning instruction-following outputs.
We discover that different evaluators exhibit distinct performance on LLMBar and even the highest-scoring ones have substantial room for improvement.
arXiv Detail & Related papers (2023-10-11T16:38:11Z)
- A Comprehensive Overview of Large Language Models [68.22178313875618]
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks.
This article provides an overview of the existing literature on a broad range of LLM-related concepts.
arXiv Detail & Related papers (2023-07-12T20:01:52Z)
- On Learning to Summarize with Large Language Models as References [101.79795027550959]
Summaries generated by large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets.
We study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved.
arXiv Detail & Related papers (2023-05-23T16:56:04Z)