Prompting LLMs to Compose Meta-Review Drafts from Peer-Review Narratives of Scholarly Manuscripts
- URL: http://arxiv.org/abs/2402.15589v1
- Date: Fri, 23 Feb 2024 20:14:16 GMT
- Title: Prompting LLMs to Compose Meta-Review Drafts from Peer-Review Narratives of Scholarly Manuscripts
- Authors: Shubhra Kanti Karmaker Santu, Sanjeev Kumar Sinha, Naman Bansal, Alex Knipper, Souvika Sarkar, John Salvador, Yash Mahajan, Sri Guttikonda, Mousumi Akter, Matthew Freestone, Matthew C. Williams Jr
- Abstract summary: Large Language Models (LLMs) can generate meta-reviews based on peer-review narratives from multiple experts.
In this paper, we perform a case study with three popular LLMs to automatically generate meta-reviews.
- Score: 6.2701471990853594
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the most important yet onerous tasks in the academic peer-reviewing
process is composing meta-reviews, which involves understanding the core
contributions, strengths, and weaknesses of a scholarly manuscript based on
peer-review narratives from multiple experts and then summarizing those
multiple experts' perspectives into a concise holistic overview. Given the
latest major developments in generative AI, especially Large Language Models
(LLMs), it is very compelling to rigorously study the utility of LLMs in
generating such meta-reviews in an academic peer-review setting. In this paper,
we perform a case study with three popular LLMs, i.e., GPT-3.5, LLaMA2, and
PaLM2, to automatically generate meta-reviews by prompting them with different
types/levels of prompts based on the recently proposed TELeR taxonomy. Finally,
we perform a detailed qualitative study of the meta-reviews generated by the
LLMs and summarize our findings and recommendations for prompting LLMs for this
complex task.
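To make the prompting setup concrete, here is a minimal sketch of how prompts of increasing detail might be assembled from multiple reviewer narratives, loosely following the TELeR taxonomy's idea of low- versus high-detail task directives. The level descriptions and the build_prompt helper are illustrative assumptions, not the exact prompts used in the paper.

```python
# Illustrative sketch (not the paper's exact prompts): assembling meta-review
# prompts of increasing detail, in the spirit of the TELeR taxonomy's levels.
from typing import List


def build_prompt(reviews: List[str], level: int) -> str:
    """Build a meta-review prompt from peer-review narratives at a chosen detail level."""
    joined = "\n\n".join(f"Review {i + 1}:\n{r}" for i, r in enumerate(reviews))
    if level == 1:
        # Low detail: a single, simple directive.
        directive = "Write a meta-review summarizing the reviews below."
    elif level == 2:
        # Medium detail: the directive spells out the expected content.
        directive = (
            "Write a concise meta-review that summarizes the manuscript's core "
            "contributions, strengths, and weaknesses based on the reviews below."
        )
    else:
        # High detail: the directive assigns a role, enumerates sub-tasks,
        # and constrains the output format.
        directive = (
            "You are an area chair. Draft a meta-review of the manuscript from the "
            "reviews below. Your draft should (1) state the core contributions, "
            "(2) summarize the main strengths, (3) summarize the main weaknesses, "
            "and (4) end with a holistic overall assessment, all in under 200 words."
        )
    return f"{directive}\n\n{joined}"


if __name__ == "__main__":
    reviews = [
        "The paper proposes ... Strengths: ... Weaknesses: ...",
        "Overall, the contribution is ...",
    ]
    for level in (1, 2, 3):
        print(f"--- Level {level} prompt ---")
        print(build_prompt(reviews, level))
        # The resulting string would then be sent to GPT-3.5, LLaMA2, or PaLM2
        # via whatever client library is in use.
```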
Related papers
- From Test-Taking to Test-Making: Examining LLM Authoring of Commonsense Assessment Items [0.18416014644193068]
We consider LLMs as authors of commonsense assessment items.
We prompt LLMs to generate items in the style of a prominent benchmark for commonsense reasoning.
We find that LLMs that succeed in answering the original COPA benchmark are also more successful in authoring their own items.
arXiv Detail & Related papers (2024-10-18T22:42:23Z)
- A Survey on Benchmarks of Multimodal Large Language Models [65.87641718350639]
This paper presents a comprehensive review of 200 benchmarks and evaluations for Multimodal Large Language Models (MLLMs).
We focus on (1) perception and understanding, (2) cognition and reasoning, (3) specific domains, (4) key capabilities, and (5) other modalities.
Our key argument is that evaluation should be regarded as a crucial discipline to better support the development of MLLMs.
arXiv Detail & Related papers (2024-08-16T09:52:02Z)
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
- Benchmarking LLMs on the Semantic Overlap Summarization Task [9.656095701778975]
This paper comprehensively evaluates Large Language Models (LLMs) on the Semantic Overlap Summarization (SOS) task.
We report well-established metrics like ROUGE, BERTScore, and SEM-F1 on two different datasets of alternative narratives.
arXiv Detail & Related papers (2024-02-26T20:33:50Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs' general-purpose language understanding and generation abilities are acquired by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- PRE: A Peer Review Based Large Language Model Evaluator [14.585292530642603]
Existing paradigms rely on either human annotators or model-based evaluators to evaluate the performance of LLMs.
We propose a novel framework that can automatically evaluate LLMs through a peer-review process.
arXiv Detail & Related papers (2024-01-28T12:33:14Z)
- A Comprehensive Overview of Large Language Models [68.22178313875618]
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks.
This article provides an overview of the existing literature on a broad range of LLM-related concepts.
arXiv Detail & Related papers (2023-07-12T20:01:52Z)
- MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [73.86954509967416]
Multimodal Large Language Models (MLLMs) rely on powerful LLMs to perform multimodal tasks.
This paper presents MME, the first comprehensive MLLM evaluation benchmark.
It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z)
- Introspective Tips: Large Language Model for In-Context Decision Making [48.96711664648164]
We employ "Introspective Tips" to help large language models (LLMs) self-optimize their decision-making.
Our method enhances the agent's performance in both few-shot and zero-shot learning situations.
Experiments involving over 100 games in TextWorld illustrate the superior performance of our approach.
arXiv Detail & Related papers (2023-05-19T11:20:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.