ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer
Reviews
- URL: http://arxiv.org/abs/2306.12587v1
- Date: Wed, 21 Jun 2023 22:00:03 GMT
- Authors: Mike D'Arcy, Alexis Ross, Erin Bransom, Bailey Kuehl, Jonathan Bragg,
Tom Hope, Doug Downey
- Abstract summary: We introduce this task for large language models and release ARIES, a dataset of review comments and their corresponding paper edits.
We find that models struggle even to identify the edits that correspond to a comment.
GPT-4 often succeeds in addressing comments on a surface level, but it rigidly follows the wording of the feedback rather than the underlying intent.
- Score: 19.68152108760845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Revising scientific papers based on peer feedback is a challenging task that
requires not only deep scientific knowledge and reasoning, but also the ability
to recognize the implicit requests in high-level feedback and to choose the
best of many possible ways to update the manuscript in response. We introduce
this task for large language models and release ARIES, a dataset of review
comments and their corresponding paper edits, to enable training and evaluating
models. We study two versions of the task: comment-edit alignment and edit
generation, and evaluate several baselines, including GPT-4. We find that
models struggle even to identify the edits that correspond to a comment,
especially in cases where the comment is phrased in an indirect way or where
the edit addresses the spirit of a comment but not the precise request. When
tasked with generating edits, GPT-4 often succeeds in addressing comments on a
surface level, but it rigidly follows the wording of the feedback rather than
the underlying intent, and includes fewer technical details than human-written
edits. We hope that our formalization, dataset, and analysis will form a
foundation for future work in this area.
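To make the comment-edit alignment task concrete: given a reviewer comment and a set of candidate paper edits, a system must identify which edits respond to the comment. A minimal illustrative sketch is a naive lexical-overlap baseline that ranks edits by cosine similarity of word counts. This is not the paper's method or its released baselines; the example comment and edit strings below are invented for illustration.

```python
# Naive lexical baseline for comment-edit alignment (illustrative only,
# not the ARIES paper's approach): rank candidate edits for a review
# comment by cosine similarity of their bag-of-words vectors.
from collections import Counter
import math

def tokenize(text):
    # Lowercase and keep purely alphabetic tokens.
    return [t for t in text.lower().split() if t.isalpha()]

def cosine(a, b):
    # Cosine similarity between bag-of-words vectors of two strings.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in set(ca) & set(cb))
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def align(comment, edits):
    """Return candidate edits ranked by lexical similarity to the comment."""
    return sorted(edits, key=lambda e: cosine(comment, e), reverse=True)

# Hypothetical comment and candidate edits.
comment = "please report the variance of your results"
edits = [
    "we now report the variance over five random seeds",
    "we fixed a typo in the introduction",
]
ranked = align(comment, edits)
```

A baseline like this only works when the edit echoes the comment's wording, which is exactly the failure mode the paper highlights: indirect comments, or edits addressing the spirit rather than the letter of a request, share little surface vocabulary and defeat lexical matching.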
Related papers
- Re3: A Holistic Framework and Dataset for Modeling Collaborative Document Revision [62.12545440385489]
We introduce Re3, a framework for joint analysis of collaborative document revision.
We present Re3-Sci, a large corpus of aligned scientific paper revisions manually labeled according to their action and intent.
We use the new data to provide first empirical insights into collaborative document revision in the academic domain.
arXiv Detail & Related papers (2024-05-31T21:19:09Z)
- Automated Focused Feedback Generation for Scientific Writing Assistance [6.559560602099439]
SWIF$2$T: a Scientific WrIting Focused Feedback Tool.
It is designed to generate specific, actionable and coherent comments, which identify weaknesses in a scientific paper and/or propose revisions to it.
We compile a dataset of 300 peer reviews citing weaknesses in scientific papers and conduct human evaluation.
The results demonstrate the superiority in specificity, reading comprehension, and overall helpfulness of SWIF$2$T's feedback compared to other approaches.
arXiv Detail & Related papers (2024-05-30T20:56:41Z)
- CASIMIR: A Corpus of Scientific Articles enhanced with Multiple Author-Integrated Revisions [7.503795054002406]
We propose an original textual resource on the revision step of the writing process of scientific articles.
This new dataset, called CASIMIR, contains the multiple revised versions of 15,646 scientific articles from OpenReview, along with their peer reviews.
arXiv Detail & Related papers (2024-03-01T03:07:32Z)
- A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [58.6354685593418]
This paper proposes several article-level, field-normalized, and large language model-empowered bibliometric indicators to evaluate reviews.
The newly emerging AI-generated literature reviews are also appraised.
This work offers insights into the current challenges of literature reviews and envisions future directions for their development.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
- To Revise or Not to Revise: Learning to Detect Improvable Claims for Argumentative Writing Support [20.905660642919052]
We explore the main challenges to identifying argumentative claims in need of specific revisions.
We propose a new sampling strategy based on revision distance.
We provide evidence that using contextual information and domain knowledge can further improve prediction results.
arXiv Detail & Related papers (2023-05-26T10:19:54Z)
- Scientific Opinion Summarization: Paper Meta-review Generation Dataset, Methods, and Evaluation [55.00687185394986]
We propose the task of scientific opinion summarization, where research paper reviews are synthesized into meta-reviews.
We introduce the ORSUM dataset covering 15,062 paper meta-reviews and 57,536 paper reviews from 47 conferences.
Our experiments show that (1) human-written summaries do not always satisfy all necessary criteria, such as depth of discussion and identification of consensus and controversy for the specific domain, and (2) the combination of task decomposition and iterative self-refinement shows strong potential for enhancing opinion summarization.
arXiv Detail & Related papers (2023-05-24T02:33:35Z)
- EditEval: An Instruction-Based Benchmark for Text Improvements [73.5918084416016]
This work presents EditEval, an instruction-based benchmark and evaluation suite for the automatic evaluation of editing capabilities.
We evaluate several pre-trained models, finding that InstructGPT and PEER perform best but that most baselines fall below the supervised SOTA.
Our analysis shows that commonly used metrics for editing tasks do not always correlate well, and that optimization for prompts with the highest performance does not necessarily entail the strongest robustness to different models.
arXiv Detail & Related papers (2022-09-27T12:26:05Z)
- PEER: A Collaborative Language Model [70.11876901409906]
We introduce PEER, a collaborative language model that imitates the entire writing process itself.
PEER can write drafts, add suggestions, propose edits and provide explanations for its actions.
We show that PEER achieves strong performance across various domains and editing tasks.
arXiv Detail & Related papers (2022-08-24T16:56:47Z)
- Memory-Based Model Editing at Scale [102.28475739907498]
Existing model editors struggle to accurately model an edit's intended scope.
We propose Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC).
SERAC stores edits in an explicit memory and learns to reason over them to modulate the base model's predictions as needed.
arXiv Detail & Related papers (2022-06-13T23:40:34Z)
- Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision [11.495407637511878]
We present a human-in-the-loop iterative text revision system, Read, Revise, Repeat (R3).
R3 aims to achieve high-quality text revisions with minimal human effort by reading model-generated revisions and user feedback, revising documents, and repeating human-machine interactions.
arXiv Detail & Related papers (2022-04-07T18:33:10Z)
- Understanding Iterative Revision from Human-Written Text [10.714872525208385]
IteraTeR is the first large-scale, multi-domain, edit-intention annotated corpus of iteratively revised text.
It helps us better understand the text revision process by making vital connections between edit intentions and writing quality.
arXiv Detail & Related papers (2022-03-08T01:47:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and accepts no responsibility for any consequences of its use.