ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer
Reviews
- URL: http://arxiv.org/abs/2306.12587v1
- Date: Wed, 21 Jun 2023 22:00:03 GMT
- Authors: Mike D'Arcy, Alexis Ross, Erin Bransom, Bailey Kuehl, Jonathan Bragg,
Tom Hope, Doug Downey
- Abstract summary: We introduce this task for large language models and release ARIES, a dataset of review comments and their corresponding paper edits.
We find that models struggle even to identify the edits that correspond to a comment.
GPT-4 often succeeds in addressing comments on a surface level, but it rigidly follows the wording of the feedback rather than the underlying intent.
- Score: 19.68152108760845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Revising scientific papers based on peer feedback is a challenging task that
requires not only deep scientific knowledge and reasoning, but also the ability
to recognize the implicit requests in high-level feedback and to choose the
best of many possible ways to update the manuscript in response. We introduce
this task for large language models and release ARIES, a dataset of review
comments and their corresponding paper edits, to enable training and evaluating
models. We study two versions of the task: comment-edit alignment and edit
generation, and evaluate several baselines, including GPT-4. We find that
models struggle even to identify the edits that correspond to a comment,
especially in cases where the comment is phrased in an indirect way or where
the edit addresses the spirit of a comment but not the precise request. When
tasked with generating edits, GPT-4 often succeeds in addressing comments on a
surface level, but it rigidly follows the wording of the feedback rather than
the underlying intent, and includes fewer technical details than human-written
edits. We hope that our formalization, dataset, and analysis will form a
foundation for future work in this area.
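To make the comment-edit alignment task concrete: given a reviewer comment and a set of candidate paper edits, a system must identify which edits respond to the comment. A minimal illustrative sketch is a naive lexical-overlap baseline that ranks edits by cosine similarity of word counts. This is not the paper's method or its released baselines; the example comment and edit strings below are invented for illustration.

```python
# Naive lexical baseline for comment-edit alignment (illustrative only,
# not the ARIES paper's approach): rank candidate edits for a review
# comment by cosine similarity of their bag-of-words vectors.
from collections import Counter
import math

def tokenize(text):
    # Lowercase and keep purely alphabetic tokens.
    return [t for t in text.lower().split() if t.isalpha()]

def cosine(a, b):
    # Cosine similarity between bag-of-words vectors of two strings.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in set(ca) & set(cb))
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def align(comment, edits):
    """Return candidate edits ranked by lexical similarity to the comment."""
    return sorted(edits, key=lambda e: cosine(comment, e), reverse=True)

# Hypothetical comment and candidate edits.
comment = "please report the variance of your results"
edits = [
    "we now report the variance over five random seeds",
    "we fixed a typo in the introduction",
]
ranked = align(comment, edits)
```

A baseline like this only works when the edit echoes the comment's wording, which is exactly the failure mode the paper highlights: indirect comments, or edits addressing the spirit rather than the letter of a request, share little surface vocabulary and defeat lexical matching.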
Related papers
- Re3: A Holistic Framework and Dataset for Modeling Collaborative Document Revision [62.12545440385489]
We introduce Re3, a framework for joint analysis of collaborative document revision.
We present Re3-Sci, a large corpus of aligned scientific paper revisions manually labeled according to their action and intent.
We use the new data to provide first empirical insights into collaborative document revision in the academic domain.
arXiv Detail & Related papers (2024-05-31T21:19:09Z)
- Automated Focused Feedback Generation for Scientific Writing Assistance [6.559560602099439]
SWIF$2$T: a Scientific WrIting Focused Feedback Tool.
It is designed to generate specific, actionable and coherent comments, which identify weaknesses in a scientific paper and/or propose revisions to it.
We compile a dataset of 300 peer reviews citing weaknesses in scientific papers and conduct human evaluation.
The results demonstrate the superiority in specificity, reading comprehension, and overall helpfulness of SWIF$2$T's feedback compared to other approaches.
arXiv Detail & Related papers (2024-05-30T20:56:41Z)
- CASIMIR: A Corpus of Scientific Articles enhanced with Multiple Author-Integrated Revisions [7.503795054002406]
We propose an original textual resource on the revision step of the writing process of scientific articles.
This new dataset, called CASIMIR, contains the multiple revised versions of 15,646 scientific articles from OpenReview, along with their peer reviews.
arXiv Detail & Related papers (2024-03-01T03:07:32Z)
- A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [58.6354685593418]
This paper proposes several article-level, field-normalized, and large language model-empowered bibliometric indicators to evaluate reviews.
The newly emerging AI-generated literature reviews are also appraised.
This work offers insights into the current challenges of literature reviews and envisions future directions for their development.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
- To Revise or Not to Revise: Learning to Detect Improvable Claims for Argumentative Writing Support [20.905660642919052]
We explore the main challenges to identifying argumentative claims in need of specific revisions.
We propose a new sampling strategy based on revision distance.
We provide evidence that using contextual information and domain knowledge can further improve prediction results.
arXiv Detail & Related papers (2023-05-26T10:19:54Z)
- Scientific Opinion Summarization: Paper Meta-review Generation Dataset, Methods, and Evaluation [55.00687185394986]
We propose the task of scientific opinion summarization, where research paper reviews are synthesized into meta-reviews.
We introduce the ORSUM dataset covering 15,062 paper meta-reviews and 57,536 paper reviews from 47 conferences.
Our experiments show that (1) human-written summaries do not always satisfy all necessary criteria, such as depth of discussion and identification of consensus and controversy for the specific domain, and (2) the combination of task decomposition and iterative self-refinement shows strong potential for enhancing opinion summarization.
arXiv Detail & Related papers (2023-05-24T02:33:35Z)
- EditEval: An Instruction-Based Benchmark for Text Improvements [73.5918084416016]
This work presents EditEval, an instruction-based benchmark and evaluation suite for the automatic evaluation of editing capabilities.
We evaluate several pre-trained models, finding that InstructGPT and PEER perform best but that most baselines fall below the supervised SOTA.
Our analysis shows that commonly used metrics for editing tasks do not always correlate well, and that optimization for prompts with the highest performance does not necessarily entail the strongest robustness to different models.
arXiv Detail & Related papers (2022-09-27T12:26:05Z)
- PEER: A Collaborative Language Model [70.11876901409906]
We introduce PEER, a collaborative language model that imitates the entire writing process itself.
PEER can write drafts, add suggestions, propose edits and provide explanations for its actions.
We show that PEER achieves strong performance across various domains and editing tasks.
arXiv Detail & Related papers (2022-08-24T16:56:47Z)
- Memory-Based Model Editing at Scale [102.28475739907498]
Existing model editors struggle to accurately model an edit's intended scope.
We propose Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC).
SERAC stores edits in an explicit memory and learns to reason over them to modulate the base model's predictions as needed.
arXiv Detail & Related papers (2022-06-13T23:40:34Z)
- Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision [11.495407637511878]
We present a human-in-the-loop iterative text revision system, Read, Revise, Repeat (R3).
R3 aims to achieve high-quality text revisions with minimal human effort by reading model-generated revisions and user feedback, revising documents, and repeating human-machine interactions.
arXiv Detail & Related papers (2022-04-07T18:33:10Z)
- Understanding Iterative Revision from Human-Written Text [10.714872525208385]
IteraTeR is the first large-scale, multi-domain, edit-intention annotated corpus of iteratively revised text.
It helps us better understand the text revision process by making vital connections between edit intentions and writing quality.
arXiv Detail & Related papers (2022-03-08T01:47:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and accepts no responsibility for any consequences of its use.