Towards Automated Document Revision: Grammatical Error Correction,
Fluency Edits, and Beyond
- URL: http://arxiv.org/abs/2205.11484v1
- Date: Mon, 23 May 2022 17:37:20 GMT
- Title: Towards Automated Document Revision: Grammatical Error Correction,
Fluency Edits, and Beyond
- Authors: Masato Mita, Keisuke Sakaguchi, Masato Hagiwara, Tomoya Mizumoto, Jun
Suzuki, Kentaro Inui
- Abstract summary: We introduce a new document-revision corpus, TETRA, in which professional editors revised academic papers sampled from the ACL Anthology.
We show the uniqueness of TETRA compared with existing document revision corpora and demonstrate that a fine-tuned pre-trained language model can discriminate the quality of documents after revision even when the difference is subtle.
- Score: 46.130399041820716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language processing technology has rapidly improved
automated grammatical error correction, and the community has begun to
explore document-level revision as one of the next challenges. To go beyond
sentence-level automated grammatical error correction to an NLP-based
document-level revision assistant, there are two major obstacles: (1) few
public corpora have document-level revisions annotated by professional
editors, and (2) it is not feasible to elicit all possible references and
evaluate revision quality against them, because there are infinitely many
possible revisions. This paper tackles these challenges. First, we introduce
a new document-revision corpus, TETRA, in which professional editors revised
academic papers sampled from the ACL Anthology; these papers contain few
trivial grammatical errors, which lets us focus on document- and
paragraph-level edits such as coherence and consistency. Second, we explore
reference-less and interpretable methods for meta-evaluation that can detect
quality improvements made by document revision. We show the uniqueness of
TETRA compared with existing document-revision corpora and demonstrate that
a fine-tuned pre-trained language model can discriminate the quality of
documents after revision even when the difference is subtle. This promising
result will encourage the community to further explore automated document
revision models and metrics in the future.
Related papers
- Re3: A Holistic Framework and Dataset for Modeling Collaborative Document Revision [62.12545440385489]
We introduce Re3, a framework for joint analysis of collaborative document revision.
We present Re3-Sci, a large corpus of aligned scientific paper revisions manually labeled according to their action and intent.
We use the new data to provide the first empirical insights into collaborative document revision in the academic domain.
arXiv Detail & Related papers (2024-05-31T21:19:09Z)
- CASIMIR: A Corpus of Scientific Articles enhanced with Multiple Author-Integrated Revisions [7.503795054002406]
We propose an original textual resource on the revision step of the writing process of scientific articles.
This new dataset, called CASIMIR, contains the multiple revised versions of 15,646 scientific articles from OpenReview, along with their peer reviews.
arXiv Detail & Related papers (2024-03-01T03:07:32Z)
- Beyond the Chat: Executable and Verifiable Text-Editing with LLMs [87.84199761550634]
Conversational interfaces powered by Large Language Models (LLMs) have recently become a popular way to obtain feedback during document editing.
We present InkSync, an editing interface that suggests executable edits directly within the document being edited (a toy data model for such edits is sketched after this entry).
arXiv Detail & Related papers (2023-09-27T00:56:17Z)
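The executable edits that InkSync surfaces can be pictured as structured operations an interface can apply and later verify; the data model below is a hypothetical Python illustration of that concept, not InkSync's actual API.

```python
# Hypothetical data model for "executable" edits: each suggestion carries an
# exact character span plus replacement text, so an interface can apply it
# with one click and later verify that the document changed as claimed.
# This illustrates the concept only; it is not InkSync's actual API.
from dataclasses import dataclass

@dataclass
class Edit:
    start: int        # character offset where the span to replace begins
    end: int          # character offset one past the end of the span
    replacement: str  # text to substitute for document[start:end]
    comment: str      # human-readable rationale shown to the author

def apply_edits(document: str, edits: list[Edit]) -> str:
    # Apply right-to-left so earlier offsets stay valid after each splice.
    for edit in sorted(edits, key=lambda e: e.start, reverse=True):
        document = document[:edit.start] + edit.replacement + document[edit.end:]
    return document

doc = "The results was significant."
suggestion = Edit(start=12, end=15, replacement="were", comment="subject-verb agreement")
print(apply_edits(doc, [suggestion]))  # -> "The results were significant."
```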
- Improving Iterative Text Revision by Learning Where to Edit from Other Revision Tasks [11.495407637511878]
Iterative text revision improves text quality by fixing grammatical errors, rephrasing for better readability or contextual appropriateness, or reorganizing sentence structures throughout a document.
Most recent research has focused on understanding and classifying different types of edits in the iterative revision process from human-written text.
We aim to build an end-to-end text revision system that can iteratively generate helpful edits by explicitly detecting editable spans with their corresponding edit intents (a toy pipeline sketch follows this entry).
arXiv Detail & Related papers (2022-12-02T18:10:43Z)
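The detect-then-edit loop described in the entry above can be sketched as follows; the span detector and editor are trivial rule-based stand-ins for the learned models (which would also predict richer intents such as CLARITY or COHERENCE), included only to show the shape of the pipeline.

```python
# Toy sketch of an iterative detect-then-edit revision loop. Real systems use
# a learned span detector (with edit intents) and a learned editor; the
# rule-based stand-ins below only show the pipeline shape.
import re

def detect_spans(text):
    """Stand-in detector: yield (start, end, intent) for editable spans."""
    for match in re.finditer(r"\bvery very\b", text):
        yield match.start(), match.end(), "FLUENCY"

def generate_edit(span_text, intent):
    """Stand-in editor: rewrite the span according to its predicted intent."""
    return "extremely" if intent == "FLUENCY" else span_text

def revise(text, max_rounds=3):
    for _ in range(max_rounds):            # iterate until no spans remain
        spans = sorted(detect_spans(text), reverse=True)
        if not spans:
            break
        for start, end, intent in spans:   # right-to-left keeps offsets valid
            text = text[:start] + generate_edit(text[start:end], intent) + text[end:]
    return text

print(revise("The model is very very accurate."))
# -> "The model is extremely accurate."
```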
- EditEval: An Instruction-Based Benchmark for Text Improvements [73.5918084416016]
This work presents EditEval, an instruction-based benchmark and evaluation suite for the automatic evaluation of editing capabilities.
We evaluate several pre-trained models and find that InstructGPT and PEER perform best, though most baselines fall below the supervised state of the art.
Our analysis shows that commonly used metrics for editing tasks do not always correlate well, and that optimizing prompts for the highest performance does not necessarily yield the strongest robustness across models (a minimal correlation check is sketched after this entry).
arXiv Detail & Related papers (2022-09-27T12:26:05Z)
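EditEval's finding that editing metrics do not always agree can be checked with a rank correlation over per-example scores; the two score lists below are made-up numbers for illustration only.

```python
# Checking how well two editing metrics agree, as a rank correlation over
# per-example scores. The numbers are made up for illustration; in practice
# each list would hold one metric's score per system output.
from scipy.stats import spearmanr

metric_a = [0.71, 0.42, 0.88, 0.35, 0.60]  # e.g., SARI per example
metric_b = [0.52, 0.47, 0.55, 0.30, 0.75]  # e.g., an edit-overlap metric

rho, p_value = spearmanr(metric_a, metric_b)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")
# A low rho means the metrics rank the same outputs differently, so
# conclusions can flip depending on which metric is optimized.
```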
- Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement [57.72846454929923]
We create a benchmark dataset, HJQE, in which expert translators directly annotate poorly translated words.
We propose two tag-correcting strategies, namely a tag refinement strategy and a tree-based annotation strategy, to bring the TER-based artificial QE corpus closer to HJQE.
The results show that the proposed dataset is more consistent with human judgement and confirm the effectiveness of the tag-correcting strategies (a rough sketch of TER-style word tagging follows this entry).
arXiv Detail & Related papers (2022-09-13T02:37:12Z)
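The TER-based tags that HJQE revisits are derived by aligning an MT hypothesis against its post-edit; the snippet below is a rough difflib approximation of that tagging (true TER alignment also models block shifts, which this ignores).

```python
# Rough sketch of deriving word-level QE tags from a post-edit: words of the
# MT hypothesis that survive alignment against the post-edited reference are
# tagged OK, the rest BAD. True TER alignment also handles block shifts,
# which this difflib approximation ignores.
from difflib import SequenceMatcher

def word_tags(hypothesis: str, post_edit: str) -> list[tuple[str, str]]:
    hyp, ref = hypothesis.split(), post_edit.split()
    tags = ["BAD"] * len(hyp)
    matcher = SequenceMatcher(a=hyp, b=ref, autojunk=False)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return list(zip(hyp, tags))

print(word_tags("the cat sat in mat", "the cat sat on the mat"))
# -> [('the','OK'), ('cat','OK'), ('sat','OK'), ('in','BAD'), ('mat','OK')]
```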
- Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision [11.495407637511878]
We present a human-in-the-loop iterative text revision system, Read, Revise, Repeat (R3).
R3 aims to achieve high-quality text revisions with minimal human effort by reading model-generated revisions and user feedback, revising documents, and repeating human-machine interactions.
arXiv Detail & Related papers (2022-04-07T18:33:10Z)
- Understanding Iterative Revision from Human-Written Text [10.714872525208385]
IteraTeR is the first large-scale, multi-domain, edit-intention annotated corpus of iteratively revised text.
Using IteraTeR, we better understand the text revision process, making vital connections between edit intentions and writing quality.
arXiv Detail & Related papers (2022-03-08T01:47:42Z)
- Automatic Document Sketching: Generating Drafts from Analogous Texts [44.626645471195495]
We introduce a new task, document sketching, which involves generating entire draft documents for the writer to review and revise.
These drafts are built from sets of documents that overlap in form - sharing large segments of potentially reusable text - while diverging in content.
We investigate the application of weakly supervised methods, including use of a transformer-based mixture of experts, together with reinforcement learning.
arXiv Detail & Related papers (2021-06-14T06:46:06Z)
- Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component (a minimal sketch follows this entry).
arXiv Detail & Related papers (2020-10-03T02:52:28Z)
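The cross-document attention component can be pictured as one document's token states attending over another's; the snippet below is a minimal PyTorch sketch of that idea, not the authors' exact hierarchical architecture.

```python
# Minimal sketch of cross-document attention: token states of document A
# attend over token states of document B, yielding B-aware representations of
# A plus soft alignment weights. An illustration of the idea only, not the
# authors' exact hierarchical architecture.
import torch
import torch.nn as nn

embed_dim, num_heads = 256, 4
cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

doc_a = torch.randn(1, 120, embed_dim)  # (batch, tokens, dim) for document A
doc_b = torch.randn(1, 200, embed_dim)  # (batch, tokens, dim) for document B

attended_a, align = cross_attn(query=doc_a, key=doc_b, value=doc_b)
print(attended_a.shape)  # torch.Size([1, 120, 256]): B-aware states of A
print(align.shape)       # torch.Size([1, 120, 200]): soft A-to-B alignment
```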