WikiIns: A High-Quality Dataset for Controlled Text Editing by Natural
Language Instruction
- URL: http://arxiv.org/abs/2310.05009v1
- Date: Sun, 8 Oct 2023 04:46:39 GMT
- Title: WikiIns: A High-Quality Dataset for Controlled Text Editing by Natural
Language Instruction
- Authors: Xiang Chen, Zheng Li, Xiaojun Wan
- Abstract summary: We build and release WikiIns, a high-quality controlled text editing dataset with improved informativeness.
With the high-quality annotated dataset, we propose automatic approaches to generate a large-scale ``silver'' training set.
- Score: 56.196512595940334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text editing, i.e., the process of modifying or manipulating text, is a
crucial step in the human writing process. In this paper, we study the problem
of controlled text editing by natural language instruction. Given an
instruction that conveys the edit intention and the necessary information, the
original draft text must be revised into a target text. Existing
automatically constructed datasets for this task are limited because they lack
informative natural language instructions. Informativeness requires that the
instruction contain enough information on its own to produce the
revised text. To address this limitation, we build and release WikiIns, a
high-quality controlled text editing dataset with improved informativeness. We
first preprocess the Wikipedia edit history database to extract the raw data
(WikiIns-Raw). Then we crowdsource high-quality validation and test sets, as
well as a small-scale training set (WikiIns-Gold). With the high-quality
annotated dataset, we further propose automatic approaches to generate a
large-scale ``silver'' training set (WikiIns-Silver). Finally, we provide an
insightful analysis of our WikiIns dataset, including evaluation results and
an edit intention analysis. Our analysis and experimental results on WikiIns
may assist ongoing research on text editing. The dataset, source code, and
annotation guidelines are available at
https://github.com/CasparSwift/WikiIns.
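To make the task format concrete, below is a minimal Python sketch of what an instruction-based editing example might look like. The field names, the toy sentences, and the informativeness heuristic are illustrative assumptions, not the actual WikiIns schema or criterion; consult the repository above for the real format.

# A minimal sketch of an instruction-driven text editing example.
# The field names and contents are assumptions for illustration,
# not the actual WikiIns schema.
example = {
    # The original draft text to be revised.
    "draft": "The Eiffel Tower was completed in 1887.",
    # A natural language instruction; to be *informative*, it must
    # carry enough information to produce the target on its own.
    "instruction": "Correct the completion year to 1889.",
    # The revised (target) text an editing model should produce.
    "target": "The Eiffel Tower was completed in 1889.",
}

def plausibly_informative(instruction: str, draft: str, target: str) -> bool:
    """Crude heuristic (an assumption, not the paper's criterion):
    every token that appears in the target but not in the draft
    should be mentioned somewhere in the instruction."""
    new_tokens = set(target.split()) - set(draft.split())
    return all(tok.strip(".,") in instruction for tok in new_tokens)

print(plausibly_informative(**example))  # True for this toy example

Under this toy check, an instruction like "Fix the error." would fail, since the new token "1889" never appears in it; that is the informativeness gap WikiIns aims to close.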
Related papers
- StruEdit: Structured Outputs Enable the Fast and Accurate Knowledge Editing for Large Language Models [41.45831411548188]
Results show that StruEdit consistently delivers the highest accuracy with the lowest latency compared with other knowledge editing methods.
arXiv Detail & Related papers (2024-09-16T09:48:56Z) - XATU: A Fine-grained Instruction-based Benchmark for Explainable Text Updates [7.660511135287692]
This paper introduces XATU, the first benchmark specifically designed for fine-grained instruction-based explainable text editing.
XATU considers finer-grained text editing tasks of varying difficulty, incorporating lexical, syntactic, semantic, and knowledge-intensive edit aspects.
We demonstrate the effectiveness of instruction tuning and the impact of underlying architecture across various editing tasks.
arXiv Detail & Related papers (2023-09-20T04:58:59Z) - EditEval: An Instruction-Based Benchmark for Text Improvements [73.5918084416016]
This work presents EditEval: an instruction-based benchmark and evaluation suite for the automatic evaluation of editing capabilities.
Our evaluation of several pre-trained models shows that InstructGPT and PEER perform best, but most baselines fall below the supervised SOTA.
Our analysis shows that commonly used metrics for editing tasks do not always correlate well, and that optimizing for the highest-performing prompts does not necessarily guarantee robustness across different models.
arXiv Detail & Related papers (2022-09-27T12:26:05Z) - WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions
from Paragraphs [66.88232442007062]
We introduce WikiDes, a dataset to generate short descriptions of Wikipedia articles.
The dataset consists of over 80k English samples on 6987 topics.
Our paper shows a practical impact on Wikipedia and Wikidata, since thousands of articles lack such descriptions.
arXiv Detail & Related papers (2022-09-27T01:28:02Z) - Controlling Text Edition by Changing Answers of Specific Questions [44.12998895830244]
We introduce the new task of controllable text edition.
We take as input a long text, a question, and a target answer, and the output is a minimally modified text.
This task is very important in many situations, such as changing some conditions, consequences, or properties in a legal document.
arXiv Detail & Related papers (2021-05-23T20:44:15Z) - Learning Structural Edits via Incremental Tree Transformations [102.64394890816178]
We present a generic model for incremental editing of structured data (i.e., "structural edits").
Our editor learns to iteratively generate tree edits (e.g., deleting or adding a subtree) and applies them to the partially edited data; a minimal sketch of such tree-edit operations appears after this list.
We evaluate our proposed editor on two source code edit datasets, where results show that, with the proposed edit encoder, our editor significantly improves accuracy over previous approaches.
arXiv Detail & Related papers (2021-01-28T16:11:32Z) - Text Editing by Command [82.50904226312451]
A prevailing paradigm in neural text generation is one-shot generation, where text is produced in a single step.
We address this limitation with an interactive text generation setting in which the user interacts with the system by issuing commands to edit existing text.
We show that our Interactive Editor, a transformer-based model trained on this dataset, outperforms baselines and obtains positive results in both automatic and human evaluations.
arXiv Detail & Related papers (2020-10-24T08:00:30Z) - Fact-based Text Editing [11.115292572080131]
FactEditor edits a draft text by referring to given facts, using a buffer, a stream, and a memory.
FactEditor conducts inference faster than the encoder-decoder approach.
arXiv Detail & Related papers (2020-07-02T06:50:30Z)