A Computational Analysis of Vagueness in Revisions of Instructional Texts
- URL: http://arxiv.org/abs/2309.12107v1
- Date: Thu, 21 Sep 2023 14:26:04 GMT
- Title: A Computational Analysis of Vagueness in Revisions of Instructional Texts
- Authors: Alok Debnath, Michael Roth
- Abstract summary: We extract pairwise versions of an instruction before and after a revision was made.
We investigate the ability of a neural model to distinguish between two versions of an instruction in our data.
- Score: 2.2577978123177536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: WikiHow is an open-domain repository of instructional articles for a variety
of tasks, which can be revised by users. In this paper, we extract pairwise
versions of an instruction before and after a revision was made. Starting from
a noisy dataset of revision histories, we specifically extract and analyze
edits that involve cases of vagueness in instructions. We further investigate
the ability of a neural model to distinguish between two versions of an
instruction in our data by adopting a pairwise ranking task from previous work
and showing improvements over existing baselines.
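The pairwise ranking task described in the abstract — preferring the revised version of an instruction over the vaguer original — can be sketched with a toy linear scorer. Note this is only an illustrative assumption: the paper uses a neural model, and the features, weights, and examples below are invented for the sketch, not taken from the authors' setup.

```python
def features(instruction: str) -> list:
    """Toy surface features that might correlate with vagueness."""
    tokens = [t.strip(".,") for t in instruction.lower().split()]
    hedges = {"some", "somehow", "things", "stuff", "maybe"}
    return [
        len(tokens),                            # instruction length
        sum(1 for t in tokens if t in hedges),  # hedge-word count
    ]

def score(instruction: str, weights=(-0.01, -1.0)) -> float:
    """Linear scorer: higher score = less vague (hypothetical weights)."""
    return sum(w * f for w, f in zip(weights, features(instruction)))

def pairwise_hinge_loss(before: str, after: str, margin: float = 1.0) -> float:
    """Zero loss when the revised version outscores the original by `margin`."""
    return max(0.0, margin - (score(after) - score(before)))

loss = pairwise_hinge_loss(
    "Add some stuff to the bowl.",        # vague original
    "Add two cups of flour to the bowl."  # revised instruction
)
```

In a trained model the weights would be learned so that revised instructions receive higher scores than their vaguer originals; the hinge loss above is one standard objective for that kind of pairwise comparison.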
Related papers
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- To Revise or Not to Revise: Learning to Detect Improvable Claims for Argumentative Writing Support [20.905660642919052]
We explore the main challenges to identifying argumentative claims in need of specific revisions.
We propose a new sampling strategy based on revision distance.
We provide evidence that using contextual information and domain knowledge can further improve prediction results.
arXiv Detail & Related papers (2023-05-26T10:19:54Z)
- EditEval: An Instruction-Based Benchmark for Text Improvements [73.5918084416016]
This work presents EditEval: an instruction-based benchmark and evaluation suite for the automatic evaluation of editing capabilities.
We evaluate several pre-trained models, which shows that InstructGPT and PEER perform the best, but that most baselines fall below the supervised SOTA.
Our analysis shows that commonly used metrics for editing tasks do not always correlate well, and that optimization for prompts with the highest performance does not necessarily entail the strongest robustness to different models.
arXiv Detail & Related papers (2022-09-27T12:26:05Z)
- Understanding Iterative Revision from Human-Written Text [10.714872525208385]
IteraTeR is the first large-scale, multi-domain, edit-intention annotated corpus of iteratively revised text.
Using IteraTeR, we gain a better understanding of the text revision process, drawing connections between edit intentions and writing quality.
arXiv Detail & Related papers (2022-03-08T01:47:42Z)
- Aspect-Controllable Opinion Summarization [58.5308638148329]
We propose an approach that allows the generation of customized summaries based on aspect queries.
Using a review corpus, we create a synthetic training dataset of (review, summary) pairs enriched with aspect controllers.
We fine-tune a pretrained model using our synthetic dataset and generate aspect-specific summaries by modifying the aspect controllers.
arXiv Detail & Related papers (2021-09-07T16:09:17Z)
- WikiAsp: A Dataset for Multi-domain Aspect-based Summarization [69.13865812754058]
We propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization.
Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation.
Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.
arXiv Detail & Related papers (2020-11-16T10:02:52Z)
- From Dataset Recycling to Multi-Property Extraction and Beyond [7.670897251425096]
This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading dataset.
The proposed dual-source model outperforms the current state-of-the-art by a large margin.
We introduce WikiReading Recycled, a newly developed public dataset, and the task of multiple-property extraction.
arXiv Detail & Related papers (2020-11-06T08:22:12Z)
- Learning to Update Natural Language Comments Based on Code Changes [48.829941738578086]
We formulate the novel task of automatically updating an existing natural language comment based on changes in the body of code it accompanies.
We propose an approach that learns to correlate changes across two distinct language representations and generates a sequence of edits that is applied to the existing comment to reflect the source code modifications.
arXiv Detail & Related papers (2020-04-25T15:37:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.