Context-Aware Document Simplification
- URL: http://arxiv.org/abs/2305.06274v1
- Date: Wed, 10 May 2023 16:06:36 GMT
- Title: Context-Aware Document Simplification
- Authors: Liam Cripwell, Joël Legrand, Claire Gardent
- Abstract summary: We explore systems that use document context within the simplification process itself.
We achieve state-of-the-art performance on the document simplification task, even when not relying on plan-guidance.
- Score: 3.2880869992413237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To date, most work on text simplification has focused on sentence-level
inputs. Early attempts at document simplification merely applied these
approaches iteratively over the sentences of a document. However, this fails to
coherently preserve the discourse structure, leading to suboptimal output
quality. Recently, strategies from controllable simplification have been
leveraged to achieve state-of-the-art results on document simplification by
first generating a document-level plan (a sequence of sentence-level
simplification operations) and using this plan to guide sentence-level
simplification downstream. However, this is still limited in that the
simplification model has no direct access to the local inter-sentence document
context, likely having a negative impact on surface realisation. We explore
various systems that use document context within the simplification process
itself, either by iterating over larger text units or by extending the system
architecture to attend over a high-level representation of document context. In
doing so, we achieve state-of-the-art performance on the document
simplification task, even when not relying on plan-guidance. Further, we
investigate the performance and efficiency tradeoffs of the system variants and
suggest when each should be preferred.
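The plan-guided pipeline the abstract describes (a document-level plan assigning one simplification operation per sentence, which then guides sentence-level realisation with access to local document context) can be sketched as follows. The operation labels, the rule-based planner, and the trivial realiser are illustrative assumptions, not the authors' actual models:

```python
# Hypothetical sketch of plan-guided document simplification.
# A document-level plan assigns one operation per sentence; a
# sentence-level realiser (here a trivial rule-based stand-in)
# then executes each operation with access to local context.
OPERATIONS = ("copy", "rewrite", "split", "delete")

def plan_document(sentences):
    """Stand-in planner: long sentences get 'split', short ones 'copy'.
    A real planner would be a learned classifier over the document."""
    return ["split" if len(s.split()) > 15 else "copy" for s in sentences]

def simplify_sentence(sentence, op, context):
    """Stand-in realiser; a real system would condition a seq2seq model
    on both the operation label and the surrounding document context."""
    if op == "delete":
        return []
    if op == "split":
        words = sentence.split()
        mid = len(words) // 2
        return [" ".join(words[:mid]) + ".", " ".join(words[mid:])]
    return [sentence]  # 'copy' / 'rewrite' left unchanged in this sketch

def simplify_document(sentences):
    plan = plan_document(sentences)
    out = []
    for i, (sent, op) in enumerate(zip(sentences, plan)):
        # Local inter-sentence context window (previous and next sentence),
        # the information iterative sentence-level systems lack.
        context = sentences[max(0, i - 1): i + 2]
        out.extend(simplify_sentence(sent, op, context))
    return out
```

The paper's context-aware variants differ in how this context reaches the model: either by iterating over larger text units than single sentences, or by extending the architecture to attend over a high-level representation of the document.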
Related papers
- Controlling Pre-trained Language Models for Grade-Specific Text Simplification [22.154454849167077]
We study how different control mechanisms impact the adequacy and simplicity of text simplification systems.
We introduce a simple method that predicts the edit operations required for simplifying a text for a specific grade level on an instance-per-instance basis.
arXiv Detail & Related papers (2023-05-24T10:29:45Z)
- Unsupervised Sentence Simplification via Dependency Parsing [4.337513096197002]
We propose a simple yet novel unsupervised sentence simplification system.
It harnesses parsing structures together with sentence embeddings to produce linguistically effective simplifications.
We establish the unsupervised state-of-the-art at 39.13 SARI on TurkCorpus set and perform competitively against supervised baselines on various quality metrics.
arXiv Detail & Related papers (2022-06-10T07:55:25Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves competitive and even better performance than state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
- Document-Level Text Simplification: Dataset, Criteria and Baseline [75.58761130635824]
We define and investigate a new task of document-level text simplification.
Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia.
We propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task.
arXiv Detail & Related papers (2021-10-11T08:15:31Z)
- Text Simplification for Comprehension-based Question-Answering [7.144235435987265]
We release Simple-SQuAD, a simplified version of the widely-used SQuAD dataset.
We benchmark the newly created corpus and perform an ablation study for examining the effect of the simplification process in the SQuAD-based question answering task.
arXiv Detail & Related papers (2021-09-28T18:48:00Z)
- Controllable Text Simplification with Explicit Paraphrasing [88.02804405275785]
Text Simplification improves the readability of sentences through several rewriting transformations, such as lexical paraphrasing, deletion, and splitting.
Current simplification systems are predominantly sequence-to-sequence models that are trained end-to-end to perform all these operations simultaneously.
We propose a novel hybrid approach that leverages linguistically-motivated rules for splitting and deletion, and couples them with a neural paraphrasing model to produce varied rewriting styles.
arXiv Detail & Related papers (2020-10-21T13:44:40Z)
- Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z)
- ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [97.27005783856285]
This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
arXiv Detail & Related papers (2020-05-01T16:44:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.