Simple is not Enough: Document-level Text Simplification using Readability and Coherence
- URL: http://arxiv.org/abs/2412.18655v1
- Date: Tue, 24 Dec 2024 19:05:21 GMT
- Title: Simple is not Enough: Document-level Text Simplification using Readability and Coherence
- Authors: Laura Vásquez-Rodríguez, Nhung T. H. Nguyen, Piotr Przybyła, Matthew Shardlow, Sophia Ananiadou
- Abstract summary: We present the SimDoc system, a simplification model considering simplicity, readability, and discourse aspects, such as coherence.
We include multiple training objectives that jointly consider simplicity, readability, and coherence.
We present a comparative analysis evaluating our proposed models in zero-shot, few-shot, and fine-tuning settings on document-level TS corpora.
- Score: 20.613410797137036
- License:
- Abstract: In this paper, we present the SimDoc system, a simplification model that accounts for simplicity, readability, and discourse aspects such as coherence. Over the past decade, progress in Text Simplification (TS) has mostly been demonstrated at the sentence level rather than on paragraphs or documents, even though most TS audiences would benefit from the latter setting. We propose a simplification system that is first fine-tuned on professionally created corpora and then trained with multiple objectives that jointly consider simplicity, readability, and coherence. Our contributions include extending professionally annotated simplification corpora by associating existing annotations into (complex text, simple text, readability label) triples, so that readability information can be exploited during training. We also present a comparative analysis in which we evaluate the proposed models in zero-shot, few-shot, and fine-tuning settings on document-level TS corpora, demonstrating novel simplification methods. Finally, we provide a detailed analysis of outputs, highlighting the difficulties of simplification at the document level.
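To make the data and training setup described above more concrete, the following is a minimal, hypothetical sketch (not the authors' SimDoc code) of how (complex text, simple text, readability label) triples could be represented and how a combined objective might weight simplicity, readability, and coherence terms. The class name, label scheme, loss weights, and example sentences are all assumptions made for illustration.

```python
# Illustrative sketch only: NOT the SimDoc implementation from the paper.
# It shows one plausible representation of the (complex, simple, readability)
# triples and a weighted combination of three training objectives.
from dataclasses import dataclass


@dataclass
class SimplificationTriple:
    complex_text: str       # original (complex) document
    simple_text: str        # professionally simplified reference
    readability_label: str  # assumed labeling scheme, e.g. a CEFR-style level


def combined_loss(simplicity_loss: float,
                  readability_loss: float,
                  coherence_loss: float,
                  weights=(1.0, 0.5, 0.5)) -> float:
    """Weighted sum of the three objectives; the weights are placeholders,
    not values taken from the paper."""
    w_simp, w_read, w_coh = weights
    return w_simp * simplicity_loss + w_read * readability_loss + w_coh * coherence_loss


# Toy usage: one triple and a dummy combination of loss terms.
example = SimplificationTriple(
    complex_text="The municipality promulgated an ordinance restricting vehicular access.",
    simple_text="The city made a rule that limits car access.",
    readability_label="A2",
)
print(example.readability_label)
print(combined_loss(0.8, 0.3, 0.2))
```

In a real setup the three terms would come from the model's loss on the simplified reference and from auxiliary readability and coherence scorers; the fixed weights here are placeholders only.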
Related papers
- Progressive Document-level Text Simplification via Large Language Models [19.57555397986868] (2025-01-07)
  Long document-level simplification (DS) is still relatively unexplored.
  We propose a progressive simplification method (ProgDS) that hierarchically decomposes the task.
- Analysing Zero-Shot Readability-Controlled Sentence Simplification [54.09069745799918] (2024-09-30)
  We investigate how different types of contextual information affect a model's ability to generate sentences with the desired readability.
  Results show that all tested models struggle to simplify sentences due to the models' limitations and to characteristics of the source sentences.
  Our experiments also highlight the need for better automatic evaluation metrics tailored to RCTS.
- Controlling Pre-trained Language Models for Grade-Specific Text Simplification [22.154454849167077] (2023-05-24)
  We study how different control mechanisms impact the adequacy and simplicity of text simplification systems.
  We introduce a simple method that predicts the edit operations required to simplify a text for a specific grade level on an instance-by-instance basis.
- Elaborative Simplification as Implicit Questions Under Discussion [51.17933943734872] (2023-05-17)
  This paper proposes to view elaborative simplification through the lens of the Question Under Discussion (QUD) framework.
  We show that explicitly modeling QUD provides essential understanding of elaborative simplification and of how the elaborations connect with the rest of the discourse.
- Context-Aware Document Simplification [3.2880869992413237] (2023-05-10)
  We explore systems that use document context within the simplification process itself.
  We achieve state-of-the-art performance on the document simplification task, even when not relying on plan guidance.
- NapSS: Paragraph-level Medical Text Simplification via Narrative Prompting and Sentence-matching Summarization [46.772517928718216] (2023-02-11)
  We propose a summarize-then-simplify two-stage strategy, which we call NapSS.
  NapSS identifies the relevant content to simplify while ensuring that the original narrative flow is preserved.
  Our model performs significantly better than the seq2seq baseline on an English medical corpus.
- Unsupervised Sentence Simplification via Dependency Parsing [4.337513096197002] (2022-06-10)
  We propose a simple yet novel unsupervised sentence simplification system.
  It harnesses parsing structures together with sentence embeddings to produce linguistically effective simplifications.
  We establish the unsupervised state of the art at 39.13 SARI on the TurkCorpus set and perform competitively against supervised baselines on various quality metrics (see the SARI scoring sketch after this list).
- Document-Level Text Simplification: Dataset, Criteria and Baseline [75.58761130635824] (2021-10-11)
  We define and investigate a new task of document-level text simplification.
  Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia.
  We propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task.
- Controllable Text Simplification with Explicit Paraphrasing [88.02804405275785] (2020-10-21)
  Text Simplification improves the readability of sentences through several rewriting transformations, such as lexical paraphrasing, deletion, and splitting.
  Current simplification systems are predominantly sequence-to-sequence models trained end-to-end to perform all these operations simultaneously.
  We propose a novel hybrid approach that leverages linguistically motivated rules for splitting and deletion, and couples them with a neural paraphrasing model to produce varied rewriting styles.
- Elaborative Simplification: Content Addition and Explanation Generation in Text Simplification [33.08519864889526] (2020-10-20)
  We present the first data-driven study of content addition in text simplification.
  We analyze how entities, ideas, and concepts are elaborated through the lens of contextual specificity.
  Our results illustrate the complexities of elaborative simplification, suggesting many interesting directions for future work.
- ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [97.27005783856285] (2020-05-01)
  This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
  We show that simplifications in ASSET are better at capturing characteristics of simplicity than other standard evaluation datasets for the task.
This list is automatically generated from the titles and abstracts of the papers on this site.