Related papers: Controlling Pre-trained Language Models for Grade-Specific Text Simplification

Controlling Pre-trained Language Models for Grade-Specific Text Simplification

URL: http://arxiv.org/abs/2305.14993v2
Date: Thu, 30 Nov 2023 14:14:31 GMT
Title: Controlling Pre-trained Language Models for Grade-Specific Text Simplification
Authors: Sweta Agrawal and Marine Carpuat
Abstract summary: We study how different control mechanisms impact the adequacy and simplicity of text simplification systems. We introduce a simple method that predicts the edit operations required for simplifying a text for a specific grade level on an instance-per-instance basis.
Score: 22.154454849167077
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text simplification (TS) systems rewrite text to make it more readable while preserving its content. However, what makes a text easy to read depends on the intended readers. Recent work has shown that pre-trained language models can simplify text using a wealth of techniques to control output simplicity, ranging from specifying only the desired reading grade level, to directly specifying low-level edit operations. Yet it remains unclear how to set these control parameters in practice. Existing approaches set them at the corpus level, disregarding the complexity of individual inputs and considering only one level of output complexity. In this work, we conduct an empirical study to understand how different control mechanisms impact the adequacy and simplicity of text simplification systems. Based on these insights, we introduce a simple method that predicts the edit operations required for simplifying a text for a specific grade level on an instance-per-instance basis. This approach improves the quality of the simplified outputs over corpus-level search-based heuristics.

Related papers

Simple is not Enough: Document-level Text Simplification using Readability and Coherence [20.613410797137036]
We present the SimDoc system, a simplification model considering simplicity, readability, and discourse aspects, such as coherence. We include multiple objectives during training, considering simplicity, readability, and coherence altogether. We present a comparative analysis in which we evaluate our proposed models in a zero-shot, few-shot, and fine-tuning setting using document-level TS corpora.
arXiv Detail & Related papers (2024-12-24T19:05:21Z)
Automating Easy Read Text Segmentation [2.7309692684728617]
Easy Read text is one of the main forms of access to information for people with reading difficulties. One of the key characteristics of this type of text is the requirement to split sentences into smaller grammatical segments. We study novel methods for the task, leveraging masked and generative language models, along with constituent parsing.
arXiv Detail & Related papers (2024-06-17T12:25:25Z)
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture. TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling. It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
Teaching the Pre-trained Model to Generate Simple Texts for Text Simplification [59.625179404482594]
Randomly masking text spans in ordinary texts in the pre-training stage hardly allows models to acquire the ability to generate simple texts. We propose a new continued pre-training strategy to teach the pre-trained model to generate simple texts.
arXiv Detail & Related papers (2023-05-21T14:03:49Z)
Composable Text Controls in Latent Space with ODEs [97.12426987887021]
This paper proposes a new efficient approach for composable text operations in the compact latent space of text. By connecting pretrained LMs to the latent space through efficient adaption, we then decode the sampled vectors into desired text sequences. Experiments show that composing those operators within our approach manages to generate or edit high-quality text.
arXiv Detail & Related papers (2022-08-01T06:51:45Z)
Classifiers are Better Experts for Controllable Text Generation [63.17266060165098]
We show that the proposed method significantly outperforms recent PPLM, GeDi, and DExperts on PPL and sentiment accuracy based on the external classifier of generated texts. The same time, it is also easier to implement and tune, and has significantly fewer restrictions and requirements.
arXiv Detail & Related papers (2022-05-15T12:58:35Z)
Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems. We present an iterative in-place editing approach for text revision, which requires no parallel data. It achieves competitive and even better performance than state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
Text Simplification for Comprehension-based Question-Answering [7.144235435987265]
We release Simple-SQuAD, a simplified version of the widely-used SQuAD dataset. We benchmark the newly created corpus and perform an ablation study for examining the effect of the simplification process in the SQuAD-based question answering task.
arXiv Detail & Related papers (2021-09-28T18:48:00Z)
CUT: Controllable Unsupervised Text Simplification [0.0]
We propose two unsupervised mechanisms for controlling the output complexity of generated texts. We show that by nudging a back-translation algorithm to understand the relative simplicity of a text in comparison to its noisy translation, the algorithm self-supervises itself to produce the desired complexity.
arXiv Detail & Related papers (2020-12-03T14:14:30Z)
Explainable Prediction of Text Complexity: The Missing Preliminaries for Text Simplification [13.447565774887215]
Text simplification reduces the language complexity of professional content for accessibility purposes. End-to-end neural network models have been widely adopted to directly generate the simplified version of input text. We show that text simplification can be decomposed into a compact pipeline of tasks to ensure the transparency and explainability of the process.
arXiv Detail & Related papers (2020-07-31T03:33:37Z)
ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [97.27005783856285]
This paper introduces ASSET, a new dataset for assessing sentence simplification in English. We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
arXiv Detail & Related papers (2020-05-01T16:44:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.