Optimizing Readability Using Genetic Algorithms
- URL: http://arxiv.org/abs/2301.00374v1
- Date: Sun, 1 Jan 2023 09:08:45 GMT
- Title: Optimizing Readability Using Genetic Algorithms
- Authors: Jorge Martinez-Gil
- Abstract summary: This research presents ORUGA, a method that tries to automatically optimize the readability of any text in English.
The core idea behind the method is that certain factors affect the readability of a text, some of which are quantifiable.
In addition, this research seeks to preserve both the original text's content and form through multi-objective optimization techniques.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research presents ORUGA, a method that tries to automatically optimize
the readability of any text in English. The core idea behind the method is that
certain factors affect the readability of a text, some of which are
quantifiable (number of words, syllables, presence or absence of adverbs, and
so on). The nature of these factors allows us to implement a genetic learning
strategy to replace some existing words with their most suitable synonyms to
facilitate optimization. In addition, this research seeks to preserve both the
original text's content and form through multi-objective optimization
techniques. In this way, neither the text's syntactic structure nor the
semantic content of the original message is significantly distorted. An
exhaustive study on a substantial number and diversity of texts confirms that
our method was able to optimize the degree of readability in all cases without
significantly altering their form or meaning. The source code of this approach
is available at https://github.com/jorge-martinez-gil/oruga.
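For illustration only, here is a minimal, self-contained sketch of the underlying idea, not ORUGA itself: a genetic loop that swaps words for synonyms and scores each candidate on readability minus a form-preservation penalty. The toy synonym table, the single-sentence Flesch approximation, and the collapse of the multi-objective setup into one weighted score are all simplifying assumptions; the repository above holds the actual implementation.

```python
import random

# Toy synonym table; the real method draws on proper lexical resources.
SYNONYMS = {
    "utilize": ["use", "employ"],
    "commence": ["start", "begin"],
    "terminate": ["end", "stop"],
    "approximately": ["about", "roughly"],
}

def count_syllables(word):
    # Crude vowel-group heuristic; real systems use a pronunciation dictionary.
    groups, prev = 0, False
    for ch in word.lower():
        vowel = ch in "aeiouy"
        groups += vowel and not prev
        prev = vowel
    return max(1, groups)

def flesch_reading_ease(words):
    # Flesch Reading Ease, approximated for a single sentence.
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) - 84.6 * (syllables / len(words))

def fitness(original, candidate):
    # Multi-objective idea collapsed into one score: readability
    # minus a penalty for deviating from the original form.
    changed = sum(a != b for a, b in zip(original, candidate))
    return flesch_reading_ease(candidate) - 5.0 * changed

def mutate(candidate):
    out = list(candidate)
    i = random.randrange(len(out))
    out[i] = random.choice(SYNONYMS.get(out[i], [out[i]]))
    return out

def optimize(text, generations=200, pop_size=20):
    original = text.split()
    pop = [mutate(original) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(original, c), reverse=True)
        pop = pop[: pop_size // 2]                        # selection
        pop += [mutate(random.choice(pop)) for _ in pop]  # offspring
    return " ".join(max(pop, key=lambda c: fitness(original, c)))

print(optimize("we utilize this method to commence the experiment"))
```

A faithful version would keep readability, semantic similarity, and syntactic form as separate objectives (e.g., under NSGA-II) rather than the single weighted sum used here.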
Related papers
- MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation [13.70446799743065]
Byte-based machine translation systems have shown significant potential in massively multilingual settings.
Unicode encoding, which maps each character to specific byte(s), eliminates the emergence of unknown words, even in new languages.
Local contextualization has proven effective in assigning initial semantics to tokens, improving sentence comprehension.
We propose Adaptive MultiScale-Headed Attention (Ada-MSHA), which adaptively selects and mixes attention heads treated as contextualization experts.
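As a rough illustration of the idea, not the paper's Ada-MSHA, the sketch below gates among attention heads that each attend over a different local window; the window sizes, the token-wise softmax gate, and all shapes are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_multiscale_attention(x, Wq, Wk, Wv, Wg, windows):
    # x: (seq, d). Each head attends within its own local window
    # (multi-scale); a token-wise softmax gate mixes head outputs
    # as "experts" instead of concatenating them.
    seq = x.shape[0]
    idx = np.arange(seq)
    heads = []
    for q, k, v, w in zip(Wq, Wk, Wv, windows):
        Q, K, V = x @ q, x @ k, x @ v
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        mask = np.abs(idx[:, None] - idx[None, :]) <= w
        heads.append(softmax(np.where(mask, scores, -1e9)) @ V)
    H = np.stack(heads, axis=1)               # (seq, n_heads, d_head)
    gate = softmax(x @ Wg)                    # (seq, n_heads): expert weights
    return (gate[..., None] * H).sum(axis=1)  # (seq, d_head)

rng = np.random.default_rng(0)
seq, d, n_heads, d_head = 6, 16, 4, 8
x = rng.normal(size=(seq, d))
Wq, Wk, Wv = (rng.normal(size=(n_heads, d, d_head)) for _ in range(3))
out = gated_multiscale_attention(x, Wq, Wk, Wv, rng.normal(size=(d, n_heads)),
                                 windows=[1, 2, 4, seq])
print(out.shape)  # (6, 8)
```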
arXiv Detail & Related papers (2024-11-03T08:15:43Z)
- Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate texts that align with human texts across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.
arXiv Detail & Related papers (2023-10-02T09:35:27Z)
- Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
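A toy sketch of the copy-based paradigm follows, with bag-of-words vectors standing in for the paper's learned phrase and context encoders (an assumption): index each n-gram in the collection by the context that preceded it, then generate by repeatedly copying the phrase whose stored context best matches the current one.

```python
import numpy as np

collection = [
    "the cat sat on the mat",
    "a dog chased the cat down the street",
]

vocab = {w: i for i, w in
         enumerate(sorted({w for t in collection for w in t.split()}))}

def embed(words):
    # Toy bag-of-words "encoder"; the paper uses learned dense encoders.
    v = np.zeros(len(vocab))
    for w in words:
        if w in vocab:
            v[vocab[w]] += 1.0
    return v

def build_index(texts, n=2):
    # Pair each copyable n-gram phrase with the context that preceded it.
    keys, phrases = [], []
    for t in texts:
        w = t.split()
        for i in range(n, len(w) - n + 1):
            keys.append(embed(w[i - n : i]))
            phrases.append(w[i : i + n])
    return np.stack(keys), phrases

keys, phrases = build_index(collection)

def generate(prompt, steps=3, n=2):
    out = prompt.split()
    for _ in range(steps):
        scores = keys @ embed(out[-n:])  # match context to stored contexts
        out += phrases[int(scores.argmax())]
    return " ".join(out)

print(generate("the cat"))  # copies "sat on", then "the mat", ...
```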
arXiv Detail & Related papers (2023-07-13T05:03:26Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
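The element-wise Manhattan distance feature is easy to illustrate: given encodings u and v of the text and the hypothesis, the feature vector is |u - v| taken per dimension. The bag-of-words encoder and the logistic-regression classifier below are stand-ins, assumed here in place of the paper's empirical text representation and neural classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def encode(sentence, vocab):
    # Toy averaged bag-of-words encoder (an assumption; any encoder works).
    v = np.zeros(len(vocab))
    words = sentence.lower().split()
    for w in words:
        if w in vocab:
            v[vocab[w]] += 1.0
    return v / max(1, len(words))

pairs = [
    ("a man is eating food", "a man is eating", 1),
    ("a man is eating food", "the man is sleeping", 0),
    ("two dogs run in the park", "dogs are running", 1),
    ("two dogs run in the park", "cats sit indoors", 0),
]

vocab = {w: i for i, w in
         enumerate(sorted({w for t, h, _ in pairs
                           for w in (t + " " + h).split()}))}

# Element-wise Manhattan distance |u - v| as the pair's feature vector.
X = np.stack([np.abs(encode(t, vocab) - encode(h, vocab))
              for t, h, _ in pairs])
y = np.array([label for _, _, label in pairs])

clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```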
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves competitive, and in some cases better, performance than state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
- Text Counterfactuals via Latent Optimization and Shapley-Guided Search [15.919650185010491]
We study the problem of generating counterfactual text for a classification model.
We aim to minimally alter the text to change the model's prediction.
White-box approaches have been successfully applied to similar problems in vision.
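As a much simpler stand-in for the paper's latent optimization and Shapley-guided search, the sketch below runs a minimal-edit substitution search that flips a toy classifier's prediction with as few word changes as possible; the lexicon "model", the substitute lists, and the search order are all illustrative assumptions.

```python
from itertools import product

# Toy sentiment "model": a fixed word-weight lexicon (an assumption).
WEIGHTS = {"great": 2.0, "good": 1.0, "fine": 0.2, "bad": -1.0, "awful": -2.0}
SUBSTITUTES = {"great": ["good", "fine", "bad"],
               "awful": ["bad", "fine", "good"]}

def predict(words):
    return sum(WEIGHTS.get(w, 0.0) for w in words) > 0  # True = positive

def counterfactual(text, max_edits=2):
    words = text.split()
    target = not predict(words)
    # Try all single substitutions first, then pairs, so the first
    # hit is guaranteed to use the fewest edits.
    positions = [i for i, w in enumerate(words) if w in SUBSTITUTES]
    for n in range(1, max_edits + 1):
        for combo in product(positions, repeat=n):
            if len(set(combo)) < n:
                continue
            for repl in product(*(SUBSTITUTES[words[i]] for i in combo)):
                cand = list(words)
                for i, r in zip(combo, repl):
                    cand[i] = r
                if predict(cand) == target:
                    return " ".join(cand)
    return None  # no counterfactual within the edit budget

print(counterfactual("the movie was great"))  # e.g. "the movie was bad"
```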
arXiv Detail & Related papers (2021-10-22T05:04:40Z)
- Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success rates and semantics rates by changing the smallest number of words compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z)
- Text Guide: Improving the quality of long text classification by a text selection method based on feature importance [0.0]
We propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit.
We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification.
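A hedged sketch of the general recipe, not the paper's exact procedure: keep the opening tokens, then fill the remaining budget with the sentences that contain the most important features, restoring their original order. The importance set, head size, and sentence-splitting heuristic below are assumptions.

```python
import re

def text_guide_truncate(text, important_words, limit=30, keep_head=10):
    # Keep the first `keep_head` tokens, then spend the remaining budget
    # on sentences ranked by how many important features they contain.
    # (Toy version: head and selected sentences may overlap.)
    tokens = text.split()
    head = tokens[:keep_head]
    sentences = re.split(r"(?<=[.!?])\s+", text)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(w in important_words
                           for w in sentences[i].lower().split()),
    )
    chosen, budget = [], limit - len(head)
    for i in ranked:
        n = len(sentences[i].split())
        if n <= budget:
            chosen.append(i)
            budget -= n
    body = " ".join(sentences[i] for i in sorted(chosen))
    return " ".join(head) + " ... " + body

doc = ("The merger was announced on Monday. Weather was mild. "
       "Regulators flagged antitrust concerns. The CEO resigned. "
       "Fans attended a concert downtown.")
print(text_guide_truncate(doc, {"merger", "antitrust", "regulators", "ceo"},
                          limit=25))
```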
arXiv Detail & Related papers (2021-04-15T04:10:08Z)
- Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
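A minimal illustration of the intrinsic flavor of probing: instead of only checking whether a property is extractable, rank individual representation dimensions by how much information they carry about it. The synthetic representations with a signal planted in two dimensions are an assumption, and mutual information is just one of several possible scores.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)  # e.g., singular vs. plural
reps = rng.normal(size=(500, 64))      # stand-in contextual representations
reps[:, 7] += 2.0 * labels             # plant a strong signal in dim 7
reps[:, 31] += 1.0 * labels            # and a weaker one in dim 31

# Score each dimension by its information about the linguistic label.
mi = mutual_info_classif(reps, labels, random_state=0)
print(np.argsort(mi)[::-1][:5])  # dims ranked by information; 7 should lead
```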
arXiv Detail & Related papers (2020-10-06T15:21:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.