Readability Formulas, Systems and LLMs are Poor Predictors of Reading Ease
- URL: http://arxiv.org/abs/2502.11150v4
- Date: Tue, 04 Nov 2025 11:48:41 GMT
- Title: Readability Formulas, Systems and LLMs are Poor Predictors of Reading Ease
- Authors: Keren Gruteke Klein, Shachar Frenkel, Omer Shubi, Yevgeni Berzak
- Abstract summary: We focus on a fundamental and understudied aspect of readability, real-time reading ease, captured with online reading measures using eye tracking. Applying this evaluation to prominent traditional readability formulas, modern machine learning systems and commercial systems used in education suggests that they are all poor predictors of reading ease in English.
- Score: 4.868319717279586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Methods for scoring text readability have been studied for over a century, and are widely used in research and in user-facing applications in many domains. Thus far, the development and evaluation of such methods have primarily relied on two types of offline behavioral data, performance on reading comprehension tests and ratings of text readability levels. In this work, we instead focus on a fundamental and understudied aspect of readability, real-time reading ease, captured with online reading measures using eye tracking. We introduce an evaluation framework for readability scoring methods which quantifies their ability to account for reading ease, while controlling for content variation across texts. Applying this evaluation to prominent traditional readability formulas, modern machine learning systems, frontier Large Language Models and commercial systems used in education, suggests that they are all poor predictors of reading ease in English. This outcome holds across native and non-native speakers, reading regimes, and textual units of different lengths. The evaluation further reveals that existing methods are often outperformed by word properties commonly used in psycholinguistics for prediction of reading times. Our results highlight a fundamental limitation of existing approaches to readability scoring, the utility of psycholinguistics for readability research, and the need for new, cognitively driven readability scoring approaches that can better account for reading ease.
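The abstract's central target is traditional readability formulas. As a concrete illustration of what such a formula looks like, here is a minimal sketch of the best-known one, Flesch Reading Ease (206.835 − 1.015 × words/sentence − 84.6 × syllables/word); the vowel-group syllable counter is a crude illustrative heuristic, not the dictionary-based counting used in real implementations, and this is not the paper's own method:

```python
import re

def count_syllables(word):
    # Crude heuristic: each run of consecutive vowels counts as one syllable;
    # every word gets at least one.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    # FRE = 206.835 - 1.015 * (words/sentence) - 84.6 * (syllables/word).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

simple = "The cat sat on the mat."
dense = ("Multidimensional psycholinguistic considerations "
         "substantially complicate readability estimation.")
# Higher score = easier text under the formula.
print(flesch_reading_ease(simple) > flesch_reading_ease(dense))  # → True
```

Formulas of this kind score surface features (sentence and word length) only, which is precisely why the paper can test whether they track actual reading ease measured with eye tracking.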
Related papers
- Controlling Reading Ease with Gaze-Guided Text Generation [37.556636987304124]
We use a model that predicts human gaze patterns to steer language model outputs towards eliciting certain reading behaviors. We evaluate the approach in an eye-tracking experiment with native and non-native speakers of English.
arXiv Detail & Related papers (2026-01-25T10:42:57Z) - Hierarchical Ranking Neural Network for Long Document Readability Assessment [2.160803573421694]
This paper proposes a bidirectional readability assessment mechanism that captures contextual information to identify regions with rich semantic information in the text. These sentence-level labels are then used to assist in predicting the overall readability level of the document. A pairwise sorting algorithm is introduced to model the ordinal relationship between readability levels through label subtraction.
arXiv Detail & Related papers (2025-11-26T15:05:22Z) - Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection [71.59834293521074]
We develop a framework to distinguish between human-authored and machine-generated text. Our method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on the DeepFake dataset. Code, pretrained weights, and demo will be released.
arXiv Detail & Related papers (2025-10-07T08:14:45Z) - Beyond Stars: Bridging the Gap Between Ratings and Review Sentiment with LLM [0.0]
We present an advanced approach to mobile app review analysis aimed at addressing limitations inherent in traditional star-rating systems. We propose a modular framework leveraging large language models (LLMs) enhanced by structured prompting techniques. Our method quantifies discrepancies between numerical ratings and textual sentiment, extracts detailed, feature-level insights, and supports interactive exploration of reviews.
arXiv Detail & Related papers (2025-09-25T09:39:12Z) - A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z) - Analysing Zero-Shot Readability-Controlled Sentence Simplification [54.09069745799918]
We investigate how different types of contextual information affect a model's ability to generate sentences with the desired readability. Results show that all tested models struggle to simplify sentences due to models' limitations and characteristics of the source sentences. Our experiments also highlight the need for better automatic evaluation metrics tailored to RCTS.
arXiv Detail & Related papers (2024-09-30T12:36:25Z) - Free-text Rationale Generation under Readability Level Control [6.338124510580766]
We investigate how large language models (LLMs) perform rationale generation under the effects of readability level control.
We find that explanations are adaptable to such instruction, though the requested readability is often misaligned with the measured text complexity.
Our human annotators confirm a generally satisfactory impression on rationales at all readability levels.
arXiv Detail & Related papers (2024-07-01T15:34:17Z) - Attention-aware semantic relevance predicting Chinese sentence reading [6.294658916880712]
This study proposes an "attention-aware" approach for computing contextual semantic relevance.
The attention-aware metrics of semantic relevance can more accurately predict fixation durations in Chinese reading tasks.
Our approach underscores the potential of these metrics to advance our comprehension of how humans understand and process language.
arXiv Detail & Related papers (2024-03-27T13:22:38Z) - Digital Comprehensibility Assessment of Simplified Texts among Persons with Intellectual Disabilities [2.446971913303003]
We conducted an evaluation study of text comprehensibility including participants with and without intellectual disabilities reading German texts on a tablet computer.
We explored four different approaches to measuring comprehensibility: multiple-choice comprehension questions, perceived difficulty ratings, response time, and reading speed.
For the target group of persons with intellectual disabilities, comprehension questions emerged as the most reliable measure, while analyzing reading speed provided valuable insights into participants' reading behavior.
arXiv Detail & Related papers (2024-02-20T15:37:08Z) - Previously on the Stories: Recap Snippet Identification for Story Reading [51.641565531840186]
We propose the first benchmark on this useful task called Recap Snippet Identification with a hand-crafted evaluation dataset.
Our experiments show that the proposed task is challenging for PLMs, LLMs, and the proposed methods, as it requires a deep understanding of the plot correlation between snippets.
arXiv Detail & Related papers (2024-02-11T18:27:14Z) - Improving Factual Consistency of News Summarization by Contrastive Preference Optimization [65.11227166319546]
Large language models (LLMs) can generate summaries that are factually inconsistent with the original articles. These hallucinations are challenging to detect through traditional methods. We propose Contrastive Preference Optimization (CPO) to disentangle the LLMs' propensities to generate faithful and fake content.
arXiv Detail & Related papers (2023-10-30T08:40:16Z) - Information Value: Measuring Utterance Predictability as Distance from Plausible Alternatives [4.446323294830542]
We present information value, a measure which quantifies the predictability of an utterance relative to a set of plausible alternatives.
We exploit their psychometric predictive power to investigate the dimensions of predictability that drive human comprehension behaviour.
arXiv Detail & Related papers (2023-10-20T17:25:36Z) - Generating Summaries with Controllable Readability Levels [67.34087272813821]
Several factors affect the readability level, such as the complexity of the text, its subject matter, and the reader's background knowledge.
Current text generation approaches lack refined control, resulting in texts that are not customized to readers' proficiency levels.
We develop three text generation techniques for controlling readability: instruction-based readability control, reinforcement learning to minimize the gap between requested and observed readability, and a decoding approach that uses look-ahead to estimate the readability of upcoming decoding steps.
arXiv Detail & Related papers (2023-10-16T17:46:26Z) - LC-Score: Reference-less estimation of Text Comprehension Difficulty [0.0]
We present LC-Score, a simple approach for training a text comprehension metric for any French text without reference.
Our objective is to quantitatively capture the extent to which a text conforms to the Langage Clair (LC, Clear Language) guidelines.
We explore two approaches: (i) using linguistically motivated indicators used to train statistical models, and (ii) neural learning directly from text leveraging pre-trained language models.
arXiv Detail & Related papers (2023-10-04T11:49:37Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z) - Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z) - A Framework for Learning Assessment through Multimodal Analysis of Reading Behaviour and Language Comprehension [0.0]
This dissertation shows how different skills could be measured and scored automatically.
We also demonstrate, using example experiments on multiple forms of learners' responses, how frequent reading practices could impact the variables of multimodal skills.
arXiv Detail & Related papers (2021-10-22T17:48:03Z) - Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers [0.05857406612420462]
Large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks.
We propose evaluating systems through a novel measure of prediction coherence.
arXiv Detail & Related papers (2021-09-10T15:04:23Z) - Readability Research: An Interdisciplinary Approach [62.03595526230364]
We aim to provide a firm foundation and a comprehensive framework for readability research.
Readability refers to aspects of visual information design which impact information flow from the page to the reader.
These aspects can be modified on-demand, instantly improving the ease with which a reader can process and derive meaning from text.
arXiv Detail & Related papers (2021-07-20T16:52:17Z) - On the Interpretability and Significance of Bias Metrics in Texts: a PMI-based Approach [3.2326259807823026]
We analyze an alternative PMI-based metric to quantify biases in texts.
It can be expressed as a function of conditional probabilities, which provides a simple interpretation in terms of word co-occurrences.
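The co-occurrence interpretation mentioned above can be made concrete: PMI(w, c) = log [p(w, c) / (p(w) p(c))] = log [p(w | c) / p(w)]. Below is a minimal sketch that computes PMI from raw co-occurrence observations; the tiny word/context corpus is purely illustrative and is not drawn from the paper:

```python
import math

def pmi(word, context, pairs):
    # pairs: list of observed (word, context) co-occurrences.
    n = len(pairs)
    p_joint = sum(1 for p in pairs if p == (word, context)) / n
    p_word = sum(1 for w, _ in pairs if w == word) / n
    p_ctx = sum(1 for _, c in pairs if c == context) / n
    # PMI = log(p(w, c) / (p(w) * p(c))), equivalently log(p(w | c) / p(w)).
    return math.log(p_joint / (p_word * p_ctx))

# Hypothetical toy corpus: positive PMI means the pair co-occurs more often
# than independence would predict.
pairs = [("nurse", "she"), ("nurse", "she"), ("nurse", "he"),
         ("engineer", "he"), ("engineer", "he"), ("engineer", "she")]
print(pmi("nurse", "she", pairs))  # log(4/3) ≈ 0.288, a positive association
```

Because every factor is a count ratio, each PMI value can be read directly off the co-occurrence table, which is the interpretability advantage the paper points to.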
arXiv Detail & Related papers (2021-04-13T19:34:17Z) - Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning [94.50608198582636]
Interactive Fiction (IF) games with real human-written natural language texts provide a new natural evaluation for language understanding techniques.
We take a novel perspective of IF game solving and re-formulate it as Multi-Passage Reading (MPRC) tasks.
arXiv Detail & Related papers (2020-10-05T23:09:20Z) - Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge [62.46091695615262]
We aim to extract commonsense knowledge to improve machine reading comprehension.
We propose to represent relations implicitly by situating structured knowledge in a context.
We employ a teacher-student paradigm to inject multiple types of contextualized knowledge into a student machine reader.
arXiv Detail & Related papers (2020-09-12T17:20:01Z) - Salience Estimation with Multi-Attention Learning for Abstractive Text Summarization [86.45110800123216]
In the task of text summarization, salience estimation for words, phrases or sentences is a critical component.
We propose a Multi-Attention Learning framework which contains two new attention learning components for salience estimation.
arXiv Detail & Related papers (2020-04-07T02:38:56Z) - ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z) - Text as Environment: A Deep Reinforcement Learning Text Readability Assessment Model [2.826553192869411]
The efficiency of state-of-the-art text readability assessment models can be further improved using deep reinforcement learning models.
A comparison on WeeBit and Cambridge Exams against state-of-the-art models, such as the BERT text readability model, shows that it achieves state-of-the-art accuracy with significantly less input text than other models.
arXiv Detail & Related papers (2019-12-12T13:54:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.