LC-Score: Reference-less estimation of Text Comprehension Difficulty
- URL: http://arxiv.org/abs/2310.02754v2
- Date: Thu, 5 Oct 2023 14:28:20 GMT
- Title: LC-Score: Reference-less estimation of Text Comprehension Difficulty
- Authors: Paul Tardy, Charlotte Roze, Paul Poupet
- Abstract summary: We present LC-Score, a simple approach for training a text comprehension metric for any French text without reference.
Our objective is to quantitatively capture the extent to which a text conforms to the Langage Clair (LC, Clear Language) guidelines.
We explore two approaches: (i) training statistical models on linguistically motivated indicators, and (ii) neural learning directly from text leveraging pre-trained language models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Being able to read and understand written text is critical in a digital era.
However, studies show that a large fraction of the population experiences
comprehension issues. In this context, further accessibility initiatives are
required to improve audience text comprehension. Yet writers are hardly
assisted or encouraged to produce easy-to-understand content. Moreover,
Automatic Text Simplification (ATS) model development suffers from the lack of
a metric that accurately estimates comprehension difficulty. We present
\textsc{LC-Score}, a simple approach for training a text comprehension metric for
any French text without reference, i.e. predicting how easy to understand a given
text is on a $[0, 100]$ scale. Our objective with this scale is to
quantitatively capture the extent to which a text conforms to the \textit{Langage
Clair} (LC, \textit{Clear Language}) guidelines, a French initiative closely
related to English Plain Language. We explore two approaches: (i) training
statistical models on linguistically motivated indicators, and (ii)
neural learning directly from text leveraging pre-trained language models. We
introduce a simple proxy that casts comprehension difficulty training as a
classification task. To evaluate our models, we run two distinct human
annotation experiments, and find that both approaches (indicator-based and
neural) outperform commonly used readability and comprehension metrics such as
FKGL.
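
The abstract leaves the training recipe unspecified; purely as a hedged illustration of approach (ii) and of the $[0, 100]$ scale, the sketch below fine-tunes a French pre-trained encoder on a proxy difficulty-classification task and turns the class posterior into a score via its expectation. The checkpoint name (camembert-base), the four difficulty bins, and the lc_score helper are hypothetical choices for illustration, not details confirmed by the paper.

```python
# Hypothetical sketch: comprehension difficulty as a proxy classification
# task, with class probabilities mapped onto a [0, 100] scale. The
# checkpoint, bin count, and mapping are illustrative assumptions, not
# the authors' actual setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_BINS = 4  # assumed difficulty bins, from hardest (0) to clearest (3)
tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "camembert-base", num_labels=NUM_BINS
)  # would be fine-tuned on difficulty-labeled French texts before use

def lc_score(text: str) -> float:
    """Map the predicted bin posterior to a [0, 100] comprehension score."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1).squeeze(0)
    # Expected bin index, rescaled so bin 0 -> 0 and bin NUM_BINS-1 -> 100.
    expected = (probs * torch.arange(NUM_BINS)).sum()
    return float(expected / (NUM_BINS - 1) * 100.0)

print(lc_score("Ce texte est court et facile à lire."))
```

Approach (i) would analogously feed linguistically motivated indicators (sentence length, rare vocabulary, passive constructions, and the like) into a standard statistical classifier. For comparison, the FKGL baseline cited above is a fixed surface-statistics formula, $0.39 \cdot \frac{\text{words}}{\text{sentences}} + 11.8 \cdot \frac{\text{syllables}}{\text{words}} - 15.59$, which cannot be tuned to LC guidelines.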
Related papers
- Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models [58.952782707682815]
COFT is a novel method that focuses on key texts at different granularity levels, thereby avoiding getting lost in lengthy contexts.
Experiments on the knowledge hallucination benchmark demonstrate the effectiveness of COFT, yielding an improvement of over 30% in the F1 score metric.
arXiv Detail & Related papers (2024-10-19T13:59:48Z)
- Difficulty Estimation and Simplification of French Text Using LLMs [1.0568851068989973]
We leverage large language models for language learning applications, focusing on estimating the difficulty of foreign language texts.
We develop a difficulty classification model using labeled examples, transfer learning, and large language models, demonstrating superior accuracy compared to previous approaches.
Our experiments are conducted on French texts, but our methods are language-agnostic and directly applicable to other foreign languages.
arXiv Detail & Related papers (2024-07-25T14:16:08Z)
- Automating Easy Read Text Segmentation [2.7309692684728617]
Easy Read text is one of the main forms of access to information for people with reading difficulties.
One of the key characteristics of this type of text is the requirement to split sentences into smaller grammatical segments.
We study novel methods for the task, leveraging masked and generative language models, along with constituent parsing.
arXiv Detail & Related papers (2024-06-17T12:25:25Z)
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- ChatPRCS: A Personalized Support System for English Reading Comprehension based on ChatGPT [3.847982502219679]
This paper presents a novel personalized support system for reading comprehension, referred to as ChatPRCS.
ChatPRCS employs methods including reading comprehension proficiency prediction, question generation, and automatic evaluation.
arXiv Detail & Related papers (2023-09-22T11:46:44Z)
- Learning Symbolic Rules over Abstract Meaning Representations for Textual Reinforcement Learning [63.148199057487226]
We propose a modular, NEuroSymbolic Textual Agent (NESTA) that combines a generic semantic generalization with a rule induction system to learn interpretable rules as policies.
Our experiments show that the proposed NESTA method outperforms deep reinforcement learning-based techniques by achieving better generalization to unseen test games and learning from fewer training interactions.
arXiv Detail & Related papers (2023-07-05T23:21:05Z)
- Prompt-based Learning for Text Readability Assessment [0.4757470449749875]
We propose a novel adaptation of a pre-trained seq2seq model for readability assessment.
We prove that a seq2seq model can be adapted to discern which of two given texts is more difficult (pairwise comparison).
arXiv Detail & Related papers (2023-02-25T18:39:59Z)
- Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting [69.77701325270047]
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves the F-score by +2.5% and +4.8% when transferring its weights to other text detection and spotting networks.
arXiv Detail & Related papers (2022-03-08T08:10:45Z)
- Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z)
- Narrative Incoherence Detection [76.43894977558811]
We propose the task of narrative incoherence detection as a new arena for inter-sentential semantic understanding.
Given a multi-sentence narrative, the task is to decide whether there exist any semantic discrepancies in the narrative flow.
arXiv Detail & Related papers (2020-12-21T07:18:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.