Automating Easy Read Text Segmentation
- URL: http://arxiv.org/abs/2406.11464v1
- Date: Mon, 17 Jun 2024 12:25:25 GMT
- Title: Automating Easy Read Text Segmentation
- Authors: Jesús Calleja, Thierry Etchegoyhen, David Ponce
- Abstract summary: Easy Read text is one of the main forms of access to information for people with reading difficulties.
One of the key characteristics of this type of text is the requirement to split sentences into smaller grammatical segments.
We study novel methods for the task, leveraging masked and generative language models, along with constituent parsing.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Easy Read text is one of the main forms of access to information for people with reading difficulties. One of the key characteristics of this type of text is the requirement to split sentences into smaller grammatical segments, to facilitate reading. Automated segmentation methods could foster the creation of Easy Read content, but their viability has yet to be addressed. In this work, we study novel methods for the task, leveraging masked and generative language models, along with constituent parsing. We conduct comprehensive automatic and human evaluations in three languages, analysing the strengths and weaknesses of the proposed alternatives, under scarce resource limitations. Our results highlight the viability of automated ER segmentation and remaining deficiencies compared to expert-driven human segmentation.
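The abstract mentions constituent parsing as one segmentation strategy. The sketch below illustrates the general idea of splitting a sentence at top-level clausal constituents of a parse tree; the hand-written tree, the label set, and the splitting rule are illustrative assumptions, not the paper's actual algorithm.

```python
# Minimal sketch of constituent-based Easy Read segmentation.
# The parse tree is hand-written here; a real system would obtain it
# from a constituency parser. Splitting at top-level clauses (S, SBAR)
# is one plausible strategy, not necessarily the paper's.

def leaves(tree):
    """Collect the word leaves of a nested (label, children...) tuple tree."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

def segment(tree, break_labels=("S", "SBAR")):
    """Split a sentence into segments at top-level clausal constituents."""
    segments, current = [], []
    for child in tree[1:]:
        label = None if isinstance(child, str) else child[0]
        if label in break_labels and current:
            segments.append(" ".join(current))
            current = []
        current.extend(leaves(child))
    if current:
        segments.append(" ".join(current))
    return segments

# "The text is simple" / "because the sentences are short"
parse = ("S",
         ("NP", "The", "text"),
         ("VP", "is", "simple"),
         ("SBAR", "because",
          ("S", ("NP", "the", "sentences"), ("VP", "are", "short"))))
print(segment(parse))  # two grammatical segments, one per clause
```

Each resulting segment is a grammatical unit that can be placed on its own line, which is the Easy Read requirement the paper targets.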
Related papers
- Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries
Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation.
Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process.
AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization.
arXiv Detail & Related papers (2024-03-01T21:59:03Z)
- From Text Segmentation to Smart Chaptering: A Novel Benchmark for Structuring Video Transcriptions
We introduce YTSeg, a novel benchmark focusing on spoken content that is inherently more unstructured and both topically and structurally diverse.
We also introduce MiniSeg, an efficient hierarchical segmentation model that outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-27T15:59:37Z)
- Digital Comprehensibility Assessment of Simplified Texts among Persons with Intellectual Disabilities
We conducted an evaluation study of text comprehensibility including participants with and without intellectual disabilities reading German texts on a tablet computer.
We explored four different approaches to measuring comprehensibility: multiple-choice comprehension questions, perceived difficulty ratings, response time, and reading speed.
For the target group of persons with intellectual disabilities, comprehension questions emerged as the most reliable measure, while analyzing reading speed provided valuable insights into participants' reading behavior.
arXiv Detail & Related papers (2024-02-20T15:37:08Z)
- Generating Summaries with Controllable Readability Levels
Several factors affect the readability level, such as the complexity of the text, its subject matter, and the reader's background knowledge.
Current text generation approaches lack refined control, resulting in texts that are not customized to readers' proficiency levels.
We develop three text generation techniques for controlling readability: instruction-based readability control, reinforcement learning to minimize the gap between requested and observed readability, and a decoding approach that uses look-ahead to estimate the readability of upcoming decoding steps.
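The reinforcement learning technique above rewards generations whose readability matches the requested level. A minimal sketch of such a reward follows; the words-per-sentence grade proxy is a crude stand-in for whatever readability metric the paper actually uses.

```python
# Sketch of an RL-style reward that shrinks the gap between the requested
# and observed readability of a generated summary. estimated_grade is a
# crude proxy (words per sentence), not the paper's readability metric.

def estimated_grade(text):
    """Rough readability proxy: longer sentences -> higher grade level."""
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.replace(".", " ").split()
    return len(words) / len(sentences)

def readability_reward(summary, requested_grade):
    """Negative absolute gap: maximal when observed matches requested."""
    return -abs(estimated_grade(summary) - requested_grade)

r = readability_reward("Cats sleep. Dogs run fast.", requested_grade=3.0)
print(r)
```

An RL fine-tuning loop would then update the generator to maximize this reward, pushing output readability toward the requested level.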
arXiv Detail & Related papers (2023-10-16T17:46:26Z)
- SeqXGPT: Sentence-Level AI-Generated Text Detection
We introduce a sentence-level detection challenge by synthesizing documents polished with large language models (LLMs).
We then propose Sequence X (Check) GPT (SeqXGPT), a novel method that utilizes log probability lists from white-box LLMs as features for sentence-level AIGT detection.
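Turning per-token log probabilities into sentence-level feature vectors can be sketched as below; the log-prob values are made up for illustration, whereas a real detector would read them from a white-box LLM, and the summary statistics chosen here are an assumption rather than SeqXGPT's exact feature set.

```python
# Sketch of summarizing a sentence's per-token log probabilities into a
# small feature vector, in the spirit of SeqXGPT. The values are invented;
# a real detector scores tokens with a white-box LLM.
import statistics

def sentence_features(token_logprobs):
    """Summarize token log-probs: human text tends to differ from LLM text
    in the level and spread of these scores."""
    return {
        "mean": statistics.mean(token_logprobs),
        "stdev": statistics.pstdev(token_logprobs),
        "min": min(token_logprobs),
    }

# Hypothetical log-probs for one sentence under one scoring LLM.
feats = sentence_features([-1.2, -0.4, -3.1, -0.8])
print(feats)
```

A downstream classifier trained on such vectors (one per sentence, possibly from several scoring LLMs) then labels each sentence as human- or AI-generated.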
arXiv Detail & Related papers (2023-10-13T07:18:53Z)
- LC-Score: Reference-less estimation of Text Comprehension Difficulty
We present LC-Score, a simple approach for training a text comprehension metric for any French text without reference.
Our objective is to quantitatively capture the extent to which a text conforms to the Langage Clair (LC, Clear Language) guidelines.
We explore two approaches: (i) using linguistically motivated indicators used to train statistical models, and (ii) neural learning directly from text leveraging pre-trained language models.
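Approach (i) above pairs hand-crafted linguistic indicators with a statistical model. A minimal sketch follows; the two indicators and the weights are illustrative placeholders, not the features or coefficients used by LC-Score.

```python
# Sketch of approach (i): linguistically motivated indicators feeding a
# simple statistical scorer. Indicators and weights are placeholders.

def indicators(text):
    """Compute two toy surface indicators of comprehension difficulty."""
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.replace(".", " ").split()
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "avg_word_len": sum(len(w) for w in words) / len(words),
    }

def difficulty_score(feats, weights):
    """Linear model over indicators; in practice the weights are fit on
    texts labeled against the Langage Clair guidelines."""
    return sum(weights[k] * v for k, v in feats.items())

feats = indicators("Short words help. Long sentences hurt.")
score = difficulty_score(feats, {"avg_sentence_len": 0.5, "avg_word_len": 1.0})
print(feats, score)
```

Approach (ii) would instead fine-tune a pre-trained language model end-to-end on the same labels, skipping the hand-crafted features entirely.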
arXiv Detail & Related papers (2023-10-04T11:49:37Z)
- Text Simplification of Scientific Texts for Non-Expert Readers
Simplification of scientific abstracts helps non-experts to access the core information.
This is especially relevant for, e.g., cancer patients reading about novel treatment options.
arXiv Detail & Related papers (2023-07-07T13:05:11Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision
We propose TextFormer, a query-based end-to-end text spotter with a Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Controlling Pre-trained Language Models for Grade-Specific Text Simplification
We study how different control mechanisms impact the adequacy and simplicity of text simplification systems.
We introduce a simple method that predicts the edit operations required for simplifying a text for a specific grade level on an instance-per-instance basis.
arXiv Detail & Related papers (2023-05-24T10:29:45Z)
- Toward Unifying Text Segmentation and Long Document Summarization
We study the role that section segmentation plays in extractive summarization of written and spoken documents.
Our approach learns robust sentence representations by performing summarization and segmentation simultaneously.
Our findings suggest that the model not only achieves state-of-the-art performance on publicly available benchmarks but also demonstrates better cross-genre transferability.
arXiv Detail & Related papers (2022-10-28T22:07:10Z)
- Enabling Language Models to Fill in the Blanks
We present a simple approach for text infilling, the task of predicting missing spans of text at any position in a document.
We train (or fine-tune) off-the-shelf language models on sequences containing the concatenation of artificially-masked text and the text which was masked.
We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively on three different domains: short stories, scientific abstracts, and lyrics.
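The training format described above, where masked text is concatenated with the spans that were masked out, can be sketched as follows; the specific token names ([blank], [sep], [answer]) follow the paper's general scheme but are illustrative here.

```python
# Sketch of the infilling-by-language-modeling training format: the input
# shows the text with hidden spans replaced by [blank], followed by the
# masked-out spans in order, each terminated by [answer].

def make_infilling_example(tokens, masked_spans):
    """masked_spans: non-overlapping (start, end) index pairs to hide."""
    visible, answers = [], []
    prev = 0
    for start, end in masked_spans:
        visible.extend(tokens[prev:start])
        visible.append("[blank]")
        answers.extend(tokens[start:end] + ["[answer]"])
        prev = end
    visible.extend(tokens[prev:])
    return " ".join(visible + ["[sep]"] + answers)

example = make_infilling_example(
    ["She", "ate", "cereal", "for", "breakfast"], [(2, 3), (4, 5)])
print(example)
# She ate [blank] for [blank] [sep] cereal [answer] breakfast [answer]
```

An off-the-shelf left-to-right LM fine-tuned on such sequences learns to generate the answers conditioned on the blanked context, which is what lets it infill spans at any position.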
arXiv Detail & Related papers (2020-05-11T18:00:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.