LLM-based Text Simplification and its Effect on User Comprehension and Cognitive Load
- URL: http://arxiv.org/abs/2505.01980v1
- Date: Sun, 04 May 2025 04:00:53 GMT
- Title: LLM-based Text Simplification and its Effect on User Comprehension and Cognitive Load
- Authors: Theo Guidroz, Diego Ardila, Jimmy Li, Adam Mansour, Paul Jhun, Nina Gonzalez, Xiang Ji, Mike Sanchez, Sujay Kakarmath, Mathias MJ Bellaiche, Miguel Ángel Garrido, Faruk Ahmed, Divyansh Choudhary, Jay Hartford, Chenwei Xu, Henry Javier Serrano Echeverria, Yifan Wang, Jeff Shaffer, Eric Cao, Yossi Matias, Avinatan Hassidim, Dale R Webster, Yun Liu, Sho Fujiwara, Peggy Bui, Quang Duong
- Abstract summary: We used a self-refinement approach to develop an LLM capability for minimally lossy text simplification. We conducted a randomized study involving 4563 participants and 31 texts spanning 6 broad subject areas.
- Score: 19.637267914743653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information on the web, such as scientific publications and Wikipedia, often surpasses users' reading level. To help address this, we used a self-refinement approach to develop an LLM capability for minimally lossy text simplification. To validate our approach, we conducted a randomized study involving 4563 participants and 31 texts spanning 6 broad subject areas: PubMed (biomedical scientific articles), biology, law, finance, literature/philosophy, and aerospace/computer science. Participants were randomized to viewing original or simplified texts in a subject area, and answered multiple-choice questions (MCQs) that tested their comprehension of the text. Participants were also asked to provide qualitative feedback, such as task difficulty. Our results indicate that participants who read the simplified text answered more MCQs correctly than their counterparts who read the original text (3.9% absolute increase, p<0.05). The gain was most striking for PubMed (14.6%), with more moderate gains for finance (5.5%), aerospace/computer science (3.8%), and law (3.5%). Notably, the results were robust to whether participants could refer back to the text while answering MCQs. When participants could not refer back to the text, absolute accuracy decreased by up to ~9% for both the original and simplified conditions, but the ~4% overall improvement persisted. Finally, participants' self-reported perceived ease, based on a simplified NASA Task Load Index, was greater for those who read the simplified text (absolute change on a 5-point scale 0.33, p<0.05). This randomized study, involving an order of magnitude more participants than prior works, demonstrates the potential of LLMs to make complex information easier to understand. Our work aims to enable a broader audience to better learn and make use of the expert knowledge available on the web, improving information accessibility.
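The abstract does not detail the self-refinement loop, but the general pattern (draft, critique, revise until no issues remain) can be sketched generically. In this minimal sketch, `simplify` and `critique` are toy stand-ins for LLM calls, and the small glossary is a hypothetical substitute for model knowledge:

```python
def simplify(text, feedback=None, glossary=None):
    """Stand-in for an LLM simplification call: replaces jargon using a
    small hypothetical glossary. `feedback` lists terms still to fix."""
    glossary = glossary or {"myocardial infarction": "heart attack",
                            "hypertension": "high blood pressure"}
    out = text
    for term, plain in glossary.items():
        if feedback is None or term in feedback:
            out = out.replace(term, plain)
    return out

def critique(original, draft, jargon=("myocardial infarction", "hypertension")):
    """Stand-in for an LLM critique call: flags remaining jargon. A real
    'minimally lossy' critique would also flag dropped or altered facts."""
    return [term for term in jargon if term in draft]

def self_refine(text, max_rounds=3):
    """Draft a simplification, then revise until the critique is clean."""
    draft = simplify(text)
    for _ in range(max_rounds):
        issues = critique(text, draft)
        if not issues:
            break
        draft = simplify(draft, feedback=issues)
    return draft
```

In the paper's setting, both roles would be played by the LLM itself, with the critique checking the draft against the original for information loss.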
Related papers
- Reading Between the Lines: A dataset and a study on why some texts are tougher than others [0.20482269513546458]
Our research aims at better understanding what makes a text difficult to read for specific audiences with intellectual disabilities. We introduce a scheme for the annotation of difficulties based on empirical research in psychology. We fine-tuned four different pre-trained transformer models to perform the task of multiclass classification.
arXiv Detail & Related papers (2025-01-03T13:09:46Z) - DRS: Deep Question Reformulation With Structured Output [114.14122339938697]
Large language models (LLMs) can detect unanswerable questions, but struggle to assist users in reformulating them. We propose DRS: Deep Question Reformulation with Structured Output, a novel zero-shot method aimed at enhancing LLMs' ability to assist users in reformulating questions. We show that DRS improves the reformulation accuracy of GPT-3.5 from 23.03% to 70.42%, while also enhancing the performance of open-source models, such as Gemma2-9B, from 26.35% to 56.75%.
arXiv Detail & Related papers (2024-11-27T02:20:44Z) - Evaluating LLMs for Targeted Concept Simplification for Domain-Specific Texts [53.421616210871704]
Lack of context and unfamiliarity with difficult concepts is a major reason for adult readers' difficulty with domain-specific text.
We introduce "targeted concept simplification," a simplification task for rewriting text to help readers comprehend text containing unfamiliar concepts.
We benchmark the performance of open-source and commercial LLMs and a simple dictionary baseline on this task.
arXiv Detail & Related papers (2024-10-28T05:56:51Z) - GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input. GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak).
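The observe/reflect/speak pipeline described in the summary can be sketched as three chained steps. This is a toy sketch, not GIVE itself: the stub functions below replace real knowledge-graph retrieval and LLM calls, and the facts are hypothetical triples:

```python
def observe(query, knowledge_graph):
    """Stage 1 (observe): select expert facts relevant to the query.
    Stand-in for GIVE's knowledge-graph retrieval."""
    return [fact for fact in knowledge_graph
            if fact[0] in query or fact[2] in query]

def reflect(facts):
    """Stage 2 (reflect): query-specific divergent thinking -- here, a toy
    extrapolation proposing links between retrieved entities."""
    entities = {e for f in facts for e in (f[0], f[2])}
    return [(a, "related_to", b) for a in entities for b in entities if a < b]

def speak(query, facts, hypotheses):
    """Stage 3 (speak): synthesize the observed facts and reflections
    into a final output."""
    lines = [f"{s} {r} {o}" for s, r, o in facts + hypotheses]
    return f"Q: {query}\n" + "\n".join(lines)

def give_pipeline(query, knowledge_graph):
    facts = observe(query, knowledge_graph)
    hypotheses = reflect(facts)
    return speak(query, facts, hypotheses)
```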
arXiv Detail & Related papers (2024-10-11T03:05:06Z) - Learning to Rewrite: Generalized LLM-Generated Text Detection [19.9477991969521]
Large language models (LLMs) present significant risks when used to generate non-factual content and spread disinformation at scale. We introduce Learning2Rewrite, a novel framework for detecting AI-generated text with exceptional generalization to unseen domains.
arXiv Detail & Related papers (2024-08-08T05:53:39Z) - Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews [51.453135368388686]
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM).
Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level.
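The corpus-level idea can be illustrated with a simple mixture model: treat each observed token as drawn from a mixture of a human reference distribution and an AI reference distribution, and pick the mixing fraction that maximizes the likelihood. This is a toy grid-search sketch with made-up token distributions, not the paper's actual model:

```python
import math

def estimate_llm_fraction(observed_counts, p_human, p_ai, grid=1001):
    """Grid-search MLE for the fraction alpha of LLM-produced text,
    modeling each token as drawn from the mixture
    (1 - alpha) * p_human + alpha * p_ai."""
    best_alpha, best_ll = 0.0, -math.inf
    for i in range(grid):
        alpha = i / (grid - 1)
        ll = 0.0
        for token, count in observed_counts.items():
            # Small floor avoids log(0) for tokens missing from a reference.
            p = ((1 - alpha) * p_human.get(token, 1e-9)
                 + alpha * p_ai.get(token, 1e-9))
            ll += count * math.log(p)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha
```

With reference distributions that differ on a marker token, the estimator recovers the mixing fraction from aggregate counts alone, without classifying any individual document.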
arXiv Detail & Related papers (2024-03-11T21:51:39Z) - Do LLMs Implicitly Determine the Suitable Text Difficulty for Users? [29.6000895693808]
We show that large language models can implicitly match the difficulty of their generated text to that of the user's input. Some LLMs can surpass humans in handling text difficulty, and instruction-tuning is important for this capability.
arXiv Detail & Related papers (2024-02-22T11:16:23Z) - Digital Comprehensibility Assessment of Simplified Texts among Persons with Intellectual Disabilities [2.446971913303003]
We conducted an evaluation study of text comprehensibility including participants with and without intellectual disabilities reading German texts on a tablet computer.
We explored four different approaches to measuring comprehensibility: multiple-choice comprehension questions, perceived difficulty ratings, response time, and reading speed.
For the target group of persons with intellectual disabilities, comprehension questions emerged as the most reliable measure, while analyzing reading speed provided valuable insights into participants' reading behavior.
arXiv Detail & Related papers (2024-02-20T15:37:08Z) - The Earth is Flat? Unveiling Factual Errors in Large Language Models [89.94270049334479]
Large Language Models (LLMs) like ChatGPT are used in various applications due to their extensive knowledge from pre-training and fine-tuning.
Despite this, they are prone to generating factual and commonsense errors, raising concerns in critical areas like healthcare, journalism, and education.
We introduce a novel, automatic testing framework, FactChecker, aimed at uncovering factual inaccuracies in LLMs.
arXiv Detail & Related papers (2024-01-01T14:02:27Z) - Text Simplification of Scientific Texts for Non-Expert Readers [3.4761212729163318]
Simplification of scientific abstracts helps non-experts to access the core information.
This is especially relevant for, e.g., cancer patients reading about novel treatment options.
arXiv Detail & Related papers (2023-07-07T13:05:11Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
However, they still struggle with accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
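The self-verification idea described in the last entry (have the LLM provide provenance for its own extractions and check its own outputs) can be sketched as an extract-then-verify loop. The functions below are toy stand-ins for LLM calls, with a hypothetical medication list and a naive negation check:

```python
def extract_medications(note):
    """Stand-in for a few-shot LLM extraction call: naive keyword spotting
    over a small hypothetical drug list, which may over-extract."""
    vocab = ["aspirin", "metformin", "ibuprofen"]
    return [drug for drug in vocab if drug in note.lower()]

def verify(note, item):
    """Stand-in for the self-verification call: keep an extraction only if
    a provenance span can be quoted from the note and (as a toy check)
    the preceding context is not negated."""
    idx = note.lower().find(item)
    if idx == -1:
        return None  # no provenance: drop the extraction
    window = note.lower()[max(0, idx - 30):idx]
    if "no " in window or "denies" in window:
        return None  # negated mention: drop it
    return note[idx:idx + len(item)]

def extract_with_verification(note):
    """Run extraction, then keep only items the verifier can ground."""
    results = {}
    for item in extract_medications(note):
        evidence = verify(note, item)
        if evidence is not None:
            results[item] = evidence
    return results
```

In the paper's framing, both calls would go to the same LLM, so the verification pass adds interpretability (quoted provenance) as well as a filter on spurious extractions.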
This list is automatically generated from the titles and abstracts of the papers in this site.