Related papers: Free-text Rationale Generation under Readability Level Control

Free-text Rationale Generation under Readability Level Control

URL: http://arxiv.org/abs/2407.01384v1
Date: Mon, 1 Jul 2024 15:34:17 GMT
Title: Free-text Rationale Generation under Readability Level Control
Authors: Yi-Sheng Hsu, Nils Feldhus, Sherzod Hakimov,
Abstract summary: We investigate how large language models (LLMs) perform the task of natural language explanation (NLE) under the effects of readability level control. We find that explanations are adaptable to such instruction, but the requested readability is often misaligned with the measured text complexity.
Score: 6.338124510580766
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Free-text rationales justify model decisions in natural language and thus become likable and accessible among approaches to explanation across many tasks. However, their effectiveness can be hindered by misinterpretation and hallucination. As a perturbation test, we investigate how large language models (LLMs) perform the task of natural language explanation (NLE) under the effects of readability level control, i.e., being prompted for a rationale targeting a specific expertise level, such as sixth grade or college. We find that explanations are adaptable to such instruction, but the requested readability is often misaligned with the measured text complexity according to traditional readability metrics. Furthermore, the quality assessment shows that LLMs' ratings of rationales across text complexity exhibit a similar pattern of preference as observed in natural language generation (NLG). Finally, our human evaluation suggests a generally satisfactory impression on rationales at all readability levels, with high-school-level readability being most commonly perceived and favored.

Related papers

AGENT-X: Adaptive Guideline-based Expert Network for Threshold-free AI-generated teXt detection [44.66668435489055]
AGENT-X is a zero-shot multi-agent framework for AI-generated text detection.<n>We organize detection guidelines into semantic, stylistic, and structural dimensions, each independently evaluated by specialized linguistic agents.<n>A meta agent integrates these assessments through confidence-aware aggregation, enabling threshold-free, interpretable classification.<n>Experiments on diverse datasets demonstrate that AGENT-X substantially surpasses state-of-the-art supervised and zero-shot approaches in accuracy, interpretability, and generalization.
arXiv Detail & Related papers (2025-05-21T08:39:18Z)
Beyond One-Size-Fits-All Summarization: Customizing Summaries for Diverse Users [1.3812010983144802]
Controlling readability of textual data is an important element for creating summaries for different audiences. We create our own custom dataset and train a model with our custom architecture. Our method ensures that readability levels are effectively controlled while maintaining accuracy and coherence.
arXiv Detail & Related papers (2025-03-10T19:08:36Z)
Eye Tracking Based Cognitive Evaluation of Automatic Readability Assessment Measures [1.2062053320259833]
We propose an eye tracking-based cognitive framework which taps into a key aspect of readability: reading ease.<n>We use this framework for evaluating a broad range of prominent readability measures, including two systems widely used in education.<n>Our analyses suggest that existing readability measures are poor predictors of reading facilitation and reading ease, outperformed by word properties commonly used in psycholinguistics.
arXiv Detail & Related papers (2025-02-16T14:51:44Z)
Analysing Zero-Shot Readability-Controlled Sentence Simplification [54.09069745799918]
We investigate how different types of contextual information affect a model's ability to generate sentences with the desired readability. Results show that all tested models struggle to simplify sentences due to models' limitations and characteristics of the source sentences. Our experiments also highlight the need for better automatic evaluation metrics tailored to RCTS.
arXiv Detail & Related papers (2024-09-30T12:36:25Z)
Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales. We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z)
Generating Summaries with Controllable Readability Levels [67.34087272813821]
Several factors affect the readability level, such as the complexity of the text, its subject matter, and the reader's background knowledge. Current text generation approaches lack refined control, resulting in texts that are not customized to readers' proficiency levels. We develop three text generation techniques for controlling readability: instruction-based readability control, reinforcement learning to minimize the gap between requested and observed readability, and a decoding approach that uses look-ahead to estimate the readability of upcoming decoding steps.
arXiv Detail & Related papers (2023-10-16T17:46:26Z)
More Than Words: Towards Better Quality Interpretations of Text Classifiers [16.66535643383862]
We show that token-based interpretability, while being a convenient first choice given the input interfaces of the ML models, is not the most effective one in all situations. We show that higher-level feature attributions offer several advantages: 1) they are more robust as measured by the randomization tests, 2) they lead to lower variability when using approximation-based methods like SHAP, and 3) they are more intelligible to humans in situations where the linguistic coherence resides at a higher level.
arXiv Detail & Related papers (2021-12-23T10:18:50Z)
Plot-guided Adversarial Example Construction for Evaluating Open-domain Story Generation [23.646133241521614]
Learnable evaluation metrics have promised more accurate assessments by having higher correlations with human judgments. Previous works relied on textitheuristically manipulated plausible examples to mimic possible system drawbacks. We propose to tackle these issues by generating a more comprehensive set of implausible stories using em plots, which are structured representations of controllable factors used to generate stories.
arXiv Detail & Related papers (2021-04-12T20:19:24Z)
Lexically-constrained Text Generation through Commonsense Knowledge Extraction and Injection [62.071938098215085]
We focus on the Commongen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts. We propose strategies for enhancing the semantic correctness of the generated text.
arXiv Detail & Related papers (2020-12-19T23:23:40Z)
Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale [52.663117551150954]
A few popular metrics remain as the de facto metrics to evaluate tasks such as image captioning and machine translation. This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them. In this paper, we urge the community for more careful consideration of how they automatically evaluate their models.
arXiv Detail & Related papers (2020-10-26T13:57:20Z)
Measuring Association Between Labels and Free-Text Rationales [60.58672852655487]
In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance. We demonstrate that pipelines, existing models for faithful extractive rationalization on information-extraction style tasks, do not extend as reliably to "reasoning" tasks requiring free-text rationales. We turn to models that jointly predict and rationalize, a class of widely used high-performance models for free-text rationalization whose faithfulness is not yet established.
arXiv Detail & Related papers (2020-10-24T03:40:56Z)
A Controllable Model of Grounded Response Generation [122.7121624884747]
Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process. We propose a framework that we call controllable grounded response generation (CGRG) We show that using this framework, a transformer based model with a novel inductive attention mechanism, trained on a conversation-like Reddit dataset, outperforms strong generation baselines.
arXiv Detail & Related papers (2020-05-01T21:22:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.