Profiling German Text Simplification with Interpretable Model-Fingerprints
- URL: http://arxiv.org/abs/2601.13050v1
- Date: Mon, 19 Jan 2026 13:39:59 GMT
- Title: Profiling German Text Simplification with Interpretable Model-Fingerprints
- Authors: Lars Klöser, Mika Beele, Bodo Kraft
- Abstract summary: This paper introduces the Simplification Profiler, a diagnostic toolkit that generates a multidimensional, interpretable fingerprint of simplified texts. Our complete feature set achieves classification F1-scores of up to 71.9%, improving upon simple baselines by over 48 percentage points.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: While Large Language Models (LLMs) produce highly nuanced text simplifications, developers currently lack tools for a holistic, efficient, and reproducible diagnosis of their behavior. This paper introduces the Simplification Profiler, a diagnostic toolkit that generates a multidimensional, interpretable fingerprint of simplified texts. Aggregating multiple simplifications from one model yields that model's fingerprint. This novel evaluation paradigm is particularly vital for languages such as German, where data scarcity is magnified when creating flexible models for diverse target groups rather than a single, fixed simplification style. We propose that measuring a model's unique behavioral signature is more relevant in this context as an alternative to correlating metrics with human preferences. We operationalize this with a practical meta-evaluation of our fingerprints' descriptive power, which bypasses the need for large, human-rated datasets. This test measures whether a simple linear classifier can reliably identify various model configurations by their created simplifications, confirming that our metrics are sensitive to a model's specific characteristics. The Profiler can distinguish high-level behavioral variations between prompting strategies and fine-grained changes from prompt engineering, including few-shot examples. Our complete feature set achieves classification F1-scores of up to 71.9%, improving upon simple baselines by over 48 percentage points. The Simplification Profiler thus offers developers a granular, actionable analysis to build more effective and truly adaptive text simplification systems.
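The meta-evaluation described above (a linear classifier identifying which model configuration produced a simplification from its interpretable features) can be sketched in miniature. The feature names and the nearest-centroid classifier below are illustrative assumptions, not the Profiler's actual feature set or classifier:

```python
from statistics import mean

# Hypothetical interpretable features; the actual Profiler uses a richer set.
def fingerprint(text: str) -> list[float]:
    """Map a simplified text to a small interpretable feature vector."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    words = text.split()
    avg_sent_len = len(words) / max(len(sentences), 1)          # syntactic complexity proxy
    avg_word_len = mean(len(w) for w in words)                  # lexical complexity proxy
    type_token = len({w.lower() for w in words}) / len(words)   # lexical diversity proxy
    return [avg_sent_len, avg_word_len, type_token]

def centroid(vectors):
    return [mean(col) for col in zip(*vectors)]

class NearestCentroid:
    """A minimal linear classifier: predicts the label whose mean fingerprint is closest."""
    def fit(self, X, y):
        self.centroids = {label: centroid([x for x, l in zip(X, y) if l == label])
                          for label in set(y)}
        return self
    def predict(self, x):
        return min(self.centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, self.centroids[lab])))

# Toy data: two "model configurations" with distinct simplification styles.
model_a = ["The cat sat. It was warm.", "The dog ran. It was fast."]          # short sentences
model_b = ["The cat sat on the mat because the afternoon sun made it warm."]  # long sentences
X = [fingerprint(t) for t in model_a + model_b]
y = ["A", "A", "B"]
clf = NearestCentroid().fit(X, y)
print(clf.predict(fingerprint("The bird flew. It was high.")))  # → A
```

If the classifier separates configurations reliably on held-out texts, the features are sensitive to model-specific behavior, which is exactly the descriptive-power test the abstract describes.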
Related papers
- LLMStructBench: Benchmarking Large Language Model Structured Data Extraction
We present a novel benchmark for evaluating Large Language Models (LLMs). Our open dataset comprises diverse, manually verified parsing scenarios of varying complexity. We show that choosing the right prompting strategy is more important than standard attributes such as model size.
arXiv Detail & Related papers (2026-02-16T13:37:58Z)
- Simplify-This: A Comparative Analysis of Prompt-Based and Fine-Tuned LLMs
Large language models (LLMs) enable strong text generation, and in practice there is a tradeoff between fine-tuning and prompt engineering. We introduce Simplify-This, a comparative study evaluating both paradigms for text simplification with encoder-decoder LLMs. Fine-tuned models consistently deliver stronger structural simplification, whereas prompting often attains higher semantic-similarity scores yet tends to copy inputs.
arXiv Detail & Related papers (2026-01-09T13:46:52Z)
- When Personalization Tricks Detectors: The Feature-Inversion Trap in Machine-Generated Text Detection
We introduce a dataset, the first benchmark for evaluating detector robustness in personalized settings. Our experimental results demonstrate large performance gaps across detectors in personalized settings. We propose a method: a simple and reliable way to predict detector performance changes in personalized settings.
arXiv Detail & Related papers (2025-10-14T13:10:23Z)
- Measuring and Analyzing Intelligence via Contextual Uncertainty in Large Language Models using Information-Theoretic Metrics
We propose a task-agnostic method that builds a quantitative Cognitive Profile for any model. The profile is built around the Entropy Decay Curve: a plot of a model's normalised predictive uncertainty as context length grows. We also propose the Information Gain Span (IGS), a single index that summarises the desirability of a decay pattern.
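The summary above does not reproduce the paper's exact formulas, so as a hedged illustration of the phenomenon an entropy decay curve captures, the sketch below estimates empirical next-character entropy under growing context lengths using n-gram statistics on a toy corpus. The corpus and function names are illustrative assumptions, not the paper's method:

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(text: str, k: int) -> float:
    """Empirical H(next char | previous k chars), in bits."""
    ctx_counts = defaultdict(Counter)
    for i in range(k, len(text)):
        ctx_counts[text[i - k:i]][text[i]] += 1
    n = len(text) - k
    h = 0.0
    for nexts in ctx_counts.values():
        total = sum(nexts.values())
        for c in nexts.values():
            p = c / total
            h -= (total / n) * p * math.log2(p)  # weight each context by its frequency
    return h

# Predictive uncertainty typically shrinks as the conditioning context grows.
corpus = "the quick brown fox jumps over the lazy dog " * 50
curve = [conditional_entropy(corpus, k) for k in range(4)]
print([round(h, 2) for h in curve])
```

The shape of such a curve (how fast and how far uncertainty decays with context) is the kind of signal a single summary index could condense.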
arXiv Detail & Related papers (2025-07-21T20:14:25Z)
- Retrieval is Accurate Generation
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
Large language models (LLMs) are adopted as a fundamental component of language technologies.
We find that several widely used open-source LLMs are extremely sensitive to subtle changes in prompt format in few-shot settings.
We propose an algorithm that rapidly evaluates a sampled set of plausible prompt formats for a given task, and reports the interval of expected performance without accessing model weights.
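The one-line summary leaves the algorithm's details unspecified, so the sketch below only illustrates the general idea: enumerate plausible prompt-format variants, score each, and report the interval of expected performance. The format factors and the scoring stub are hypothetical stand-ins (a real setup would query the model):

```python
import itertools

# Hypothetical prompt-format factors; the paper's actual format space differs.
SEPARATORS = [": ", " - ", "\n"]
FIELD_NAMES = [("Input", "Output"), ("Question", "Answer"), ("Q", "A")]
CASINGS = [str, str.upper]

def render(fields, sep, casing, text):
    inp, out = fields
    return f"{casing(inp)}{sep}{text}\n{casing(out)}{sep}"

def performance_interval(evaluate, text):
    """Score every sampled format and report the (min, max) performance interval."""
    scores = [evaluate(render(f, s, c, text))
              for f, s, c in itertools.product(FIELD_NAMES, SEPARATORS, CASINGS)]
    return min(scores), max(scores)

# Stand-in scorer: a deterministic stub that (artificially) prefers shorter
# prompts, used here only to exercise the machinery.
stub_score = lambda prompt: 1.0 / len(prompt)
lo, hi = performance_interval(stub_score, "Simplify: the text")
print(f"expected performance in [{lo:.4f}, {hi:.4f}]")
```

Reporting an interval rather than a single number makes the format sensitivity itself visible, which is the point the summary makes.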
arXiv Detail & Related papers (2023-10-17T15:03:30Z)
- Transformer Memory as a Differentiable Search Index
We introduce the Differentiable Search Index (DSI), a new paradigm that learns a text-to-text model that maps string queries directly to relevant docids.
We study variations in how documents and their identifiers are represented, variations in training procedures, and the interplay between models and corpus sizes.
arXiv Detail & Related papers (2022-02-14T19:12:43Z)
- Document-Level Text Simplification: Dataset, Criteria and Baseline
We define and investigate a new task of document-level text simplification.
Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia.
We propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task.
arXiv Detail & Related papers (2021-10-11T08:15:31Z)
- SDA: Improving Text Generation with Self Data Augmentation
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.