Related papers: THM@SimpleText 2025 -- Task 1.1: Revisiting Text Simplification based on Complex Terms for Non-Experts

THM@SimpleText 2025 -- Task 1.1: Revisiting Text Simplification based on Complex Terms for Non-Experts

URL: http://arxiv.org/abs/2507.04414v1
Date: Sun, 06 Jul 2025 15:05:54 GMT
Title: THM@SimpleText 2025 -- Task 1.1: Revisiting Text Simplification based on Complex Terms for Non-Experts
Authors: Nico Hofmann, Julian Dauenhauer, Nils Ole Dietzler, Idehen Daniel Idahor, Christin Katharina Kreutz,
Abstract summary: SimpleText CLEF Lab focuses on the problem of simplification of scientific text.<n>To tackle this task we investigate the identification of complex terms in sentences which are rephrased using small Gemini and OpenAI large language models for non-expert readers.
Score: 0.23301643766310368
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scientific text is complex as it contains technical terms by definition. Simplifying such text for non-domain experts enhances accessibility of innovation and information. Politicians could be enabled to understand new findings on topics on which they intend to pass a law, or family members of seriously ill patients could read about clinical trials. The SimpleText CLEF Lab focuses on exactly this problem of simplification of scientific text. Task 1.1 of the 2025 edition specifically handles the simplification of complex sentences, so very short texts with little context. To tackle this task we investigate the identification of complex terms in sentences which are rephrased using small Gemini and OpenAI large language models for non-expert readers.

Related papers

Who Gets Cited Most? Benchmarking Long-Context Language Models on Scientific Articles [81.89404347890662]
SciTrek is a novel question-answering benchmark designed to evaluate the long-context reasoning capabilities of large language models (LLMs) using scientific articles.<n>Our analysis reveals systematic shortcomings in models' abilities to perform basic numerical operations and accurately locate specific information in long contexts.
arXiv Detail & Related papers (2025-09-25T11:36:09Z)
Evaluating LLMs for Targeted Concept Simplification for Domain-Specific Texts [53.421616210871704]
Lack of context and unfamiliarity with difficult concepts is a major reason for adult readers' difficulty with domain-specific text. We introduce "targeted concept simplification," a simplification task for rewriting text to help readers comprehend text containing unfamiliar concepts. We benchmark the performance of open-source and commercial LLMs and a simple dictionary baseline on this task.
arXiv Detail & Related papers (2024-10-28T05:56:51Z)
Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses. Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs. To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z)
ARTIST: ARTificial Intelligence for Simplified Text [5.095775294664102]
Text Simplification is a key Natural Language Processing task that aims for reducing the linguistic complexity of a text. Recent advances in Generative Artificial Intelligence (AI) have enabled automatic text simplification both on the lexical and syntactical levels.
arXiv Detail & Related papers (2023-08-25T16:06:06Z)
Text Simplification of Scientific Texts for Non-Expert Readers [3.4761212729163318]
Simplification of scientific abstracts helps non-experts to access the core information. This is especially relevant for, e.g., cancer patients reading about novel treatment options.
arXiv Detail & Related papers (2023-07-07T13:05:11Z)
Multilingual Simplification of Medical Texts [49.469685530201716]
We introduce MultiCochrane, the first sentence-aligned multilingual text simplification dataset for the medical domain in four languages. We evaluate fine-tuned and zero-shot models across these languages, with extensive human assessments and analyses. Although models can now generate viable simplified texts, we identify outstanding challenges that this dataset might be used to address.
arXiv Detail & Related papers (2023-05-21T18:25:07Z)
Elaborative Simplification as Implicit Questions Under Discussion [51.17933943734872]
This paper proposes to view elaborative simplification through the lens of the Question Under Discussion (QUD) framework. We show that explicitly modeling QUD provides essential understanding of elaborative simplification and how the elaborations connect with the rest of the discourse.
arXiv Detail & Related papers (2023-05-17T17:26:16Z)
Syntactic Complexity Identification, Measurement, and Reduction Through Controlled Syntactic Simplification [0.0]
We present a classical syntactic dependency-based approach to split and rephrase a compound and complex sentence into a set of simplified sentences. The paper also introduces an algorithm to identify and measure a sentence's syntactic complexity. This work is accepted and presented in International workshop on Learning with Knowledge Graphs (IWLKG) at WSDM-2023 Conference.
arXiv Detail & Related papers (2023-04-16T13:13:58Z)
Revisiting the Roles of "Text" in Text Games [102.22750109468652]
This paper investigates the roles of text in the face of different reinforcement learning challenges. We propose a simple scheme to extract relevant contextual information into an approximate state hash. Such a lightweight plug-in achieves competitive performance with state-of-the-art text agents.
arXiv Detail & Related papers (2022-10-15T21:52:39Z)
Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text based on word simplification and language modelling. We use a new dataset pairs of publicly available medical sentences and a version of them simplified by clinicians. Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
Explainable Prediction of Text Complexity: The Missing Preliminaries for Text Simplification [13.447565774887215]
Text simplification reduces the language complexity of professional content for accessibility purposes. End-to-end neural network models have been widely adopted to directly generate the simplified version of input text. We show that text simplification can be decomposed into a compact pipeline of tasks to ensure the transparency and explainability of the process.
arXiv Detail & Related papers (2020-07-31T03:33:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.