Progressive Document-level Text Simplification via Large Language Models
- URL: http://arxiv.org/abs/2501.03857v1
- Date: Tue, 07 Jan 2025 15:14:37 GMT
- Title: Progressive Document-level Text Simplification via Large Language Models
- Authors: Dengzhao Fang, Jipeng Qiang, Yi Zhu, Yunhao Yuan, Wei Li, Yan Liu
- Abstract summary: Long document-level simplification (DS) is still relatively unexplored.
We propose a progressive simplification method (ProgDS) by hierarchically decomposing the task.
- Score: 19.57555397986868
- License:
- Abstract: Research on text simplification has primarily focused on lexical and sentence-level changes. Long document-level simplification (DS) is still relatively unexplored. Large Language Models (LLMs), like ChatGPT, have excelled in many natural language processing tasks. However, their performance on DS tasks is unsatisfactory, as they often treat DS as merely document summarization. For the DS task, the generated long sequences must not only maintain consistency with the original document throughout, but also perform moderate simplification operations spanning discourse-, sentence-, and word-level simplification. Human editors employ a hierarchical complexity simplification strategy to simplify documents. This study delves into simulating this strategy through a multi-stage collaboration using LLMs. We propose a progressive simplification method (ProgDS) by hierarchically decomposing the task into discourse-level, topic-level, and lexical-level simplification. Experimental results demonstrate that ProgDS significantly outperforms existing smaller models or direct prompting with LLMs, advancing the state of the art in the document simplification task.
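The multi-stage idea in the abstract (discourse-level, then topic-level, then lexical-level simplification, each stage consuming the previous stage's output) can be sketched as a simple prompt pipeline. This is a minimal illustration, not the paper's implementation: the stage instructions and the `call_llm` interface are assumptions, and a real system would plug in an actual LLM client.

```python
# Hedged sketch of a progressive, multi-stage document simplification
# pipeline in the spirit of ProgDS. The stage instructions below are
# illustrative placeholders, not the paper's actual prompts.

from typing import Callable, List, Tuple

# Hypothetical stage ordering: coarse (discourse) to fine (lexical).
STAGES: List[Tuple[str, str]] = [
    ("discourse", "Reorganize and simplify the discourse structure of this document."),
    ("topic", "Simplify each topic segment into shorter, clearer sentences."),
    ("lexical", "Replace complex words with simpler, more common alternatives."),
]


def progressive_simplify(document: str, call_llm: Callable[[str], str]) -> str:
    """Run the document through each simplification stage in order.

    `call_llm` is an assumed interface: it takes a full prompt string
    and returns the model's text output.
    """
    text = document
    for stage_name, instruction in STAGES:
        prompt = f"{instruction}\n\nDocument:\n{text}"
        # Each stage operates on the previous stage's output,
        # so simplifications accumulate hierarchically.
        text = call_llm(prompt)
    return text
```

In use, `call_llm` would wrap a chat-completion call to a model such as ChatGPT; the key design point the abstract suggests is the ordering, with global (discourse) restructuring done before local (lexical) substitution so later stages do not undo earlier ones.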
Related papers
- Redefining Simplicity: Benchmarking Large Language Models from Lexical to Document Simplification [21.727596753351072]
Text simplification (TS) refers to the process of reducing the complexity of a text while retaining its original meaning and key information.
Existing work only shows that large language models (LLMs) have outperformed supervised non-LLM-based methods on sentence simplification.
arXiv Detail & Related papers (2025-02-12T10:38:22Z)
- Enhancing LLM Character-Level Manipulation via Divide and Conquer [108.6908427615402]
Large Language Models (LLMs) have demonstrated strong generalization capabilities across a wide range of natural language processing (NLP) tasks.
They exhibit notable weaknesses in character-level string manipulation, struggling with fundamental operations such as character deletion, insertion, and substitution.
We propose Character-Level Manipulation via Divide and Conquer, a novel approach designed to bridge the gap between token-level processing and character-level manipulation.
arXiv Detail & Related papers (2025-02-12T07:37:39Z)
- Simple is not Enough: Document-level Text Simplification using Readability and Coherence [20.613410797137036]
We present the SimDoc system, a simplification model considering simplicity, readability, and discourse aspects, such as coherence.
We include multiple objectives during training, considering simplicity, readability, and coherence altogether.
We present a comparative analysis in which we evaluate our proposed models in a zero-shot, few-shot, and fine-tuning setting using document-level TS corpora.
arXiv Detail & Related papers (2024-12-24T19:05:21Z)
- Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT [0.0]
This paper investigates if Large Language Models (LLMs) can be applied for editing structured and semi-structured documents with minimal effort.
ChatGPT demonstrates a strong ability to recognize and process the structure of annotated documents.
arXiv Detail & Related papers (2024-09-12T03:41:39Z)
- ADaPT: As-Needed Decomposition and Planning with Language Models [131.063805299796]
We introduce As-Needed Decomposition and Planning for complex Tasks (ADaPT).
ADaPT explicitly plans and decomposes complex sub-tasks as needed, i.e., when the Large Language Model is unable to execute them.
Our results demonstrate that ADaPT substantially outperforms established strong baselines.
arXiv Detail & Related papers (2023-11-08T17:59:15Z)
- SeqXGPT: Sentence-Level AI-Generated Text Detection [62.3792779440284]
We introduce a sentence-level detection challenge by synthesizing documents polished with large language models (LLMs).
We then propose SeqXGPT (Sequence X (Check) GPT), a novel method that utilizes log probability lists from white-box LLMs as features for sentence-level AIGT detection.
arXiv Detail & Related papers (2023-10-13T07:18:53Z)
- PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents [78.27865456183397]
We propose PEARL, a prompting framework to improve reasoning over long documents.
Each stage of PEARL is implemented via zero-shot or few-shot prompting with minimal human input.
We evaluate PEARL on a challenging subset of the QuALITY dataset, which contains questions that require complex reasoning over long narrative texts.
arXiv Detail & Related papers (2023-05-23T23:06:04Z)
- Context-Aware Document Simplification [3.2880869992413237]
We explore systems that use document context within the simplification process itself.
We achieve state-of-the-art performance on the document simplification task, even when not relying on plan-guidance.
arXiv Detail & Related papers (2023-05-10T16:06:36Z)
- Sentence Simplification via Large Language Models [15.07021692249856]
Sentence Simplification aims to rephrase complex sentences into simpler sentences while retaining original meaning.
Large Language Models (LLMs) have demonstrated the ability to perform a variety of natural language processing tasks.
arXiv Detail & Related papers (2023-02-23T12:11:58Z)
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks [55.42850359286304]
We propose Decomposed Prompting to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks.
This modular structure allows each prompt to be optimized for its specific sub-task.
We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting.
arXiv Detail & Related papers (2022-10-05T17:28:20Z)
- Document-Level Text Simplification: Dataset, Criteria and Baseline [75.58761130635824]
We define and investigate a new task of document-level text simplification.
Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia.
We propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task.
arXiv Detail & Related papers (2021-10-11T08:15:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.