Enhancing Long Document Long Form Summarisation with Self-Planning
- URL: http://arxiv.org/abs/2512.17179v1
- Date: Fri, 19 Dec 2025 02:37:30 GMT
- Authors: Xiaotang Du, Rohit Saxena, Laura Perez-Beltrachini, Pasquale Minervini, Ivan Titov
- Abstract summary: We introduce a novel approach for long context summarisation, highlight-guided generation. Our framework applies self-planning methods to identify important content and then generates a summary conditioned on the plan.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce highlight-guided generation, a novel approach to long-context summarisation that leverages sentence-level information as a content plan to improve the traceability and faithfulness of generated summaries. Our framework applies self-planning to identify important content and then generates a summary conditioned on the plan. We explore both end-to-end and two-stage variants of the approach, finding that the two-stage pipeline performs better on long, information-dense documents. Experiments on long-form summarisation datasets demonstrate that our method consistently improves factual consistency while preserving relevance and overall quality. On GovReport, our best approach improves ROUGE-L by 4.1 points and achieves roughly 35% gains in SummaC scores. Qualitative analysis shows that highlight-guided summarisation helps preserve important details, leading to more accurate and insightful summaries across domains.
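The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration only: the sentence-scoring heuristic, function names, and prompt format are assumptions for the sketch, not the paper's actual self-planning method (which the abstract suggests is LLM-driven).

```python
# Hypothetical sketch of a two-stage highlight-guided pipeline:
# stage 1 selects salient sentences as a content plan; stage 2 would
# condition a summariser on that plan (here we only build its prompt).
import re
from collections import Counter

def split_sentences(text):
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def select_highlights(document, k=3):
    """Stage 1: score each sentence by average document-level word
    frequency (a toy salience proxy) and keep the top-k as the plan."""
    sentences = split_sentences(document)
    freqs = Counter(w.lower() for w in re.findall(r"\w+", document))

    def salience(sent):
        words = re.findall(r"\w+", sent.lower())
        return sum(freqs[w] for w in words) / (len(words) or 1)

    keep = set(sorted(sentences, key=salience, reverse=True)[:k])
    # Preserve document order so the plan reads coherently.
    return [s for s in sentences if s in keep]

def build_conditioned_prompt(document, highlights):
    """Stage 2: condition generation on the plan by prepending the
    highlights to the summarisation instruction."""
    plan = "\n".join(f"- {h}" for h in highlights)
    return (f"Summarise the document, covering these highlights:\n"
            f"{plan}\n\nDocument:\n{document}")
```

Decoupling the stages this way is what makes the summary traceable: each generated claim can be checked against an explicit highlight in the plan.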
Related papers
- Introducing Spotlight: A Novel Approach for Generating Captivating Key Information from Documents
We introduce Spotlight, a novel paradigm for information extraction that produces concise, engaging narratives by highlighting the most compelling aspects of a document. Our comprehensive evaluation demonstrates that the resulting model not only identifies key elements with precision but also enhances readability and boosts the engagement value of the original document.
arXiv Detail & Related papers (2025-09-13T18:18:37Z)
- Topic-Guided Reinforcement Learning with LLMs for Enhancing Multi-Document Summarization
We propose a topic-guided reinforcement learning approach to improve content selection in multi-document summarization. We first show that explicitly prompting models with topic labels enhances the informativeness of the generated summaries.
arXiv Detail & Related papers (2025-09-11T21:01:54Z)
- UMB@PerAnsSumm 2025: Enhancing Perspective-Aware Summarization with Prompt Optimization and Supervised Fine-Tuning
We present our approach to the PerAnsSumm Shared Task, which involves perspective span identification and perspective-aware summarization. For span identification, we adopt ensemble learning that integrates three transformer models through averaging to exploit individual model strengths. For summarization, we design a suite of Chain-of-Thought (CoT) prompting strategies that incorporate keyphrases and guiding information to structure summary generation into manageable steps.
arXiv Detail & Related papers (2025-03-14T06:29:51Z)
- Integrating Planning into Single-Turn Long-Form Text Generation
We propose using planning to generate long-form content. Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning. Our experiments on two datasets from different domains demonstrate that LLMs fine-tuned with the auxiliary task generate higher-quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z)
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs
We introduce a systematically created, human-annotated dataset of coherent summaries for five publicly available datasets, together with natural language user feedback. Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (~10% ROUGE-L) in producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
- Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization
We propose the task of Stepwise Summarization, which aims to generate a new appended summary each time a new document is proposed.
The appended summary should not only summarize the newly added content but also be coherent with the previous summary.
We show that our method, SSG, achieves state-of-the-art performance on both automatic metrics and human evaluations.
arXiv Detail & Related papers (2024-06-08T05:37:26Z)
- SummIt: Iterative Text Summarization via ChatGPT
We propose SummIt, an iterative text summarization framework based on large language models like ChatGPT.
Our framework enables the model to refine the generated summary iteratively through self-evaluation and feedback.
We also conduct a human evaluation to validate the effectiveness of the iterative refinements and identify a potential issue of over-correction.
arXiv Detail & Related papers (2023-05-24T07:40:06Z)
- SQuALITY: Building a Long-Document Summarization Dataset the Hard Way
We hire highly-qualified contractors to read stories and write original summaries from scratch.
To amortize reading time, we collect five summaries per document, with the first giving an overview and the subsequent four addressing specific questions.
Experiments with state-of-the-art summarization systems show that our dataset is challenging and that existing automatic evaluation metrics are weak indicators of quality.
arXiv Detail & Related papers (2022-05-23T17:02:07Z)
- Long Document Summarization with Top-down and Bottom-up Inference
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- PoBRL: Optimizing Multi-Document Summarization by Blending Reinforcement Learning Policies
We propose a reinforcement learning based framework PoBRL for solving multi-document summarization.
Our strategy decouples the multi-objective optimization into subproblems that can be solved individually by reinforcement learning.
Our empirical analysis shows state-of-the-art performance on several multi-document datasets.
arXiv Detail & Related papers (2021-05-18T02:55:42Z)
- Better Highlighting: Creating Sub-Sentence Summary Highlights
We present a new method to produce self-contained highlights that are understandable on their own to avoid confusion.
Our method combines determinantal point processes and deep contextualized representations to identify an optimal set of sub-sentence segments.
To demonstrate the flexibility and modeling power of our method, we conduct extensive experiments on summarization datasets.
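As a rough illustration of the determinantal-point-process component mentioned above, the sketch below greedily selects a diverse subset of candidate segments by maximising the log-determinant of a quality-weighted similarity kernel. Greedy MAP inference is a standard DPP approximation; the function names, kernel values, and pure-Python Cholesky routine are illustrative assumptions, not the paper's implementation.

```python
# Illustrative greedy MAP inference for a DPP over candidate segments.
import math

def cholesky_logdet(m):
    """Log-determinant of a small positive-definite matrix via
    Cholesky factorisation: log det M = 2 * sum(log L_ii)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    acc = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                L[i][i] = math.sqrt(m[i][i] - s)
                acc += math.log(L[i][i])
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return 2.0 * acc

def greedy_dpp(kernel, k):
    """Repeatedly add the item that most increases the log-determinant
    of the selected submatrix, trading off item quality (diagonal)
    against redundancy with already-selected items (off-diagonal)."""
    n = len(kernel)
    selected = []
    for _ in range(min(k, n)):
        best, best_gain = None, -math.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[kernel[a][b] for b in idx] for a in idx]
            gain = cholesky_logdet(sub)
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected
```

For instance, with the toy kernel `[[1, .95, .09], [.95, 1, .09], [.09, .09, .81]]`, where items 0 and 1 are nearly redundant, selecting two items returns `[0, 2]`: the determinant penalises the near-duplicate pair even though item 2 has lower quality.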
arXiv Detail & Related papers (2020-10-20T18:57:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.