Related papers: BookAsSumQA: An Evaluation Framework for Aspect-Based Book Summarization via Question Answering

BookAsSumQA: An Evaluation Framework for Aspect-Based Book Summarization via Question Answering

URL: http://arxiv.org/abs/2511.06183v1
Date: Sun, 09 Nov 2025 01:54:53 GMT
Title: BookAsSumQA: An Evaluation Framework for Aspect-Based Book Summarization via Question Answering
Authors: Ryuhei Miyazato, Ting-Ruen Wei, Xuyang Wu, Hsin-Tai Wu, Kei Harada,
Abstract summary: BookAsSumQA is a QA-based evaluation framework for aspect-based book summarization.<n>Our experiments showed that while LLM-based approaches showed higher accuracy on shorter texts, RAG-based methods become more effective as document length increases.
Score: 2.703301365475554
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Aspect-based summarization aims to generate summaries that highlight specific aspects of a text, enabling more personalized and targeted summaries. However, its application to books remains unexplored due to the difficulty of constructing reference summaries for long text. To address this challenge, we propose BookAsSumQA, a QA-based evaluation framework for aspect-based book summarization. BookAsSumQA automatically generates aspect-specific QA pairs from a narrative knowledge graph to evaluate summary quality based on its question-answering performance. Our experiments using BookAsSumQA revealed that while LLM-based approaches showed higher accuracy on shorter texts, RAG-based methods become more effective as document length increases, making them more efficient and practical for aspect-based book summarization.

Related papers

ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering [54.72902502486611]
ReAG is a Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages.<n>ReAG significantly outperforms prior methods, improving answer accuracy and providing interpretable reasoning grounded in retrieved evidence.
arXiv Detail & Related papers (2025-11-27T19:01:02Z)
CARPAS: Towards Content-Aware Refinement of Provided Aspects for Summarization in Large Language Models [16.41705871316774]
This paper introduces Content-Aware Refinement of Provided Aspects for Summarization (CARPAS)<n>We propose a preliminary subtask to predict the number of relevant aspects, and demonstrate that the predicted number can serve as effective guidance.<n>Our experiments show that the proposed approach significantly improves performance across all datasets.
arXiv Detail & Related papers (2025-10-08T16:16:46Z)
Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization [86.98098988779809]
We propose SummQ, a novel adversarial multi-agent framework for long document summarization.<n>Our approach employs summary generators and reviewers that work collaboratively to create and evaluate comprehensive summaries.<n>We evaluate SummQ on three widely used long document summarization benchmarks.
arXiv Detail & Related papers (2025-09-25T08:36:19Z)
ConSens: Assessing context grounding in open-book question answering [0.0]
Large Language Models (LLMs) have demonstrated considerable success in open-book question answering (QA)<n>A critical challenge in open-book QA is to ensure that model responses are based on the provided context rather than its parametric knowledge.<n>We propose a novel metric that contrasts the perplexity of the model response under two conditions.<n>The resulting score quantifies the extent to which the model's answer relies on the provided context.
arXiv Detail & Related papers (2025-04-30T16:23:15Z)
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing [43.75154489681047]
We propose a novel framework leveraging test-time scaling for Multi-Document Summarization (MDS)<n>Our approach employs prompt ensemble techniques to generate multiple candidate summaries using various prompts, then combines them with an aggregator to produce a refined summary.<n>To evaluate our method effectively, we also introduce two new LLM-based metrics: the Consistency-Aware Preference (CAP) score and LLM Atom-Content-Unit (LLM-ACU) score.
arXiv Detail & Related papers (2025-02-27T23:34:47Z)
Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents [61.41316121093604]
We present InsCoQA, a novel benchmark for evaluating large language models (LLMs) in the context of conversational question answering (CQA) Sourced from extensive, encyclopedia-style instructional content, InsCoQA assesses models on their ability to retrieve, interpret, and accurately summarize procedural guidance from multiple documents. We also propose InsEval, an LLM-assisted evaluator that measures the integrity and accuracy of generated responses and procedural instructions.
arXiv Detail & Related papers (2024-10-01T09:10:00Z)
Beyond Relevant Documents: A Knowledge-Intensive Approach for Query-Focused Summarization using Large Language Models [27.90653125902507]
We propose a knowledge-intensive approach that reframes query-focused summarization as a knowledge-intensive task setup. The retrieval module efficiently retrieves potentially relevant documents from a large-scale knowledge corpus. The summarization controller seamlessly integrates a powerful large language model (LLM)-based summarizer with a carefully tailored prompt.
arXiv Detail & Related papers (2024-08-19T18:54:20Z)
FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction [85.26780391682894]
We propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE) FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary. Our metric sets a new state of the art on AGGREFACT, the de-facto benchmark for factuality evaluation.
arXiv Detail & Related papers (2024-03-04T17:57:18Z)
QontSum: On Contrasting Salient Content for Query-focused Summarization [22.738731393540633]
Query-focused summarization (QFS) is a challenging task in natural language processing that generates summaries to address specific queries. This paper highlights the role of QFS in Grounded Answer Generation (GAR) We propose QontSum, a novel approach for QFS that leverages contrastive learning to help the model attend to the most relevant regions of the input document.
arXiv Detail & Related papers (2023-07-14T19:25:35Z)
SummIt: Iterative Text Summarization via ChatGPT [12.966825834765814]
We propose SummIt, an iterative text summarization framework based on large language models like ChatGPT. Our framework enables the model to refine the generated summary iteratively through self-evaluation and feedback. We also conduct a human evaluation to validate the effectiveness of the iterative refinements and identify a potential issue of over-correction.
arXiv Detail & Related papers (2023-05-24T07:40:06Z)
Abstractive Query Focused Summarization with Query-Free Resources [60.468323530248945]
In this work, we consider the problem of leveraging only generic summarization resources to build an abstractive QFS system. We propose Marge, a Masked ROUGE Regression framework composed of a novel unified representation for summaries and queries. Despite learning from minimal supervision, our system achieves state-of-the-art results in the distantly supervised setting.
arXiv Detail & Related papers (2020-12-29T14:39:35Z)
Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary [65.37544133256499]
We propose a metric to evaluate the content quality of a summary using question-answering (QA) We demonstrate the experimental benefits of QA-based metrics through an analysis of our proposed metric, QAEval.
arXiv Detail & Related papers (2020-10-01T15:33:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.