ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs
- URL: http://arxiv.org/abs/2505.23654v1
- Date: Thu, 29 May 2025 17:04:02 GMT
- Title: ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs
- Authors: Mohamed Elaraby, Diane Litman
- Abstract summary: We focus on a specific form of structure: argument roles, which are crucial for summarizing documents in high-stakes domains such as law. We introduce Argument Representation Coverage (ARC), a framework for measuring how well LLM-generated summaries capture salient arguments. Our results show that while LLMs cover salient argument roles to some extent, critical information is often omitted in generated summaries.
- Score: 2.7828644351225087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Integrating structured information has long improved the quality of abstractive summarization, particularly in retaining salient content. In this work, we focus on a specific form of structure: argument roles, which are crucial for summarizing documents in high-stakes domains such as law. We investigate whether instruction-tuned large language models (LLMs) adequately preserve this information. To this end, we introduce Argument Representation Coverage (ARC), a framework for measuring how well LLM-generated summaries capture salient arguments. Using ARC, we analyze summaries produced by three open-weight LLMs in two domains where argument roles are central: long legal opinions and scientific articles. Our results show that while LLMs cover salient argument roles to some extent, critical information is often omitted in generated summaries, particularly when arguments are sparsely distributed throughout the input. Further, we use ARC to uncover behavioral patterns -- specifically, how the positional bias of LLM context windows and role-specific preferences impact the coverage of key arguments in generated summaries, emphasizing the need for more argument-aware summarization strategies.
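The abstract does not spell out how ARC computes coverage, so the following is only a minimal illustrative sketch of the general idea: given source argument units that have already been labeled with a role and marked as salient, report the fraction per role that is matched by at least one summary sentence. The role names, the off-the-shelf sentence encoder, and the 0.6 similarity threshold are assumptions made for this sketch, not details taken from the paper.

```python
# Illustrative sketch only (not the authors' implementation): per-role coverage
# of salient argument units by a generated summary.
from dataclasses import dataclass

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed available

@dataclass
class ArgumentUnit:
    text: str
    role: str        # e.g., "issue", "reason", "conclusion" (hypothetical role set)
    salient: bool    # whether the unit counts as salient in the source document

def coverage_by_role(units, summary_sentences, threshold=0.6):
    """Fraction of salient units per role matched by at least one summary
    sentence with cosine similarity above `threshold` (threshold is arbitrary)."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    salient = [u for u in units if u.salient]
    if not salient or not summary_sentences:
        return {}
    unit_vecs = model.encode([u.text for u in salient], normalize_embeddings=True)
    summ_vecs = model.encode(summary_sentences, normalize_embeddings=True)
    sims = unit_vecs @ summ_vecs.T              # cosine similarities (vectors are normalized)
    covered = sims.max(axis=1) >= threshold     # a unit is covered if any sentence matches it
    scores = {}
    for role in {u.role for u in salient}:
        idx = [i for i, u in enumerate(salient) if u.role == role]
        scores[role] = float(covered[idx].mean())
    return scores
```

A per-role breakdown like this is what makes it possible to ask which argument roles a summarizer tends to drop, rather than reporting a single aggregate overlap score.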
Related papers
- MArgE: Meshing Argumentative Evidence from Multiple Large Language Models for Justifiable Claim Verification [12.449402503089164]
We introduce MArgE, a novel framework to provide formal structure to the evidence from each large language model.
We show experimentally that MArgE can significantly outperform single LLMs.
arXiv Detail & Related papers (2025-08-04T16:40:02Z)
- Large Language Models in Argument Mining: A Survey [15.041650203089057]
Argument Mining (AM) focuses on extracting argumentative structures from text.
The advent of Large Language Models (LLMs) has profoundly transformed AM, enabling advanced in-context learning.
This survey systematically synthesizes recent advancements in LLM-driven AM.
arXiv Detail & Related papers (2025-06-19T15:12:58Z)
- How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks.
We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z)
- Unstructured Evidence Attribution for Long Context Query Focused Summarization [46.713307974729844]
Large language models (LLMs) are capable of generating coherent summaries from very long contexts given a user query.
We show how existing systems struggle to generate and properly cite unstructured evidence from their context.
arXiv Detail & Related papers (2025-02-20T09:57:42Z)
- Context-Aware Hierarchical Merging for Long Document Summarization [56.96619074316232]
We propose different approaches to enrich hierarchical merging with context from the source document.
Experimental results on datasets representing legal and narrative domains show that contextual augmentation consistently outperforms zero-shot and hierarchical merging baselines.
arXiv Detail & Related papers (2025-02-03T01:14:31Z)
- Long Context vs. RAG for LLMs: An Evaluation and Revisits [41.27137478456755]
This paper revisits recent studies on this topic, highlighting their key insights and discrepancies.
We show that long context (LC) generally outperforms RAG in question-answering benchmarks, especially for Wikipedia-based questions.
We also provide an in-depth discussion on this topic, highlighting the overlooked importance of context relevance in existing studies.
arXiv Detail & Related papers (2024-12-27T14:34:37Z)
- CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation [18.39379838806384]
We propose a novel critique-suggestion-guided automatic Prompt Optimization (CriSPO) approach.
CriSPO introduces a critique-suggestion module as its core component.
This module spontaneously discovers aspects and compares generated and reference texts across these aspects, providing actionable suggestions for prompt modification.
To further improve CriSPO with multi-metric optimization, we introduce an Automatic Suffix Tuning (AST) extension to enhance the performance of task prompts across multiple metrics.
arXiv Detail & Related papers (2024-10-03T17:57:01Z)
- Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift.
We devise a series of experiments to explain the performance gap empirically.
arXiv Detail & Related papers (2024-09-27T05:06:43Z)
- Analyzing the Role of Semantic Representations in the Era of Large Language Models [104.18157036880287]
We investigate the role of semantic representations in the era of large language models (LLMs).
We propose an AMR-driven chain-of-thought prompting method, which we call AMRCoT.
We find that it is difficult to predict which input examples AMR may help or hurt on, but errors tend to arise with multi-word expressions.
arXiv Detail & Related papers (2024-05-02T17:32:59Z)
- Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z)
- ULTRA: Unleash LLMs' Potential for Event Argument Extraction through Hierarchical Modeling and Pair-wise Self-Refinement [6.035020544588768]
Event argument extraction (EAE) is the task of identifying role-specific text spans (i.e., arguments) for a given event.
We propose a hierarchical framework that extracts event arguments more cost-effectively.
We introduce LEAFER to address the challenge LLMs face in locating the exact boundary of an argument.
arXiv Detail & Related papers (2024-01-24T04:13:28Z)