Systematically Exploring Redundancy Reduction in Summarizing Long
Documents
- URL: http://arxiv.org/abs/2012.00052v1
- Date: Mon, 30 Nov 2020 19:07:27 GMT
- Title: Systematically Exploring Redundancy Reduction in Summarizing Long
Documents
- Authors: Wen Xiao, Giuseppe Carenini
- Abstract summary: We explore and compare different ways to deal with redundancy when summarizing long documents.
In a series of experiments, we show that our proposed methods achieve the state-of-the-art with respect to ROUGE scores on two scientific paper datasets.
- Score: 6.812554384019158
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our analysis of large summarization datasets indicates that redundancy is a
very serious problem when summarizing long documents. Yet, redundancy reduction
has not been thoroughly investigated in neural summarization. In this work, we
systematically explore and compare different ways to deal with redundancy when
summarizing long documents. Specifically, we organize the existing methods into
categories based on when and how the redundancy is considered. Then, in the
context of these categories, we propose three additional methods balancing
non-redundancy and importance in a general and flexible way. In a series of
experiments, we show that our proposed methods achieve the state-of-the-art
with respect to ROUGE scores on two scientific paper datasets, PubMed and
arXiv, while reducing redundancy significantly.
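The abstract describes balancing non-redundancy against importance when selecting summary sentences. A classic illustration of this trade-off (not the paper's own method) is Maximal Marginal Relevance, where each candidate's importance score is penalized by its similarity to already-selected sentences. The sketch below is hypothetical; all names and the greedy selection loop are illustrative assumptions.

```python
# Hypothetical MMR-style sketch of trading importance against redundancy
# in extractive selection. This is NOT the paper's exact method, only an
# illustration of the balance it studies.

def mmr_select(sentences, importance, similarity, k, lam=0.7):
    """Greedily pick k sentences, balancing importance and non-redundancy.

    sentences:  list of sentence ids
    importance: dict sentence -> importance score
    similarity: dict (a, b) -> similarity in [0, 1]
    lam:        weight on importance vs. redundancy penalty
    """
    selected = []
    candidates = list(sentences)
    while candidates and len(selected) < k:
        def score(s):
            # Redundancy is the max similarity to any already-chosen sentence.
            redundancy = max(
                (similarity[(s, t)] for t in selected), default=0.0
            )
            return lam * importance[s] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a high `lam` the selector behaves like a pure importance ranker; lowering it increasingly suppresses sentences similar to those already picked.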
Related papers
- Unsupervised Multi-document Summarization with Holistic Inference [41.58777650517525]
This paper proposes a new holistic framework for unsupervised multi-document extractive summarization.
Subset Representative Index (SRI) balances the importance and diversity of a subset of sentences from the source documents.
Our findings suggest that diversity is essential for improving multi-document summary performance.
arXiv Detail & Related papers (2023-09-08T02:56:30Z)
- Fine-Grained Distillation for Long Document Retrieval [86.39802110609062]
Long document retrieval aims to fetch query-relevant documents from a large-scale collection.
Knowledge distillation has become the de facto approach to improving a retriever by mimicking a heterogeneous yet powerful cross-encoder.
We propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers.
arXiv Detail & Related papers (2022-12-20T17:00:36Z)
- ReSel: N-ary Relation Extraction from Scientific Text and Tables by Learning to Retrieve and Select [53.071352033539526]
We study the problem of extracting N-ary relations from scientific articles.
Our proposed method ReSel decomposes this task into a two-stage procedure.
Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-10-26T02:28:02Z)
- On the Trade-off between Redundancy and Local Coherence in Summarization [20.16107829497668]
We investigate the trade-offs incurred when aiming to control for inter-sentential cohesion and redundancy in extracted summaries.
We find that the proposed unsupervised systems manage to extract highly cohesive summaries across varying levels of document redundancy.
arXiv Detail & Related papers (2022-05-20T14:10:28Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation [49.940525611640346]
The Document Augmentation for dense Retrieval (DAR) framework augments document representations with their interpolations and perturbations.
We validate the performance of DAR on retrieval tasks with two benchmark datasets, showing that the proposed DAR significantly outperforms relevant baselines on the dense retrieval of both the labeled and unlabeled documents.
arXiv Detail & Related papers (2022-03-15T09:07:38Z)
- On Generating Extended Summaries of Long Documents [16.149617108647707]
We present a new method for generating extended summaries of long papers.
Our method exploits the hierarchical structure of the documents and incorporates it into an extractive summarization model.
Our analysis shows that our multi-tasking approach can adjust the extraction probability distribution in favor of summary-worthy sentences.
arXiv Detail & Related papers (2020-12-28T08:10:28Z)
- WSL-DS: Weakly Supervised Learning with Distant Supervision for Query Focused Multi-Document Abstractive Summarization [16.048329028104643]
In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of documents and a query are given, and the goal is to generate a summary from these documents.
One major challenge for this task is the lack of availability of labeled training datasets.
We propose a novel weakly supervised learning approach that utilizes distant supervision.
arXiv Detail & Related papers (2020-11-03T02:02:55Z)
- SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representations into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
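The SummPip summary describes a three-step pipeline: build a sentence graph, cluster it, and compress each cluster into the final summary. The dependency-free sketch below is purely illustrative: connected components stand in for the spectral clustering the paper uses, lexical overlap stands in for its graph construction, and "compression" just keeps the shortest sentence per cluster.

```python
# Hypothetical sketch of a SummPip-style pipeline (graph -> clusters ->
# compressed summary). Connected components replace spectral clustering,
# and Jaccard word overlap replaces the paper's edge weighting; both are
# simplifying assumptions for illustration only.

def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def summpip_sketch(sentences, edge_threshold=0.3):
    n = len(sentences)
    # 1. Sentence graph: connect sentences with high lexical overlap.
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(sentences[i], sentences[j]) >= edge_threshold:
                adj[i].append(j)
                adj[j].append(i)
    # 2. Clustering: connected components (stand-in for spectral clustering).
    seen, clusters = set(), []
    for i in range(n):
        if i in seen:
            continue
        stack, comp = [i], []
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(u)
            stack.extend(adj[u])
        clusters.append(comp)
    # 3. Compression: keep one representative (shortest) sentence per cluster.
    return [min((sentences[i] for i in comp), key=len) for comp in clusters]
```

Because near-duplicate sentences collapse into one cluster, each cluster contributes a single sentence, which is what makes the approach redundancy-aware.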
arXiv Detail & Related papers (2020-07-17T13:01:15Z)
- Self-supervised Deep Reconstruction of Mixed Strip-shredded Text Documents [63.41717168981103]
This work extends our previous deep learning method for single-page reconstruction to a more realistic/complex scenario.
In our approach, the compatibility evaluation is modeled as a two-class (valid or invalid) pattern recognition problem.
The proposed method outperforms competing ones on complex scenarios, achieving accuracy above 90%.
arXiv Detail & Related papers (2020-07-01T21:48:05Z)
- A Divide-and-Conquer Approach to the Summarization of Long Documents [4.863209463405628]
We present a novel divide-and-conquer method for the neural summarization of long documents.
Our method exploits the discourse structure of the document and uses sentence similarity to split the problem into smaller summarization problems.
We demonstrate that this approach paired with different summarization models, including sequence-to-sequence RNNs and Transformers, can lead to improved summarization performance.
arXiv Detail & Related papers (2020-04-13T20:38:49Z)
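The divide-and-conquer entry above describes splitting a long document into smaller summarization problems using sentence similarity. A minimal sketch of that idea, under illustrative assumptions, is to start a new chunk wherever adjacent sentences share little vocabulary, summarize each chunk with any base model, and concatenate. Here `summarize_chunk` is a hypothetical placeholder (the paper pairs the split with seq2seq RNNs and Transformers); it simply returns each chunk's first sentence.

```python
# Hypothetical divide-and-conquer sketch: split at low-similarity
# boundaries, summarize chunks independently, concatenate. The boundary
# heuristic and the placeholder summarizer are assumptions, not the
# paper's actual components.

def overlap(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa), len(sb), 1)

def split_at_boundaries(sentences, threshold=0.2):
    """Start a new chunk wherever adjacent sentences are dissimilar."""
    chunks, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        if overlap(prev, cur) < threshold:
            chunks.append(current)
            current = []
        current.append(cur)
    chunks.append(current)
    return chunks

def summarize_chunk(chunk):
    return chunk[0]  # placeholder for a real abstractive summarizer

def divide_and_conquer(sentences):
    return " ".join(summarize_chunk(c) for c in split_at_boundaries(sentences))
```

Each chunk then fits within the length budget of an ordinary neural summarizer, which is the point of the decomposition.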
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.