Artemis: A Novel Annotation Methodology for Indicative Single Document
Summarization
- URL: http://arxiv.org/abs/2005.02146v2
- Date: Thu, 14 May 2020 02:43:56 GMT
- Title: Artemis: A Novel Annotation Methodology for Indicative Single Document
Summarization
- Authors: Rahul Jha, Keping Bi, Yang Li, Mahdi Pakdaman, Asli Celikyilmaz, Ivan
Zhiboedov, Kieran McDonald
- Abstract summary: Artemis is a novel hierarchical annotation process that produces indicative summaries for documents from multiple domains.
It is more tractable because judges don't need to look at all the sentences in a document when making an importance judgment for one of the sentences.
We present analysis and experimental results over a sample set of 532 annotated documents.
- Score: 27.55699431297619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe Artemis (Annotation methodology for Rich, Tractable, Extractive,
Multi-domain, Indicative Summarization), a novel hierarchical annotation
process that produces indicative summaries for documents from multiple domains.
Current summarization evaluation datasets are single-domain and focused on a
few domains for which naturally occurring summaries can be easily found, such
as news and scientific articles. These are not sufficient for training and
evaluation of summarization models for use in document management and
information retrieval systems, which need to deal with documents from multiple
domains. Compared to other annotation methods such as Relative Utility and
Pyramid, Artemis is more tractable because judges don't need to look at all the
sentences in a document when making an importance judgment for one of the
sentences, while providing similarly rich sentence importance annotations. We
describe the annotation process in detail and compare it with other similar
evaluation systems. We also present analysis and experimental results over a
sample set of 532 annotated documents.
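The tractability claim above can be illustrated with a toy cost model. This is only a sketch, not the Artemis procedure itself: the fixed group size, the pairwise judging within groups, and the one-winner-per-group recursion are all illustrative assumptions, chosen to contrast a hierarchical scheme with an exhaustive all-pairs scheme such as Relative Utility.

```python
# Rough illustration (not the Artemis algorithm): why hierarchical annotation
# scales better than exhaustive pairwise importance judgments.
import math

def exhaustive_judgments(n_sentences: int) -> int:
    """Judging every sentence against every other sentence: O(n^2) comparisons."""
    return n_sentences * (n_sentences - 1) // 2

def hierarchical_judgments(n_sentences: int, group_size: int = 5) -> int:
    """Judging within small fixed-size groups, then recursing on one
    representative per group: roughly O(n) total comparisons."""
    total = 0
    remaining = n_sentences
    while remaining > 1:
        groups = math.ceil(remaining / group_size)
        # Within each group, compare every pair of its members
        # (slightly overcounts for a final partial group; fine for a sketch).
        per_group = group_size * (group_size - 1) // 2
        total += groups * per_group
        remaining = groups  # one representative advances per group
    return total

for n in (10, 50, 200):
    print(n, exhaustive_judgments(n), hierarchical_judgments(n))
```

For a 200-sentence document the hierarchical scheme needs roughly 500 local judgments versus nearly 20,000 exhaustive pairwise ones, which is the intuition behind judges never needing to see the whole document at once.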
Related papers
- Context-Aware Hierarchical Merging for Long Document Summarization [56.96619074316232]
We propose different approaches to enrich hierarchical merging with context from the source document.
Experimental results on datasets representing legal and narrative domains show that contextual augmentation consistently outperforms zero-shot and hierarchical merging baselines.
arXiv Detail & Related papers (2025-02-03T01:14:31Z)
- Subtopic-aware View Sampling and Temporal Aggregation for Long-form Document Matching [34.81690842091582]
Long-form document matching aims to judge the relevance between two documents.
We introduce a new framework to model representative matching signals.
Our learning framework is effective on several document-matching tasks, including news duplication and legal case retrieval.
arXiv Detail & Related papers (2024-12-10T15:06:48Z)
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- Mining both Commonality and Specificity from Multiple Documents for Multi-Document Summarization [1.4629756274247374]
The multi-document summarization task requires a summarizer to generate a short text covering the important information in the original documents.
This paper proposes a multi-document summarization approach based on hierarchical clustering of documents.
arXiv Detail & Related papers (2023-03-05T14:25:05Z)
- Open Domain Multi-document Summarization: A Comprehensive Study of Model Brittleness under Retrieval [42.73076855699184]
Multi-document summarization (MDS) assumes a set of topic-related documents are provided as input.
We study this more challenging setting by formalizing the task and bootstrapping it using existing datasets, retrievers and summarizers.
arXiv Detail & Related papers (2022-12-20T18:41:38Z)
- ACM -- Attribute Conditioning for Abstractive Multi Document Summarization [0.0]
We propose a model that incorporates attribute conditioning modules in order to decouple conflicting information by conditioning for a certain attribute in the output summary.
This approach shows strong gains in ROUGE score over baseline multi-document summarization approaches.
arXiv Detail & Related papers (2022-05-09T00:00:14Z)
- Document-Level Relation Extraction with Sentences Importance Estimation and Focusing [52.069206266557266]
Document-level relation extraction (DocRE) aims to determine the relation between two entities from a document of multiple sentences.
We propose a Sentence Estimation and Focusing (SIEF) framework for DocRE, where we design a sentence importance score and a sentence focusing loss.
Experimental results on two domains show that our SIEF not only improves overall performance, but also makes DocRE models more robust.
arXiv Detail & Related papers (2022-04-27T03:20:07Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Modeling Endorsement for Multi-Document Abstractive Summarization [10.166639983949887]
A crucial difference between single- and multi-document summarization is how salient content manifests itself in the document(s).
In this paper, we model the cross-document endorsement effect and its utilization in multiple document summarization.
Our method generates a synopsis from each document, which serves as an endorser to identify salient content from other documents.
arXiv Detail & Related papers (2021-10-15T03:55:42Z)
- RetrievalSum: A Retrieval Enhanced Framework for Abstractive Summarization [25.434558112121778]
We propose a novel retrieval enhanced abstractive summarization framework consisting of a dense Retriever and a Summarizer.
We validate our method on a wide range of summarization datasets across multiple domains and two backbone models: BERT and BART.
Results show that our framework obtains a significant improvement of 1.38 to 4.66 in ROUGE-1 score when compared with the powerful pre-trained models.
arXiv Detail & Related papers (2021-09-16T12:52:48Z)
- WikiAsp: A Dataset for Multi-domain Aspect-based Summarization [69.13865812754058]
We propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization.
Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation.
Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.
arXiv Detail & Related papers (2020-11-16T10:02:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.