Artemis: A Novel Annotation Methodology for Indicative Single Document
Summarization
- URL: http://arxiv.org/abs/2005.02146v2
- Date: Thu, 14 May 2020 02:43:56 GMT
- Title: Artemis: A Novel Annotation Methodology for Indicative Single Document
Summarization
- Authors: Rahul Jha, Keping Bi, Yang Li, Mahdi Pakdaman, Asli Celikyilmaz, Ivan
Zhiboedov, Kieran McDonald
- Abstract summary: Artemis is a novel hierarchical annotation process that produces indicative summaries for documents from multiple domains.
It is more tractable because judges don't need to look at all the sentences in a document when making an importance judgment for one of the sentences.
We present analysis and experimental results over a sample set of 532 annotated documents.
- Score: 27.55699431297619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe Artemis (Annotation methodology for Rich, Tractable, Extractive,
Multi-domain, Indicative Summarization), a novel hierarchical annotation
process that produces indicative summaries for documents from multiple domains.
Current summarization evaluation datasets are single-domain and focused on a
few domains for which naturally occurring summaries can be easily found, such
as news and scientific articles. These are not sufficient for training and
evaluation of summarization models for use in document management and
information retrieval systems, which need to deal with documents from multiple
domains. Compared to other annotation methods such as Relative Utility and
Pyramid, Artemis is more tractable because judges don't need to look at all the
sentences in a document when making an importance judgment for one of the
sentences, while providing similarly rich sentence importance annotations. We
describe the annotation process in detail and compare it with other similar
evaluation systems. We also present analysis and experimental results over a
sample set of 532 annotated documents.
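The tractability claim above can be illustrated with a toy cost model. This is only a sketch, not the Artemis procedure itself: the fixed group size, the pairwise judging within groups, and the one-winner-per-group recursion are all illustrative assumptions, chosen to contrast a hierarchical scheme with an exhaustive all-pairs scheme such as Relative Utility.

```python
# Rough illustration (not the Artemis algorithm): why hierarchical annotation
# scales better than exhaustive pairwise importance judgments.
import math

def exhaustive_judgments(n_sentences: int) -> int:
    """Judging every sentence against every other sentence: O(n^2) comparisons."""
    return n_sentences * (n_sentences - 1) // 2

def hierarchical_judgments(n_sentences: int, group_size: int = 5) -> int:
    """Judging within small fixed-size groups, then recursing on one
    representative per group: roughly O(n) total comparisons."""
    total = 0
    remaining = n_sentences
    while remaining > 1:
        groups = math.ceil(remaining / group_size)
        # Within each group, compare every pair of its members
        # (slightly overcounts for a final partial group; fine for a sketch).
        per_group = group_size * (group_size - 1) // 2
        total += groups * per_group
        remaining = groups  # one representative advances per group
    return total

for n in (10, 50, 200):
    print(n, exhaustive_judgments(n), hierarchical_judgments(n))
```

For a 200-sentence document the hierarchical scheme needs roughly 500 local judgments versus nearly 20,000 exhaustive pairwise ones, which is the intuition behind judges never needing to see the whole document at once.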
Related papers
- Context-Aware Hierarchical Merging for Long Document Summarization [56.96619074316232]
We propose different approaches to enrich hierarchical merging with context from the source document.
Experimental results on datasets representing legal and narrative domains show that contextual augmentation consistently outperforms zero-shot and hierarchical merging baselines.
arXiv Detail & Related papers (2025-02-03T01:14:31Z)
- Subtopic-aware View Sampling and Temporal Aggregation for Long-form Document Matching [34.81690842091582]
Long-form document matching aims to judge the relevance between two documents.
We introduce a new framework to model representative matching signals.
Our learning framework is effective on several document-matching tasks, including news duplication and legal case retrieval.
arXiv Detail & Related papers (2024-12-10T15:06:48Z)
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- Mining both Commonality and Specificity from Multiple Documents for Multi-Document Summarization [1.4629756274247374]
The multi-document summarization task requires a summarizer to generate a short text covering the important information in the original documents.
This paper proposes a multi-document summarization approach based on hierarchical clustering of documents.
arXiv Detail & Related papers (2023-03-05T14:25:05Z)
- Open Domain Multi-document Summarization: A Comprehensive Study of Model Brittleness under Retrieval [42.73076855699184]
Multi-document summarization (MDS) assumes a set of topic-related documents are provided as input.
We study this more challenging setting by formalizing the task and bootstrapping it using existing datasets, retrievers and summarizers.
arXiv Detail & Related papers (2022-12-20T18:41:38Z)
- ACM -- Attribute Conditioning for Abstractive Multi Document Summarization [0.0]
We propose a model that incorporates attribute conditioning modules in order to decouple conflicting information by conditioning for a certain attribute in the output summary.
This approach shows strong gains in ROUGE score over baseline multi-document summarization approaches.
arXiv Detail & Related papers (2022-05-09T00:00:14Z)
- Document-Level Relation Extraction with Sentences Importance Estimation and Focusing [52.069206266557266]
Document-level relation extraction (DocRE) aims to determine the relation between two entities from a document of multiple sentences.
We propose a Sentence Estimation and Focusing (SIEF) framework for DocRE, where we design a sentence importance score and a sentence focusing loss.
Experimental results on two domains show that our SIEF not only improves overall performance, but also makes DocRE models more robust.
arXiv Detail & Related papers (2022-04-27T03:20:07Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Modeling Endorsement for Multi-Document Abstractive Summarization [10.166639983949887]
A crucial difference between single- and multi-document summarization is how salient content manifests itself in the document(s).
In this paper, we model the cross-document endorsement effect and its utilization in multiple document summarization.
Our method generates a synopsis from each document, which serves as an endorser to identify salient content from other documents.
arXiv Detail & Related papers (2021-10-15T03:55:42Z)
- RetrievalSum: A Retrieval Enhanced Framework for Abstractive Summarization [25.434558112121778]
We propose a novel retrieval enhanced abstractive summarization framework consisting of a dense Retriever and a Summarizer.
We validate our method on a wide range of summarization datasets across multiple domains and two backbone models: BERT and BART.
Results show that our framework obtains a significant improvement of 1.38 to 4.66 in ROUGE-1 score when compared with the powerful pre-trained models.
arXiv Detail & Related papers (2021-09-16T12:52:48Z)
- WikiAsp: A Dataset for Multi-domain Aspect-based Summarization [69.13865812754058]
We propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization.
Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation.
Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.
arXiv Detail & Related papers (2020-11-16T10:02:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.