Code Book for the Annotation of Diverse Cross-Document Coreference of
Entities in News Articles
- URL: http://arxiv.org/abs/2310.12064v1
- Date: Wed, 18 Oct 2023 15:53:45 GMT
- Title: Code Book for the Annotation of Diverse Cross-Document Coreference of
Entities in News Articles
- Authors: Jakob Vogel
- Abstract summary: It includes a precise description of how to set up Inception, a respective annotation tool, how to annotate entities in news articles, connect them with diverse coreferential relations, and link them across documents to Wikidata's global knowledge graph.
Our main contribution lies in providing a methodology for creating a diverse cross-document coreference corpus which can be applied to the analysis of media bias by word-choice and labelling.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a scheme for annotating coreference across news articles,
extending beyond traditional identity relations by also considering
near-identity and bridging relations. It includes a precise description of how
to set up Inception, a respective annotation tool, how to annotate entities in
news articles, connect them with diverse coreferential relations, and link them
across documents to Wikidata's global knowledge graph. This multi-layered
annotation approach is discussed in the context of the problem of media bias.
Our main contribution lies in providing a methodology for creating a diverse
cross-document coreference corpus which can be applied to the analysis of media
bias by word-choice and labelling.
Related papers
- S2 Chunking: A Hybrid Framework for Document Segmentation Through Integrated Spatial and Semantic Analysis [0.0]
Document chunking is a critical task in natural language processing (NLP)
This paper introduces a novel hybrid approach that combines layout structure, semantic analysis, and spatial relationships.
Experimental results demonstrate that this approach outperforms traditional methods.
arXiv Detail & Related papers (2025-01-08T09:06:29Z) - Data-driven Coreference-based Ontology Building [48.995395445597225]
Coreference resolution is traditionally used as a component in individual document understanding.
We take a more global view and explore what can we learn about a domain from the set of all document-level coreference relations.
We release the coreference chains resulting under a creative-commons license, along with the code.
arXiv Detail & Related papers (2024-10-22T14:30:40Z) - Unified Multimodal Interleaved Document Representation for Retrieval [57.65409208879344]
We propose a method that holistically embeds documents interleaved with multiple modalities.
We merge the representations of segmented passages into one single document representation.
We show that our approach substantially outperforms relevant baselines.
arXiv Detail & Related papers (2024-10-03T17:49:09Z) - Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z) - Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning [14.77066147494556]
We propose a novel network to extract multi-view semantic concepts from documents and images and align the matching rather than entire concepts.
We consistently outperform state-of-the-art methods under two document sources in three standard benchmarks for document-based zero-shot learning.
arXiv Detail & Related papers (2024-07-22T13:15:04Z) - Modeling Endorsement for Multi-Document Abstractive Summarization [10.166639983949887]
A crucial difference between single- and multi-document summarization is how salient content manifests itself in the document(s)
In this paper, we model the cross-document endorsement effect and its utilization in multiple document summarization.
Our method generates a synopsis from each document, which serves as an endorser to identify salient content from other documents.
arXiv Detail & Related papers (2021-10-15T03:55:42Z) - iFacetSum: Coreference-based Interactive Faceted Summarization for
Multi-Document Exploration [63.272359227081836]
iFacetSum integrates interactive summarization together with faceted search.
Fine-grained facets are automatically produced based on cross-document coreference pipelines.
arXiv Detail & Related papers (2021-09-23T20:01:11Z) - Assessing the quality of sources in Wikidata across languages: a hybrid
approach [64.05097584373979]
We run a series of microtasks experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z) - MIND - Mainstream and Independent News Documents Corpus [0.7347989843033033]
This paper characterizes MIND, a new Portuguese corpus comprised of different types of articles collected from online mainstream and alternative media sources.
The articles in the corpus are organized into five collections: facts, opinions, entertainment, satires, and conspiracy theories.
arXiv Detail & Related papers (2021-08-13T14:00:12Z) - Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z) - Document Network Projection in Pretrained Word Embedding Space [7.455546102930911]
We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents into a pretrained word embedding space.
We leverage a matrix of pairwise similarities providing complementary information (e.g., the network proximity of two documents in a citation graph)
The document representations can help to solve many information retrieval tasks, such as recommendation, classification and clustering.
arXiv Detail & Related papers (2020-01-16T10:16:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.