Related papers: Code Book for the Annotation of Diverse Cross-Document Coreference of Entities in News Articles

Code Book for the Annotation of Diverse Cross-Document Coreference of Entities in News Articles

URL: http://arxiv.org/abs/2310.12064v1
Date: Wed, 18 Oct 2023 15:53:45 GMT
Title: Code Book for the Annotation of Diverse Cross-Document Coreference of Entities in News Articles
Authors: Jakob Vogel
Abstract summary: It includes a precise description of how to set up Inception, a respective annotation tool, how to annotate entities in news articles, connect them with diverse coreferential relations, and link them across documents to Wikidata's global knowledge graph. Our main contribution lies in providing a methodology for creating a diverse cross-document coreference corpus which can be applied to the analysis of media bias by word-choice and labelling.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents a scheme for annotating coreference across news articles, extending beyond traditional identity relations by also considering near-identity and bridging relations. It includes a precise description of how to set up Inception, a respective annotation tool, how to annotate entities in news articles, connect them with diverse coreferential relations, and link them across documents to Wikidata's global knowledge graph. This multi-layered annotation approach is discussed in the context of the problem of media bias. Our main contribution lies in providing a methodology for creating a diverse cross-document coreference corpus which can be applied to the analysis of media bias by word-choice and labelling.

Related papers

S2 Chunking: A Hybrid Framework for Document Segmentation Through Integrated Spatial and Semantic Analysis [0.0]
Document chunking is a critical task in natural language processing (NLP) This paper introduces a novel hybrid approach that combines layout structure, semantic analysis, and spatial relationships. Experimental results demonstrate that this approach outperforms traditional methods.
arXiv Detail & Related papers (2025-01-08T09:06:29Z)
Data-driven Coreference-based Ontology Building [48.995395445597225]
Coreference resolution is traditionally used as a component in individual document understanding. We take a more global view and explore what can we learn about a domain from the set of all document-level coreference relations. We release the coreference chains resulting under a creative-commons license, along with the code.
arXiv Detail & Related papers (2024-10-22T14:30:40Z)
Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities. Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings. First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss. Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning [14.77066147494556]
We propose a novel network to extract multi-view semantic concepts from documents and images and align the matching rather than entire concepts. We consistently outperform state-of-the-art methods under two document sources in three standard benchmarks for document-based zero-shot learning.
arXiv Detail & Related papers (2024-07-22T13:15:04Z)
Directed Criteria Citation Recommendation and Ranking Through Link Prediction [0.32885740436059047]
Our model uses transformer-based graph embeddings to encode the meaning of each document, presented as a node within a citation network. We show that the semantic representations that our model generates can outperform other content-based methods in recommendation and ranking tasks.
arXiv Detail & Related papers (2024-03-18T20:47:38Z)
Modeling Endorsement for Multi-Document Abstractive Summarization [10.166639983949887]
A crucial difference between single- and multi-document summarization is how salient content manifests itself in the document(s) In this paper, we model the cross-document endorsement effect and its utilization in multiple document summarization. Our method generates a synopsis from each document, which serves as an endorser to identify salient content from other documents.
arXiv Detail & Related papers (2021-10-15T03:55:42Z)
iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration [63.272359227081836]
iFacetSum integrates interactive summarization together with faceted search. Fine-grained facets are automatically produced based on cross-document coreference pipelines.
arXiv Detail & Related papers (2021-09-23T20:01:11Z)
Assessing the quality of sources in Wikidata across languages: a hybrid approach [64.05097584373979]
We run a series of microtasks experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages. We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata. The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z)
MIND - Mainstream and Independent News Documents Corpus [0.7347989843033033]
This paper characterizes MIND, a new Portuguese corpus comprised of different types of articles collected from online mainstream and alternative media sources. The articles in the corpus are organized into five collections: facts, opinions, entertainment, satires, and conspiracy theories.
arXiv Detail & Related papers (2021-08-13T14:00:12Z)
Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level. We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z)
Document Network Projection in Pretrained Word Embedding Space [7.455546102930911]
We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents into a pretrained word embedding space. We leverage a matrix of pairwise similarities providing complementary information (e.g., the network proximity of two documents in a citation graph) The document representations can help to solve many information retrieval tasks, such as recommendation, classification and clustering.
arXiv Detail & Related papers (2020-01-16T10:16:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.