Cross-document Event Coreference Search: Task, Dataset and Modeling
- URL: http://arxiv.org/abs/2210.12654v1
- Date: Sun, 23 Oct 2022 08:21:25 GMT
- Title: Cross-document Event Coreference Search: Task, Dataset and Modeling
- Authors: Alon Eirew, Avi Caciularu, Ido Dagan
- Abstract summary: We propose an appealing, and often more applicable, complementary set up for the task - Cross-document Coreference Search.
To support research on this task, we create a corresponding dataset, which is derived from Wikipedia.
We present a novel model that integrates a powerful coreference scoring scheme into the DPR architecture, yielding improved performance.
- Score: 26.36068336169796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of Cross-document Coreference Resolution has been traditionally
formulated as requiring to identify all coreference links across a given set of
documents. We propose an appealing, and often more applicable, complementary
set up for the task - Cross-document Coreference Search, focusing in this paper
on event coreference. Concretely, given a mention in context of an event of
interest, considered as a query, the task is to find all coreferring mentions
for the query event in a large document collection. To support research on this
task, we create a corresponding dataset, which is derived from Wikipedia while
leveraging annotations in the available Wikipedia Event Coreference dataset
(WEC-Eng). Observing that the coreference search setup is largely analogous to
the setting of Open Domain Question Answering, we adapt the prominent Deep
Passage Retrieval (DPR) model to our setting, as an appealing baseline.
Finally, we present a novel model that integrates a powerful coreference
scoring scheme into the DPR architecture, yielding improved performance.
Related papers
- Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information [33.21818213257603]
Cross-document event coreference resolution models can only compute mention similarity directly or enhance mention representation by extracting event arguments.
We propose the construction of document-level Rhetorical Structure Theory (RST) trees and cross-document Lexical Chains to model the structural and semantic information of documents.
We have developed a large-scale Chinese cross-document event coreference dataset to fill this gap.
arXiv Detail & Related papers (2024-06-23T02:54:48Z) - Event GDR: Event-Centric Generative Document Retrieval [37.53593254200252]
We propose Event GDR, an event-centric generative document retrieval model.
We employ events and relations to model the document to guarantee the comprehensiveness and inner-content correlation.
For identifier construction, we map the events to well-defined event taxonomy to construct the identifiers with explicit semantic structure.
arXiv Detail & Related papers (2024-05-11T02:55:11Z) - On Task-personalized Multimodal Few-shot Learning for Visually-rich
Document Entity Retrieval [59.25292920967197]
Few-shot document entity retrieval (VDER) is an important topic in industrial NLP applications.
FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER.
We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
arXiv Detail & Related papers (2023-11-01T17:51:43Z) - Peek Across: Improving Multi-Document Modeling via Cross-Document
Question-Answering [49.85790367128085]
We pre-training a generic multi-document model from a novel cross-document question answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z) - CAPSTONE: Curriculum Sampling for Dense Retrieval with Document
Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query.
Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
arXiv Detail & Related papers (2022-12-18T15:57:46Z) - Augmenting Document Representations for Dense Retrieval with
Interpolation and Perturbation [49.940525611640346]
Document Augmentation for dense Retrieval (DAR) framework augments the representations of documents with their Dense Augmentation and perturbations.
We validate the performance of DAR on retrieval tasks with two benchmark datasets, showing that the proposed DAR significantly outperforms relevant baselines on the dense retrieval of both the labeled and unlabeled documents.
arXiv Detail & Related papers (2022-03-15T09:07:38Z) - Improving Candidate Retrieval with Entity Profile Generation for
Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
arXiv Detail & Related papers (2022-02-27T17:38:53Z) - Cross-document Event Identity via Dense Annotation [9.163142877146512]
We study the identity of textual events from different documents.
We propose a dense annotation approach for cross-document event coreference.
We present an open-access dataset for cross-document event coreference.
arXiv Detail & Related papers (2021-09-14T03:57:58Z) - WEC: Deriving a Large-scale Cross-document Event Coreference dataset
from Wikipedia [14.324743524196874]
We present Wikipedia Event Coreference (WEC), an efficient methodology for gathering a large-scale dataset for cross-document event coreference from Wikipedia.
We apply this methodology to the English Wikipedia and extract our large-scale WEC-Eng dataset.
We develop an algorithm that adapts components of state-of-the-art models for within-document coreference resolution to the cross-document setting.
arXiv Detail & Related papers (2021-04-11T14:54:35Z) - CSFCube -- A Test Collection of Computer Science Research Articles for
Faceted Query by Example [43.01717754418893]
We introduce the task of faceted Query by Example.
Users can also specify a finer grained aspect in addition to the input query document.
We envision models which are able to retrieve scientific papers analogous to a query scientific paper.
arXiv Detail & Related papers (2021-03-24T01:02:12Z) - Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.