SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts
- URL: http://arxiv.org/abs/2104.08809v1
- Date: Sun, 18 Apr 2021 10:42:20 GMT
- Title: SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts
- Authors: Arie Cattan, Sophie Johnson, Daniel Weld, Ido Dagan, Iz Beltagy, Doug Downey, Tom Hope
- Abstract summary: We present a new task of hierarchical CDCR for concepts in scientific papers.
The goal is to jointly infer coreference clusters and the hierarchy between them.
We create SciCo, an expert-annotated dataset for this task, which is 3X larger than the prominent ECB+ resource.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Determining coreference of concept mentions across multiple documents is
fundamental for natural language understanding. Work on cross-document
coreference resolution (CDCR) typically considers mentions of events in the
news, which do not often involve abstract technical concepts that are prevalent
in science and technology. These complex concepts take diverse or ambiguous
forms and have many hierarchical levels of granularity (e.g., tasks and
subtasks), posing challenges for CDCR. We present a new task of hierarchical
CDCR for concepts in scientific papers, with the goal of jointly inferring
coreference clusters and hierarchy between them. We create SciCo, an
expert-annotated dataset for this task, which is 3X larger than the prominent
ECB+ resource. We find that tackling both coreference and hierarchy at once
outperforms disjoint models, which we hope will spur development of joint
models for SciCo.
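To make the output of hierarchical CDCR concrete, here is a minimal sketch that assumes a representation chosen purely for illustration (it is not SciCo's released data format). Each mention maps to a cluster id, and the hierarchy is a set of directed parent→child edges between clusters. From this structure any mention pair can be labeled as coreferent, parent, child, or unrelated. The class `HierarchicalCoref`, its fields, the label names, and the example mentions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HierarchicalCoref:
    """Hypothetical container for a hierarchical CDCR output (illustrative only)."""

    # mention id -> cluster id (mentions in one cluster are coreferent)
    cluster_of: dict[str, int]
    # cluster id -> ids of its direct child clusters (more specific concepts)
    children: dict[int, set[int]] = field(default_factory=dict)

    def descendants(self, c: int) -> set[int]:
        """All clusters below cluster c, i.e. the transitive closure of the hierarchy."""
        seen: set[int] = set()
        stack = list(self.children.get(c, ()))
        while stack:
            d = stack.pop()
            if d not in seen:  # the seen set also guards against accidental cycles
                seen.add(d)
                stack.extend(self.children.get(d, ()))
        return seen

    def relation(self, m1: str, m2: str) -> str:
        """Label a mention pair: 'coref', 'parent' (m1 is more general), 'child', or 'none'."""
        c1, c2 = self.cluster_of[m1], self.cluster_of[m2]
        if c1 == c2:
            return "coref"
        if c2 in self.descendants(c1):
            return "parent"
        if c1 in self.descendants(c2):
            return "child"
        return "none"


# Invented mentions from four documents; cluster 1 ("language model") is the parent of cluster 0 ("BERT").
doc = HierarchicalCoref(
    cluster_of={"d1:BERT": 0, "d2:BERT model": 0, "d3:language model": 1, "d4:image classification": 2},
    children={1: {0}},
)
print(doc.relation("d1:BERT", "d2:BERT model"))            # coref
print(doc.relation("d3:language model", "d1:BERT"))        # parent
print(doc.relation("d1:BERT", "d3:language model"))        # child
print(doc.relation("d1:BERT", "d4:image classification"))  # none
```

Deriving pairwise labels from a single cluster-plus-hierarchy structure shows why the two decisions are coupled: a wrong cluster assignment also changes which hierarchical relations hold.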
Related papers
- Data-driven Coreference-based Ontology Building (2024-10-22)
Coreference resolution is traditionally used as a component in individual document understanding.
We take a more global view and explore what we can learn about a domain from the set of all document-level coreference relations.
We release the resulting coreference chains under a Creative Commons license, along with the code.
- Inferring Scientific Cross-Document Coreference and Hierarchy with Definition-Augmented Relational Reasoning (2024-09-23)
We present a novel method that generates context-dependent definitions of concept mentions by retrieving full-text literature.
We further generate relational definitions, which describe how two concept mentions are related or different, and design an efficient re-ranking approach to address the combinatorial explosion involved in inferring links across papers.
- Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery (2024-03-12)
Generalized Category Discovery (GCD) aims to cluster unlabeled data from both known and unknown categories.
Current GCD methods rely only on visual cues, neglecting the multi-modal perceptual nature of the human cognitive processes involved in discovering novel visual categories.
We propose a two-phase TextGCD framework that accomplishes multi-modal GCD by exploiting powerful Visual-Language Models.
- On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling (2024-01-25)
We propose the Transport Plan and Context-aware Hierarchical Topic Model (TraCo).
TraCo constrains dependencies to ensure their sparsity and balance, and also uses them to regularize topic hierarchy building.
Rather than the previous entangled decoding, it assigns different semantic granularities to topics at different levels through disentangled decoding.
- On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval (2023-11-01)
Few-shot visually-rich document entity retrieval (VDER) is an important topic in industrial NLP applications.
FewVEX is a new dataset to boost future research on entity-level few-shot VDER.
We present a task-aware meta-learning framework with a central focus on achieving effective task personalization.
- MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding (2022-11-27)
Spatial hierarchical relationships between content at different levels of granularity are crucial for document image understanding tasks.
Existing methods learn features at either the word level or the region level but fail to consider both simultaneously.
We propose MGDoc, a new multi-modal multi-granular pre-training framework that encodes page-level, region-level, and word-level information at the same time.
- Knowledge-Aware Bayesian Deep Topic Model (2022-09-20)
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
- A Densely Connected Criss-Cross Attention Network for Document-level Relation Extraction (2022-03-26)
Document-level relation extraction (RE) aims to identify relations between two entities in a given document.
Previous research typically performed reasoning through information propagation on a mention-level or entity-level document graph.
We propose a novel model, the Densely Connected Criss-Cross Attention Network (Dense-CCNet), for document-level RE.
- CD2CR: Co-reference Resolution Across Documents and Domains (2021-01-29)
Cross-document co-reference resolution (CDCR) is the task of identifying and linking mentions to entities and concepts across many text documents.
We propose a new task and English-language dataset for cross-document cross-domain co-reference resolution (CD²CR).
We show that in this cross-domain, cross-document setting, existing CDCR models do not perform well, and we provide a baseline model that outperforms current state-of-the-art CDCR models on CD²CR.
- Generalizing Cross-Document Event Coreference Resolution Across Multiple Corpora (2020-11-24)
Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events must be identified and clustered throughout a collection of documents.
CDCR aims to benefit downstream multi-document applications, but improvements from applying CDCR have not yet been shown.
We observe that every CDCR system to date was developed, trained, and tested on only a single respective corpus.
This list is automatically generated from the titles and abstracts of the papers in this site.