SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts
- URL: http://arxiv.org/abs/2104.08809v1
- Date: Sun, 18 Apr 2021 10:42:20 GMT
- Title: SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts
- Authors: Arie Cattan, Sophie Johnson, Daniel Weld, Ido Dagan, Iz Beltagy, Doug
Downey, Tom Hope
- Abstract summary: We present a new task of hierarchical CDCR for concepts in scientific papers.
The goal is to jointly infer coreference clusters and the hierarchy between them.
We create SciCo, an expert-annotated dataset for this task, which is 3X larger than the prominent ECB+ resource.
- Score: 28.96683772139377
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Determining coreference of concept mentions across multiple documents is
fundamental for natural language understanding. Work on cross-document
coreference resolution (CDCR) typically considers mentions of events in the
news, which do not often involve abstract technical concepts that are prevalent
in science and technology. These complex concepts take diverse or ambiguous
forms and have many hierarchical levels of granularity (e.g., tasks and
subtasks), posing challenges for CDCR. We present a new task of hierarchical
CDCR for concepts in scientific papers, with the goal of jointly inferring
coreference clusters and hierarchy between them. We create SciCo, an
expert-annotated dataset for this task, which is 3X larger than the prominent
ECB+ resource. We find that tackling both coreference and hierarchy at once
outperforms disjoint models, which we hope will spur development of joint
models for SciCo.
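To make the task's output concrete, here is a minimal Python sketch of what a hierarchical CDCR prediction could look like: coreference clusters of mentions drawn from several papers, plus directed parent-child edges between clusters. This is only an illustration under stated assumptions. The `Mention` class, the cluster ids, the paper ids, the example mentions, and the `ancestors` and `relation` helpers are all hypothetical and are not taken from the SciCo dataset or its code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    doc_id: str  # hypothetical paper identifier
    text: str    # surface form of the concept mention


# Coreference clusters: each groups mentions, possibly from different
# papers, that refer to the same concept. All contents are made up.
clusters = {
    "c_ie": [Mention("paper_a", "information extraction"),
             Mention("paper_b", "IE")],
    "c_re": [Mention("paper_a", "relation extraction"),
             Mention("paper_c", "extracting relations")],
    "c_docre": [Mention("paper_c", "document-level RE")],
}

# Hierarchy: directed (parent, child) edges between clusters,
# e.g. a task and its subtask.
hierarchy = {("c_ie", "c_re"), ("c_re", "c_docre")}


def ancestors(cluster_id, edges):
    """Return every cluster reachable by following child -> parent edges."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    seen, stack = set(), [cluster_id]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def cluster_of(mention, clusters):
    """Look up the id of the cluster that contains the mention."""
    for cluster_id, members in clusters.items():
        if mention in members:
            return cluster_id
    raise KeyError(mention)


def relation(m1, m2, clusters, edges):
    """Classify a mention pair as coreferent, ancestor, descendant, or unrelated."""
    c1, c2 = cluster_of(m1, clusters), cluster_of(m2, clusters)
    if c1 == c2:
        return "coreferent"
    if c1 in ancestors(c2, edges):
        return "ancestor"    # m1's concept is above m2's concept
    if c2 in ancestors(c1, edges):
        return "descendant"  # m1's concept is below m2's concept
    return "unrelated"
```

In this sketch, `relation(Mention("paper_b", "IE"), Mention("paper_c", "document-level RE"), clusters, hierarchy)` follows the hierarchy transitively, so a mention pair can be related even when the two mentions never appear in the same paper. A joint model, as the abstract describes it, would predict both the clusters and the edges together rather than in two separate stages.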
Related papers
- Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized
Visual Class Discovery [69.91441987063307]
Generalized Category Discovery (GCD) aims to cluster unlabeled data from both known and unknown categories.
Current GCD methods rely on only visual cues, which neglect the multi-modality perceptive nature of human cognitive processes in discovering novel visual categories.
We propose a two-phase TextGCD framework to accomplish multi-modality GCD by exploiting powerful Visual-Language Models.
arXiv Detail & Related papers (2024-03-12T07:06:50Z)
- On the Affinity, Rationality, and Diversity of Hierarchical Topic
Modeling [29.277151061615434]
We propose Transport Plan and Context-aware Hierarchical Topic Model (TraCo).
TraCo constrains dependencies to ensure their sparsity and balance, and also regularizes topic hierarchy building with them.
Rather than previously entangled decoding, it distributes different semantic granularity to topics at different levels by disentangled decoding.
arXiv Detail & Related papers (2024-01-25T11:47:58Z)
- On Task-personalized Multimodal Few-shot Learning for Visually-rich
Document Entity Retrieval [59.25292920967197]
Visually-rich document entity retrieval (VDER) is an important topic in industrial NLP applications.
FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER.
We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
arXiv Detail & Related papers (2023-11-01T17:51:43Z)
- MGDoc: Pre-training with Multi-granular Hierarchy for Document Image
Understanding [53.03978356918377]
Spatial hierarchical relationships between content at different levels of granularity are crucial for document image understanding tasks.
Existing methods learn features from either word-level or region-level but fail to consider both simultaneously.
We propose MGDoc, a new multi-modal multi-granular pre-training framework that encodes page-level, region-level, and word-level information at the same time.
arXiv Detail & Related papers (2022-11-27T22:47:37Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- A Densely Connected Criss-Cross Attention Network for Document-level
Relation Extraction [3.276435438007766]
Document-level relation extraction (RE) aims to identify relations between two entities in a given document.
Previous research normally completed reasoning through information propagation on the mention-level or entity-level document-graph.
We propose a novel model, called Densely Connected Criss-Cross Attention Network (Dense-CCNet), for document-level RE.
arXiv Detail & Related papers (2022-03-26T01:01:34Z)
- WEC: Deriving a Large-scale Cross-document Event Coreference dataset
from Wikipedia [14.324743524196874]
We present Wikipedia Event Coreference (WEC), an efficient methodology for gathering a large-scale dataset for cross-document event coreference from Wikipedia.
We apply this methodology to the English Wikipedia and extract our large-scale WEC-Eng dataset.
We develop an algorithm that adapts components of state-of-the-art models for within-document coreference resolution to the cross-document setting.
arXiv Detail & Related papers (2021-04-11T14:54:35Z)
- CD2CR: Co-reference Resolution Across Documents and Domains [20.30046972135548]
Cross-document co-reference resolution (CDCR) is the task of identifying and linking mentions to entities and concepts across many text documents.
We propose a new task and English language dataset for cross-document cross-domain co-reference resolution (CD$^2$CR).
We show that in this cross-domain, cross-document setting, existing CDCR models do not perform well, and we provide a baseline model that outperforms current state-of-the-art CDCR models on CD$^2$CR.
arXiv Detail & Related papers (2021-01-29T15:18:30Z)
- Generalizing Cross-Document Event Coreference Resolution Across Multiple
Corpora [63.429307282665704]
Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events need to be identified and clustered throughout a collection of documents.
CDCR aims to benefit downstream multi-document applications, but improvements from applying CDCR have not been shown yet.
We make the observation that every CDCR system to date was developed, trained, and tested only on a single respective corpus.
arXiv Detail & Related papers (2020-11-24T17:45:03Z)
- Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data
and Methodology [68.8836704199096]
Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents.
With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses.
Previous studies on the topic of recognizing and classifying inappropriate content are mostly focused on a certain category of malevolence.
arXiv Detail & Related papers (2020-08-21T22:43:27Z)
- Expressiveness and machine processability of Knowledge Organization
Systems (KOS): An analysis of concepts and relations [0.0]
The potential of both the expressiveness and machine processability of each Knowledge Organization System is extensively regulated by its structural rules.
Ontologies explicitly define diverse types of relations, and are by their nature machine-processable.
arXiv Detail & Related papers (2020-03-11T12:35:52Z)