Related papers: SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts

URL: http://arxiv.org/abs/2104.08809v1
Date: Sun, 18 Apr 2021 10:42:20 GMT
Title: SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts
Authors: Arie Cattan, Sophie Johnson, Daniel Weld, Ido Dagan, Iz Beltagy, Doug Downey, Tom Hope
Abstract summary: We present a new task of hierarchical CDCR for concepts in scientific papers. The goal is to jointly infer coreference clusters and the hierarchy between them. We create SciCo, an expert-annotated dataset for this task, which is 3X larger than the prominent ECB+ resource.
Score: 28.96683772139377
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Determining coreference of concept mentions across multiple documents is fundamental for natural language understanding. Work on cross-document coreference resolution (CDCR) typically considers mentions of events in the news, which do not often involve abstract technical concepts that are prevalent in science and technology. These complex concepts take diverse or ambiguous forms and have many hierarchical levels of granularity (e.g., tasks and subtasks), posing challenges for CDCR. We present a new task of hierarchical CDCR for concepts in scientific papers, with the goal of jointly inferring coreference clusters and hierarchy between them. We create SciCo, an expert-annotated dataset for this task, which is 3X larger than the prominent ECB+ resource. We find that tackling both coreference and hierarchy at once outperforms disjoint models, which we hope will spur development of joint models for SciCo.

Related papers

DISRetrieval: Harnessing Discourse Structure for Long Document Retrieval [51.89673002051528]
DISRetrieval is a novel hierarchical retrieval framework that leverages linguistic discourse structure to enhance long document understanding. Our studies confirm that discourse structure significantly enhances retrieval effectiveness across different document lengths and query types.
arXiv Detail & Related papers (2025-05-26T14:45:12Z)
Enhancing Abstractive Summarization of Scientific Papers Using Structure Information [6.414732533433283]
We propose a two-stage abstractive summarization framework that leverages automatic recognition of structural functions within scientific papers. In the first stage, we standardize chapter titles from numerous scientific papers and construct a large-scale dataset for structural function recognition. In the second stage, we employ Longformer to capture rich contextual relationships across sections and generate context-aware summaries.
arXiv Detail & Related papers (2025-05-20T10:34:45Z)
Data-driven Coreference-based Ontology Building [48.995395445597225]
Coreference resolution is traditionally used as a component in individual document understanding. We take a more global view and explore what we can learn about a domain from the set of all document-level coreference relations. We release the resulting coreference chains under a Creative Commons license, along with the code.
arXiv Detail & Related papers (2024-10-22T14:30:40Z)
Inferring Scientific Cross-Document Coreference and Hierarchy with Definition-Augmented Relational Reasoning [7.086262532457526]
We present a novel method which generates context-dependent definitions of concept mentions by retrieving full-text literature. We further generate relational definitions, which describe how two concept mentions are related or different, and design an efficient re-ranking approach to address the explosion involved in inferring links across papers.
arXiv Detail & Related papers (2024-09-23T15:20:27Z)
Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery [69.91441987063307]
Generalized Category Discovery (GCD) aims to cluster unlabeled data from both known and unknown categories. Current GCD methods rely on only visual cues, which neglect the multi-modality perceptive nature of human cognitive processes in discovering novel visual categories. We propose a two-phase TextGCD framework to accomplish multi-modality GCD by exploiting powerful Visual-Language Models.
arXiv Detail & Related papers (2024-03-12T07:06:50Z)
On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling [29.277151061615434]
We propose Transport Plan and Context-aware Hierarchical Topic Model (TraCo). TraCo constrains dependencies to ensure their sparsity and balance, and also regularizes topic hierarchy building with them. Rather than previously entangled decoding, it distributes different semantic granularity to topics at different levels by disentangled decoding.
arXiv Detail & Related papers (2024-01-25T11:47:58Z)
On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval [59.25292920967197]
Few-shot document entity retrieval (VDER) is an important topic in industrial NLP applications. FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER. We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
arXiv Detail & Related papers (2023-11-01T17:51:43Z)
MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding [53.03978356918377]
Spatial hierarchical relationships between content at different levels of granularity are crucial for document image understanding tasks. Existing methods learn features from either word-level or region-level but fail to consider both simultaneously. We propose MGDoc, a new multi-modal multi-granular pre-training framework that encodes page-level, region-level, and word-level information at the same time.
arXiv Detail & Related papers (2022-11-27T22:47:37Z)
Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling. Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
A Densely Connected Criss-Cross Attention Network for Document-level Relation Extraction [3.276435438007766]
Document-level relation extraction (RE) aims to identify relations between two entities in a given document. Previous research normally completed reasoning through information propagation on the mention-level or entity-level document-graph. We propose a novel model, called Densely Connected Criss-Cross Attention Network (Dense-CCNet), for document-level RE.
arXiv Detail & Related papers (2022-03-26T01:01:34Z)
CD2CR: Co-reference Resolution Across Documents and Domains [20.30046972135548]
Cross-document co-reference resolution (CDCR) is the task of identifying and linking mentions to entities and concepts across many text documents. We propose a new task and English language dataset for cross-document cross-domain co-reference resolution (CD$^2$CR). We show that in this cross-domain, cross-document setting, existing CDCR models do not perform well and we provide a baseline model that outperforms current state-of-the-art CDCR models on CD$^2$CR.
arXiv Detail & Related papers (2021-01-29T15:18:30Z)
Generalizing Cross-Document Event Coreference Resolution Across Multiple Corpora [63.429307282665704]
Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events need to be identified and clustered throughout a collection of documents. CDCR aims to benefit downstream multi-document applications, but improvements from applying CDCR have not been shown yet. We make the observation that every CDCR system to date was developed, trained, and tested only on a single respective corpus.
arXiv Detail & Related papers (2020-11-24T17:45:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.