Related papers: Extracting Fine-Grained Knowledge Graphs of Scientific Claims: Dataset and Transformer-Based Results

URL: http://arxiv.org/abs/2109.10453v1
Date: Tue, 21 Sep 2021 22:54:09 GMT
Title: Extracting Fine-Grained Knowledge Graphs of Scientific Claims: Dataset and Transformer-Based Results
Authors: Ian H. Magnusson and Scott E. Friedman
Abstract summary: We build SciClaim, a dataset of scientific claims drawn from Social and Behavior Science (SBS), PubMed, and CORD-19 papers. Our novel graph annotation schema incorporates not only coarse-grained entity spans as nodes and relations as edges between them, but also fine-grained attributes that modify entities and their relations. By including more label types and more than twice the label density of previous datasets, SciClaim captures causal, comparative, predictive, statistical, and proportional associations over experimental variables along with their qualifications, subtypes, and evidence.
Score: 0.5710971447109948
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent transformer-based approaches demonstrate promising results on relational scientific information extraction. Existing datasets focus on high-level description of how research is carried out. Instead we focus on the subtleties of how experimental associations are presented by building SciClaim, a dataset of scientific claims drawn from Social and Behavior Science (SBS), PubMed, and CORD-19 papers. Our novel graph annotation schema incorporates not only coarse-grained entity spans as nodes and relations as edges between them, but also fine-grained attributes that modify entities and their relations, for a total of 12,738 labels in the corpus. By including more label types and more than twice the label density of previous datasets, SciClaim captures causal, comparative, predictive, statistical, and proportional associations over experimental variables along with their qualifications, subtypes, and evidence. We extend work in transformer-based joint entity and relation extraction to effectively infer our schema, showing the promise of fine-grained knowledge graphs in scientific claims and beyond.
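The schema described in the abstract (entity spans as nodes, relations as edges, attributes attached to both) can be pictured with a small data structure. The following is only a minimal sketch: the class names, the label strings, the example sentence, and the convention of counting one label per entity, relation, and attribute are all assumptions made for illustration, not the actual SciClaim format or label set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """A coarse-grained entity span: one node in the claim graph."""
    text: str
    label: str  # hypothetical label name
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relation:
    """A directed edge between two entities, referenced by list index."""
    head: int
    tail: int
    label: str  # hypothetical label name
    attributes: List[str] = field(default_factory=list)


@dataclass
class ClaimGraph:
    """One annotated sentence: nodes, edges, and their attributes."""
    sentence: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def label_count(self) -> int:
        """Count one label per entity, per relation, and per attribute
        (an assumed convention, not necessarily the paper's)."""
        count = len(self.entities) + len(self.relations)
        count += sum(len(e.attributes) for e in self.entities)
        count += sum(len(r.attributes) for r in self.relations)
        return count


# Invented example sentence and labels, for illustration only.
graph = ClaimGraph(
    sentence="Sleep loss may increase anxiety.",
    entities=[Entity("Sleep loss", "variable"), Entity("anxiety", "variable")],
    relations=[Relation(0, 1, "association", attributes=["causal", "qualified"])],
)
print(graph.label_count())  # 2 entities + 1 relation + 2 attributes = 5
```

Keeping attributes as a list on each node and edge, rather than as separate node types, mirrors the abstract's description of attributes as modifiers of entities and relations.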

Related papers

Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets [3.9244082434642555]
AffilKG is a collection of six datasets that are the first to pair complete book scans with large, labeled knowledge graphs. Each dataset features affiliation graphs, which are simple KGs that capture Member relationships between Person and Organization entities.
arXiv Detail & Related papers (2025-05-16T02:24:32Z)
Causal Discovery from Data Assisted by Large Language Models [50.193740129296245]
It is essential to integrate experimental data with prior domain knowledge for knowledge-driven discovery. Here we demonstrate this approach by combining high-resolution scanning transmission electron microscopy (STEM) data with insights derived from large language models (LLMs). By fine-tuning ChatGPT on domain-specific literature, we construct adjacency matrices for Directed Acyclic Graphs (DAGs) that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3 (SmBFO).
arXiv Detail & Related papers (2025-03-18T02:14:49Z)
Multimodal Contrastive Representation Learning in Augmented Biomedical Knowledge Graphs [2.006175707670159]
PrimeKG++ is an enriched knowledge graph incorporating multimodal data. Our approach demonstrates strong generalizability, enabling accurate link predictions even for unseen nodes.
arXiv Detail & Related papers (2025-01-03T05:29:12Z)
SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles. Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
arXiv Detail & Related papers (2024-10-28T15:56:49Z)
The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges [101.83124435649358]
The homophily principle holds that nodes with the same labels or similar attributes are more likely to be connected. Recent work has identified a non-trivial set of datasets where the performance of GNNs compared to NNs is not satisfactory.
arXiv Detail & Related papers (2024-07-12T18:04:32Z)
Graph Relation Distillation for Efficient Biomedical Instance Segmentation [80.51124447333493]
We propose a graph relation distillation approach for efficient biomedical instance segmentation. We introduce two graph distillation schemes deployed at both the intra-image level and the inter-image level. Experimental results on a number of biomedical datasets validate the effectiveness of our approach.
arXiv Detail & Related papers (2024-01-12T04:41:23Z)
Predicting Scientific Impact Through Diffusion, Conformity, and Contribution Disentanglement [11.684776349325887]
Existing models typically rely on static graphs for citation count estimation. We introduce a novel model, DPPDCC, which Disentangles the Potential impacts of Papers into Diffusion, Conformity, and Contribution values.
arXiv Detail & Related papers (2023-11-15T07:21:11Z)
Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation. Deep learning models have emerged as an efficient way to discover synergistic combinations. Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
PcMSP: A Dataset for Scientific Action Graphs Extraction from Polycrystalline Materials Synthesis Procedure Text [1.9573380763700712]
This dataset simultaneously contains the synthesis sentences extracted from the experimental paragraphs, as well as the entity mentions and intra-sentence relations. A two-step human annotation and inter-annotator agreement study guarantee the high quality of the PcMSP corpus. We introduce four natural language processing tasks: sentence classification, named entity recognition, relation classification, and joint extraction of entities and relations.
arXiv Detail & Related papers (2022-10-22T09:43:54Z)
Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs [16.566710222582618]
We show how knowledge graph embedding models can be affected by structural imbalance. We show how the graph topology can be perturbed to artificially alter the rank of a gene via random, biologically meaningless information.
arXiv Detail & Related papers (2021-12-13T11:20:36Z)
Joint Biomedical Entity and Relation Extraction with Knowledge-Enhanced Collective Inference [42.255596963210564]
We present a novel framework, named KECI, that utilizes external knowledge for joint entity and relation extraction. KECI takes a collective approach to link mention spans to entities by integrating global relational information into local representations. Our experimental results show that the framework is highly effective, achieving new state-of-the-art results in two different benchmark datasets.
arXiv Detail & Related papers (2021-05-27T21:33:34Z)
Hyperbolic Graph Embedding with Enhanced Semi-Implicit Variational Inference [48.63194907060615]
We build off of semi-implicit graph variational auto-encoders to capture higher-order statistics in a low-dimensional graph latent representation. We incorporate hyperbolic geometry in the latent space through a Poincare embedding to efficiently represent graphs exhibiting hierarchical structure.
arXiv Detail & Related papers (2020-10-31T05:48:34Z)
HittER: Hierarchical Transformers for Knowledge Graph Embeddings [85.93509934018499]
We propose HittER to learn representations of entities and relations in a complex knowledge graph. Experimental results show that HittER achieves new state-of-the-art results on multiple link prediction datasets. We additionally propose a simple approach to integrate HittER into BERT and demonstrate its effectiveness on two Freebase factoid question answering datasets.
arXiv Detail & Related papers (2020-08-28T18:58:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.