Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of
Natural Language Processing Contributions -- A Trial Dataset
- URL: http://arxiv.org/abs/2010.04388v3
- Date: Fri, 7 May 2021 06:08:59 GMT
- Title: Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of
Natural Language Processing Contributions -- A Trial Dataset
- Authors: Jennifer D'Souza, Sören Auer
- Abstract summary: The aim of this work is to normalize the NLPCONTRIBUTIONS scheme to structure, directly from article sentences, the contributions information in Natural Language Processing (NLP) scholarly articles.
The application of NLPCONTRIBUTIONGRAPH on the 50 articles resulted finally in a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Purpose: The aim of this work is to normalize the NLPCONTRIBUTIONS scheme
(henceforward, NLPCONTRIBUTIONGRAPH) to structure, directly from article
sentences, the contributions information in Natural Language Processing (NLP)
scholarly articles via a two-stage annotation methodology: 1) pilot stage - to
define the scheme (described in prior work); and 2) adjudication stage - to
normalize the graphing model (the focus of this paper).
Design/methodology/approach: We re-annotate, in a second pass, the
contribution-pertinent information across 50 previously annotated NLP scholarly
articles in terms of a data pipeline comprising: contribution-centered
sentences, phrases, and triple statements. Specifically, in the adjudication
annotation stage, care was taken to reduce annotation noise while formulating
the guidelines for our proposed novel NLP contributions structuring and
graphing scheme.
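The three granularity levels of the pipeline can be pictured as nested units. A minimal sketch follows; the `ContributionUnit` class, the example sentence, and its phrases and triples are hypothetical illustrations, not items from the dataset:

```python
from dataclasses import dataclass


@dataclass
class ContributionUnit:
    """One contribution-focused sentence with its annotated phrases and triples."""
    sentence: str          # contribution-focused sentence from the article
    phrases: list          # contribution-information-centered phrases
    triples: list          # surface-structured (subject, predicate, object) triples


# Hypothetical example of a single annotated unit
unit = ContributionUnit(
    sentence="We propose a BiLSTM-CRF model for named entity recognition.",
    phrases=["BiLSTM-CRF model", "named entity recognition"],
    triples=[
        ("Contribution", "has model", "BiLSTM-CRF model"),
        ("BiLSTM-CRF model", "used for", "named entity recognition"),
    ],
)
print(len(unit.triples))  # → 2
```

A sentence typically yields more phrases than triples at the top level, which is consistent with the dataset's 900 sentences, 4,702 phrases, and 2,980 triples.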
Findings: The application of NLPCONTRIBUTIONGRAPH on the 50 articles resulted
finally in a dataset of 900 contribution-focused sentences, 4,702
contribution-information-centered phrases, and 2,980 surface-structured
triples. The intra-annotation agreement between the first and second stages, in
terms of F1, was 67.92% for sentences, 41.82% for phrases, and 22.31% for
triple statements, indicating that annotation decision variance grows with the
granularity of the information.
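Agreement of this kind can be computed as the F1 of exact-match overlap between the two annotation stages, treating one stage's annotations as reference and the other's as predictions. A minimal sketch, assuming exact string matching; the helper function and the sample sentences are hypothetical:

```python
def agreement_f1(stage1, stage2):
    """F1 of exact-match overlap between two sets of annotations."""
    overlap = len(set(stage1) & set(stage2))
    if overlap == 0:
        return 0.0
    precision = overlap / len(set(stage2))  # fraction of stage-2 items also in stage-1
    recall = overlap / len(set(stage1))     # fraction of stage-1 items kept in stage-2
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: the re-annotation keeps 2 of 3 stage-1 sentences
stage1 = ["We propose a new model.", "Results improve by 2 points.", "Code is released."]
stage2 = ["We propose a new model.", "Results improve by 2 points.", "We evaluate on two datasets."]
print(round(agreement_f1(stage1, stage2), 4))  # → 0.6667
```

With exact matching, longer and more finely delimited units (phrases, triples) disagree more often than whole sentences, which matches the drop from 67.92% to 22.31% reported above.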
Practical Implications: We demonstrate NLPCONTRIBUTIONGRAPH data integrated
into the Open Research Knowledge Graph (ORKG), a next-generation KG-based
digital library with intelligent computations enabled over structured scholarly
knowledge, as a viable aid to assist researchers in their day-to-day tasks.
Related papers
- Text-Augmented Open Knowledge Graph Completion via Pre-Trained Language
Models [53.09723678623779]
We propose TAGREAL to automatically generate quality query prompts and retrieve support information from large text corpora.
The results show that TAGREAL achieves state-of-the-art performance on two benchmark datasets.
We find that TAGREAL has superb performance even with limited training data, outperforming existing embedding-based, graph-based, and PLM-based methods.
arXiv Detail & Related papers (2023-05-24T22:09:35Z)
- Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a 3-stage training framework where the noisy level in the data used for finetuning decreases over different stages.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
arXiv Detail & Related papers (2022-04-27T04:24:35Z)
- An Open Natural Language Processing Development Framework for EHR-based
Clinical Research: A case demonstration using the National COVID Cohort
Collaborative (N3C) [29.701601520785033]
We propose an open natural language processing development framework and evaluate it through the implementation of NLP algorithms for the National COVID Cohort Collaborative (N3C)
Based on the interests in information extraction from COVID-19 related clinical notes, our work includes 1) an open data annotation process using COVID-19 signs and symptoms as the use case, 2) a community-driven ruleset composing platform, and 3) a synthetic text data generation workflow to generate texts for information extraction tasks without involving human subjects.
arXiv Detail & Related papers (2021-10-20T21:09:41Z)
- SemEval-2021 Task 11: NLPContributionGraph -- Structuring Scholarly NLP
Contributions for a Research Knowledge Graph [0.17188280334580192]
The SemEval-2021 Shared Task NLPContributionGraph (a.k.a. 'the NCG task') tasks participants to develop automated systems that structure contributions from NLP scholarly articles in the English language.
Being the first-of-its-kind in the SemEval series, the task released structured data from NLP scholarly articles at three levels of information granularity.
The best end-to-end task system classified contribution sentences at 57.27% F1, phrases at 46.41% F1, and triples at 22.28% F1.
arXiv Detail & Related papers (2021-06-10T13:43:47Z)
- Annotation Curricula to Implicitly Train Non-Expert Annotators [56.67768938052715]
Voluntary studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain.
This can be overwhelming in the beginning, mentally taxing, and induce errors into the resulting annotations.
We propose annotation curricula, a novel approach to implicitly train annotators.
arXiv Detail & Related papers (2021-06-04T09:48:28Z)
- KnowGraph@IITK at SemEval-2021 Task 11: Building KnowledgeGraph for NLP
Research [2.1012672709024294]
We develop a system for building a contributions-focused knowledge graph of research papers over the Natural Language Processing literature.
The proposed system is agnostic to the subject domain and can be applied for building a knowledge graph for any area.
Our system achieved F1 scores of 0.38, 0.63, and 0.76 in end-to-end pipeline testing, phrase extraction testing, and triplet extraction testing, respectively.
arXiv Detail & Related papers (2021-04-04T14:33:21Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
- Investigating Pretrained Language Models for Graph-to-Text Generation [55.55151069694146]
Graph-to-text generation aims to generate fluent texts from graph-based data.
We present a study across three graph domains: meaning representations, Wikipedia knowledge graphs (KGs) and scientific KGs.
We show that the PLMs BART and T5 achieve new state-of-the-art results and that task-adaptive pretraining strategies improve their performance even further.
arXiv Detail & Related papers (2020-07-16T16:05:34Z)
- NLPContributions: An Annotation Scheme for Machine Reading of Scholarly
Contributions in Natural Language Processing Literature [0.0]
We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles.
We develop the annotation task based on a pilot exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks.
We envision that the NLPContributions methodology engenders a wider discussion on the topic toward its further refinement and development.
arXiv Detail & Related papers (2020-06-23T10:04:39Z)
- Structure-Augmented Text Representation Learning for Efficient Knowledge
Graph Completion [53.31911669146451]
Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks.
These graphs are usually incomplete, urging their automatic completion.
Graph embedding approaches, e.g., TransE, learn structured knowledge by representing graph elements as dense embeddings.
Textual encoding approaches, e.g., KG-BERT, resort to graph triples' text and triple-level contextualized representations.
arXiv Detail & Related papers (2020-04-30T13:50:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.