SemEval-2021 Task 11: NLPContributionGraph -- Structuring Scholarly NLP
Contributions for a Research Knowledge Graph
- URL: http://arxiv.org/abs/2106.07385v1
- Date: Thu, 10 Jun 2021 13:43:47 GMT
- Authors: Jennifer D'Souza, Sören Auer and Ted Pedersen
- Abstract summary: The SemEval-2021 Shared Task NLPContributionGraph (a.k.a. 'the NCG task') tasks participants to develop automated systems that structure contributions from NLP scholarly articles in the English language.
Being the first-of-its-kind in the SemEval series, the task released structured data from NLP scholarly articles at three levels of information granularity.
The best end-to-end task system classified contribution sentences at 57.27% F1, phrases at 46.41% F1, and triples at 22.28% F1.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: There is currently a gap between the natural language expression of scholarly
publications and their structured semantic content modeling to enable
intelligent content search. With the volume of research growing exponentially
every year, a search feature operating over semantically structured content is
compelling. The SemEval-2021 Shared Task NLPContributionGraph (a.k.a. 'the NCG
task') tasks participants to develop automated systems that structure
contributions from NLP scholarly articles in the English language. Being the
first-of-its-kind in the SemEval series, the task released structured data from
NLP scholarly articles at three levels of information granularity, i.e. at
sentence-level, phrase-level, and phrases organized as triples toward Knowledge
Graph (KG) building. The sentence-level annotations comprised the few sentences
about the article's contribution. The phrase-level annotations were scientific
term and predicate phrases from the contribution sentences. Finally, the
triples constituted the research overview KG. For the Shared Task,
participating systems were then expected to automatically classify contribution
sentences, extract scientific terms and relations from the sentences, and
organize them as KG triples.
Overall, the task drew strong participation: seven teams and 27 participants.
The best end-to-end task system classified contribution sentences at 57.27% F1,
phrases at 46.41% F1, and triples at 22.28% F1. While the absolute performance
on triple generation remains low, the conclusion of this article highlights the
difficulty of producing such data and, as a consequence, of modeling it.
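The three annotation granularities described above can be illustrated with a small sketch: a contribution sentence, the scientific term and predicate phrases extracted from it, and those phrases organized as (subject, predicate, object) triples. The example sentence, phrases, and triples below are invented for illustration; they are not taken from the task dataset.

```python
# Hypothetical sketch of the three NCG annotation granularities:
# sentence-level, phrase-level, and phrases organized as KG triples.
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

# Level 1: a contribution sentence from a (hypothetical) paper.
sentence = "We propose a BiLSTM-CRF model for named entity recognition."

# Level 2: scientific term and predicate phrases from that sentence.
term_phrases = ["BiLSTM-CRF model", "named entity recognition"]
predicate_phrases = ["propose", "for"]

# Level 3: phrases organized as triples toward KG building.
triples = [
    Triple("Contribution", "propose", "BiLSTM-CRF model"),
    Triple("BiLSTM-CRF model", "for", "named entity recognition"),
]

for t in triples:
    print(f"({t.subject}, {t.predicate}, {t.obj})")
```

An end-to-end system for the task would have to produce all three levels automatically: select the contribution sentences, extract the phrases, and assemble the triples.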
Related papers
- Unsupervised Chunking with Hierarchical RNN [62.15060807493364]
This paper introduces an unsupervised approach to chunking, a syntactic task that involves grouping words in a non-hierarchical manner.
We present a two-layer Hierarchical Recurrent Neural Network (HRNN) designed to model word-to-chunk and chunk-to-sentence compositions.
Experiments on the CoNLL-2000 dataset reveal a notable improvement over existing unsupervised methods, enhancing phrase F1 score by up to 6 percentage points.
arXiv Detail & Related papers (2023-09-10T02:55:12Z)
- Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
arXiv Detail & Related papers (2022-10-23T08:34:33Z)
- CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task [11.716878242203267]
We present the joint contribution of IST and Unbabel to the WMT 2022 Shared Task on Quality Estimation (QE).
Our team participated in all three subtasks: (i) Sentence- and Word-level Quality Prediction; (ii) Explainable QE; and (iii) Critical Error Detection.
arXiv Detail & Related papers (2022-09-13T18:05:12Z)
- O-Dang! The Ontology of Dangerous Speech Messages [53.15616413153125]
We present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG).
O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community.
It provides a model for encoding both gold standard and single-annotator labels in the KG.
arXiv Detail & Related papers (2022-07-13T11:50:05Z)
- UPB at SemEval-2021 Task 1: Combining Deep Learning and Hand-Crafted Features for Lexical Complexity Prediction [0.7197592390105455]
We describe our approach for the SemEval-2021 Task 1: Lexical Complexity Prediction competition.
Our results are only 5.46% and 6.5% below the top scores obtained in the competition on the first and second subtasks, respectively.
arXiv Detail & Related papers (2021-04-14T17:05:46Z)
- KnowGraph@IITK at SemEval-2021 Task 11: Building KnowledgeGraph for NLP Research [2.1012672709024294]
We develop a system for a research paper contributions-focused knowledge graph over Natural Language Processing literature.
The proposed system is agnostic to the subject domain and can be applied for building a knowledge graph for any area.
Our system achieved F1 scores of 0.38, 0.63, and 0.76 in end-to-end pipeline testing, phrase extraction testing, and triplet extraction testing, respectively.
arXiv Detail & Related papers (2021-04-04T14:33:21Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
- Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions -- A Trial Dataset [0.0]
This work normalizes the NLPCONTRIBUTIONS scheme to structure, directly from article sentences, the contribution information in Natural Language Processing (NLP) scholarly articles.
Applying NLPCONTRIBUTIONGRAPH to the 50 articles resulted in a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples.
arXiv Detail & Related papers (2020-10-09T06:45:35Z)
- NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature [0.0]
We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles.
We develop the annotation task based on a pilot exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks.
We envision that the NLPContributions methodology engenders a wider discussion on the topic toward its further refinement and development.
arXiv Detail & Related papers (2020-06-23T10:04:39Z)
- RUSSE'2020: Findings of the First Taxonomy Enrichment Task for the Russian Language [70.27072729280528]
This paper describes the results of the first shared task on taxonomy enrichment for the Russian language.
Sixteen teams participated in the task, demonstrating strong results, with more than half of them outperforming the provided baseline.
arXiv Detail & Related papers (2020-05-22T13:30:37Z)
- Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion [53.31911669146451]
Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks.
These graphs are usually incomplete, calling for automatic completion.
Graph embedding approaches, e.g., TransE, learn structured knowledge by representing graph elements as dense embeddings.
Textual encoding approaches, e.g., KG-BERT, leverage the text of graph triples and triple-level contextualized representations.
arXiv Detail & Related papers (2020-04-30T13:50:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.