NLPContributions: An Annotation Scheme for Machine Reading of Scholarly
Contributions in Natural Language Processing Literature
- URL: http://arxiv.org/abs/2006.12870v3
- Date: Thu, 3 Sep 2020 05:23:56 GMT
- Title: NLPContributions: An Annotation Scheme for Machine Reading of Scholarly
Contributions in Natural Language Processing Literature
- Authors: Jennifer D'Souza and Sören Auer
- Abstract summary: We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles.
We develop the annotation task based on a pilot exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks.
We envision that the NLPContributions methodology engenders a wider discussion on the topic toward its further refinement and development.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We describe an annotation initiative to capture the scholarly contributions
in natural language processing (NLP) articles, particularly for articles
that discuss machine learning (ML) approaches for various information
extraction tasks. We develop the annotation task based on a pilot annotation
exercise on 50 NLP-ML scholarly articles presenting contributions to five
extraction tasks: 1. machine translation, 2. named entity
recognition, 3. question answering, 4. relation classification, and 5. text
classification. In this article, we describe the outcomes of this pilot
annotation phase. Through the exercise, we obtained an annotation methodology
and identified ten core information units that reflect the contributions of
NLP-ML scholarly investigations. The annotation scheme we developed based on
these information units is called NLPContributions.
The overarching goal of our endeavor is four-fold: 1) to find a systematic
set of patterns of subject-predicate-object statements for the semantic
structuring of scholarly contributions that are more or less generically
applicable for NLP-ML research articles; 2) to apply the discovered patterns in
the creation of a larger annotated dataset for training machine readers of
research contributions; 3) to ingest the dataset into the Open Research
Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly
state-of-the-art overviews; 4) to integrate the machine readers into the ORKG
to assist users in the manual curation of their respective article
contributions. We envision that the NLPContributions methodology engenders a
wider discussion on the topic toward its further refinement and development.
Our pilot annotated dataset of 50 NLP-ML scholarly articles according to the
NLPContributions scheme is openly available to the research community at
https://doi.org/10.25835/0019761.
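To make the subject-predicate-object structuring concrete, below is a minimal sketch of how one article's contribution could be expressed as triples grouped under information units. The predicate names (hasResearchProblem, hasModel, ...) and all example values are illustrative assumptions for this sketch, not verbatim from the NLPContributions scheme or the released dataset.

```python
# A minimal sketch of an NLPContributions-style structuring of one article's
# contribution as subject-predicate-object triples. Unit/predicate names and
# values are illustrative assumptions, not taken from the released dataset.

from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def contribution_triples(title: str) -> List[Triple]:
    """Return hypothetical triples describing one article's contribution."""
    return [
        # Root node links the paper to its contribution description.
        (title, "hasContribution", "Contribution"),
        # Core information units hang off the contribution node.
        ("Contribution", "hasResearchProblem", "named entity recognition"),
        ("Contribution", "hasModel", "BiLSTM-CRF tagger"),
        ("Contribution", "hasExperimentalSetup", "CoNLL-2003 train/dev split"),
        ("Contribution", "hasResults", "F1 of 91.2 on CoNLL-2003 test"),
    ]

if __name__ == "__main__":
    for s, p, o in contribution_triples("Example NER Paper"):
        print(f"({s}) --[{p}]--> ({o})")
```

Triples of this shape are what an ORKG-style knowledge graph infrastructure could ingest directly, which is the motivation behind goals 3) and 4) above.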
Related papers
- De-jargonizing Science for Journalists with GPT-4: A Pilot Study [3.730699089967391]
The system achieves fairly high recall in identifying jargon and preserves relative differences in readers' jargon identification.
The findings highlight the potential of generative AI for assisting science reporters, and can inform future work on developing tools to simplify dense documents.
arXiv Detail & Related papers (2024-10-15T21:10:01Z)
- The Nature of NLP: Analyzing Contributions in NLP Papers [77.31665252336157]
We quantitatively investigate what constitutes NLP research by examining research papers.
Our findings reveal a rising involvement of machine learning in NLP since the early nineties.
After 2020, there has been a resurgence of focus on language and people.
arXiv Detail & Related papers (2024-09-29T01:29:28Z)
- Systematic Task Exploration with LLMs: A Study in Citation Text Generation [63.50597360948099]
Large language models (LLMs) bring unprecedented flexibility in defining and executing complex, creative natural language generation (NLG) tasks.
We propose a three-component research framework that consists of systematic input manipulation, reference data, and output measurement.
We use this framework to explore citation text generation -- a popular scholarly NLP task that lacks consensus on the task definition and evaluation metric.
arXiv Detail & Related papers (2024-07-04T16:41:08Z)
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- Specifying Genericity through Inclusiveness and Abstractness Continuous Scales [1.024113475677323]
This paper introduces a novel annotation framework for the fine-grained modeling of Noun Phrases' (NPs) genericity in natural language.
The framework is designed to be simple and intuitive, making it accessible to non-expert annotators and suitable for crowd-sourced tasks.
arXiv Detail & Related papers (2024-03-22T15:21:07Z)
- Large Language Models for Scientific Information Extraction: An Empirical Study for Virology [0.0]
We champion the use of structured and semantic content representation of discourse-based scholarly communication.
Inspired by tools like Wikipedia infoboxes or structured Amazon product descriptions, we develop an automated approach to produce structured scholarly contribution summaries.
Our results show that finetuned FLAN-T5 with 1000x fewer parameters than the state-of-the-art GPT-davinci is competitive for the task.
arXiv Detail & Related papers (2024-01-18T15:04:55Z)
- InFoBench: Evaluating Instruction Following Ability in Large Language Models [57.27152890085759]
Decomposed Requirements Following Ratio (DRFR) is a new metric for evaluating Large Language Models' (LLMs) ability to follow instructions; a hedged sketch of such a ratio follows this entry.
We present InFoBench, a benchmark comprising 500 diverse instructions and 2,250 decomposed questions across multiple constraint categories.
arXiv Detail & Related papers (2024-01-07T23:01:56Z)
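Based only on the metric's name, DRFR plausibly measures the fraction of decomposed requirements a response satisfies; the sketch below makes that reading explicit. The exact InFoBench definition may differ, and the per-requirement judgments would come from an evaluator, not from this function.

```python
# A hedged sketch of a DRFR-style score, assuming (from the metric's name
# alone) that it is the fraction of decomposed requirements an LLM response
# satisfies. The exact InFoBench definition may differ; the boolean judgments
# would come from a human or model evaluator, not from this function.

from typing import List

def drfr(requirement_judgments: List[bool]) -> float:
    """Ratio of decomposed requirements judged as followed."""
    if not requirement_judgments:
        raise ValueError("need at least one decomposed requirement")
    return sum(requirement_judgments) / len(requirement_judgments)

# Example: 3 of 4 decomposed requirements for one instruction were satisfied.
print(drfr([True, True, False, True]))  # 0.75
```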
- NLPeer: A Unified Resource for the Computational Study of Peer Review [58.71736531356398]
We introduce NLPeer -- the first ethically sourced multidomain corpus of more than 5k papers and 11k review reports from five different venues.
We augment previous peer review datasets to include parsed and structured paper representations, rich metadata and versioning information.
Our work paves the path towards systematic, multi-faceted, evidence-based study of peer review in NLP and beyond.
arXiv Detail & Related papers (2022-11-12T12:29:38Z)
- O-Dang! The Ontology of Dangerous Speech Messages [53.15616413153125]
We present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG).
O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community.
It provides a model for encoding both gold standard and single-annotator labels in the KG.
arXiv Detail & Related papers (2022-07-13T11:50:05Z)
- Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions -- A Trial Dataset [0.0]
The aim of this work is to normalize the NLPCONTRIBUTIONS scheme for structuring, directly from article sentences, the contribution information in Natural Language Processing (NLP) scholarly articles.
Applying NLPCONTRIBUTIONGRAPH to the 50 articles ultimately yielded a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples (a sketch of these three granularities follows this list).
arXiv Detail & Related papers (2020-10-09T06:45:35Z)
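To visualize the three annotation granularities described in the trial-dataset entry above, here is a minimal sketch. The sentence, phrases, and triples are invented for illustration and do not come from the released data.

```python
# A minimal sketch (illustrative, not from the released trial dataset) of the
# three annotation granularities: a contribution-focused sentence, the
# contribution-information-centered phrases inside it, and the surface
# (subject, predicate, object) triples assembled from those phrases.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AnnotatedSentence:
    sentence: str                        # contribution-focused sentence
    phrases: List[str]                   # contribution-information phrases
    triples: List[Tuple[str, str, str]]  # surface-structured triples

example = AnnotatedSentence(
    sentence="Our BiLSTM-CRF model achieves 91.2 F1 on CoNLL-2003.",
    phrases=["BiLSTM-CRF model", "achieves", "91.2 F1", "CoNLL-2003"],
    triples=[
        ("BiLSTM-CRF model", "achieves", "91.2 F1"),
        ("91.2 F1", "on", "CoNLL-2003"),
    ],
)

print(example.triples)
```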