KnowGraph@IITK at SemEval-2021 Task 11: Building KnowledgeGraph for NLP Research
- URL: http://arxiv.org/abs/2104.01619v1
- Date: Sun, 4 Apr 2021 14:33:21 GMT
- Title: KnowGraph@IITK at SemEval-2021 Task 11: Building KnowledgeGraph for NLP Research
- Authors: Shashank Shailabh, Sajal Chaurasia, Ashutosh Modi
- Abstract summary: We develop a system for a research paper contributions-focused knowledge graph over Natural Language Processing literature.
The proposed system is agnostic to the subject domain and can be applied to building a knowledge graph for any area.
Our system achieved F1 scores of 0.38, 0.63, and 0.76 in end-to-end pipeline testing, phrase extraction testing, and triplet extraction testing, respectively.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Research in Natural Language Processing is making rapid advances, resulting
in the publication of a large number of research papers. Finding relevant
research papers and their contribution to the domain is a challenging problem.
In this paper, we address this challenge via the SemEval 2021 Task 11:
NLPContributionGraph, by developing a system for a research paper
contributions-focused knowledge graph over Natural Language Processing
literature. The task is divided into three sub-tasks: extracting contribution
sentences that show important contributions in the research article, extracting
phrases from the contribution sentences, and predicting the information units
in the research article together with triplet formation from the phrases. The
proposed system is agnostic to the subject domain and can be applied to
building a knowledge graph for any area. We found that transformer-based
language models can significantly improve existing techniques, and we
therefore adopted a SciBERT-based model. Our first sub-task uses a
Bidirectional LSTM (BiLSTM) stacked on top of the SciBERT model layers, while
the second sub-task uses a Conditional Random Field (CRF) on top of SciBERT
with a BiLSTM. The third sub-task uses a combined SciBERT-based neural
approach with heuristics for information unit prediction and triplet formation
from the phrases. Our system achieved F1 scores of 0.38, 0.63, and 0.76 in
end-to-end pipeline testing, phrase extraction testing, and triplet extraction
testing, respectively.
Related papers
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- Enriched BERT Embeddings for Scholarly Publication Classification
The NSLP 2024 FoRC Task I addresses this challenge organized as a competition.
The goal is to develop a classifier capable of predicting one of 123 predefined classes from the Open Research Knowledge Graph (ORKG) taxonomy of research fields for a given article.
arXiv Detail & Related papers (2024-05-07T09:05:20Z)
- Recitation-Augmented Language Models
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)
- Modeling Multi-Granularity Hierarchical Features for Relation Extraction
We propose a novel method to extract multi-granularity features based solely on the original input sentences.
We show that effective structured features can be attained even without external knowledge.
arXiv Detail & Related papers (2022-04-09T09:44:05Z)
- Exploring Neural Models for Query-Focused Summarization
We conduct a systematic exploration of neural approaches to query-focused summarization (QFS).
We present two model extensions that achieve state-of-the-art performance on the QMSum dataset by a margin of up to 3.38 ROUGE-1, 3.72 ROUGE-2, and 3.28 ROUGE-L.
arXiv Detail & Related papers (2021-12-14T18:33:29Z)
- SemEval-2021 Task 11: NLPContributionGraph -- Structuring Scholarly NLP Contributions for a Research Knowledge Graph
The SemEval-2021 Shared Task NLPContributionGraph (a.k.a. 'the NCG task') tasks participants to develop automated systems that structure contributions from NLP scholarly articles in the English language.
Being the first-of-its-kind in the SemEval series, the task released structured data from NLP scholarly articles at three levels of information granularity.
The best end-to-end task system classified contribution sentences at 57.27% F1, phrases at 46.41% F1, and triples at 22.28% F1.
arXiv Detail & Related papers (2021-06-10T13:43:47Z)
- Wizard of Search Engine: Access to Information Through Conversations with Search Engines
We make efforts to facilitate research on CIS from three aspects.
We formulate a pipeline for CIS with six sub-tasks: intent detection (ID), keyphrase extraction (KE), action prediction (AP), query selection (QS), passage selection (PS), and response generation (RG).
We release a benchmark dataset, called wizard of search engine (WISE), which allows for comprehensive and in-depth research on all aspects of CIS.
arXiv Detail & Related papers (2021-05-18T06:35:36Z)
- UIUC_BioNLP at SemEval-2021 Task 11: A Cascade of Neural Models for Structuring Scholarly NLP Contributions
We propose a cascade of neural models that performs sentence classification, phrase recognition, and triple extraction.
A BERT-CRF model was used to recognize and characterize relevant phrases in contribution sentences.
Our system was officially ranked second in Phase 1 evaluation and first in both parts of Phase 2 evaluation.
arXiv Detail & Related papers (2021-05-12T05:24:35Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- What's New? Summarizing Contributions in Scientific Literature
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
- SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline
We describe our text summarization system, SciSummPip, inspired by SummPip (Zhao et al., 2020).
Our SciSummPip includes a transformer-based language model SciBERT for contextual sentence representation.
Our work differs from previous methods in that content selection and a summary length constraint are applied to adapt to the scientific domain.
arXiv Detail & Related papers (2020-10-19T03:29:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.