Using the Full-text Content of Academic Articles to Identify and
Evaluate Algorithm Entities in the Domain of Natural Language Processing
- URL: http://arxiv.org/abs/2010.10817v1
- Date: Wed, 21 Oct 2020 08:24:18 GMT
- Title: Using the Full-text Content of Academic Articles to Identify and
Evaluate Algorithm Entities in the Domain of Natural Language Processing
- Authors: Yuzhuo Wang, Chengzhi Zhang
- Abstract summary: This article takes the field of natural language processing (NLP) as an example and identifies algorithms from academic papers in the field.
A dictionary of algorithms is constructed by manually annotating the contents of papers, and sentences containing algorithms in the dictionary are extracted through dictionary-based matching.
The number of articles mentioning an algorithm is used as an indicator to analyze the influence of that algorithm.
- Score: 7.163189900803623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of big data, the advancement, improvement, and application of
algorithms in academic research have played an important role in promoting the
development of different disciplines. Academic papers in various disciplines,
especially computer science, contain a large number of algorithms. Identifying
the algorithms from the full-text content of papers can determine popular or
classical algorithms in a specific field and help scholars gain a comprehensive
understanding of the algorithms and even the field. To this end, this article
takes the field of natural language processing (NLP) as an example and
identifies algorithms from academic papers in the field. A dictionary of
algorithms is constructed by manually annotating the contents of papers, and
sentences containing algorithms in the dictionary are extracted through
dictionary-based matching. The number of articles mentioning an algorithm is
used as an indicator to analyze the influence of that algorithm. Our results
reveal the algorithm with the highest influence in NLP papers and show that
classification algorithms represent the largest proportion among the
high-impact algorithms. In addition, the evolution of the influence of
algorithms reflects the changes in research tasks and topics in the field, and
the changes in the influence of different algorithms show different trends. As
a preliminary exploration, this paper conducts an analysis of the impact of
algorithms mentioned in the academic text, and the results can be used as
training data for the automatic extraction of large-scale algorithms in the
future. The methodology in this paper is domain-independent and can be applied
to other domains.
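The dictionary-based matching and article-count indicator described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the dictionary entries, the toy articles, the regex-based sentence splitter, and all function names are assumptions made for the example, and the abstract does not specify how matching was actually performed.

```python
import re
from collections import Counter

# Hypothetical hand-annotated dictionary: canonical name -> surface forms.
ALGORITHM_DICTIONARY = {
    "support vector machine": ["support vector machine", "support vector machines", "SVM"],
    "conditional random field": ["conditional random field", "conditional random fields", "CRF"],
}


def build_patterns(dictionary):
    """Compile one case-insensitive, word-bounded regex per algorithm."""
    patterns = {}
    for name, variants in dictionary.items():
        # Try longer variants first so plural forms match in full.
        ordered = sorted(variants, key=len, reverse=True)
        alternation = "|".join(re.escape(v) for v in ordered)
        patterns[name] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns


def split_sentences(text):
    """Naive sentence splitter (an assumption; a real pipeline would use a tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def algorithm_sentences(text, patterns):
    """Map each dictionary algorithm to the sentences of `text` that mention it."""
    hits = {}
    for sentence in split_sentences(text):
        for name, pattern in patterns.items():
            if pattern.search(sentence):
                hits.setdefault(name, []).append(sentence)
    return hits


def article_counts(articles, patterns):
    """Count, for each algorithm, the number of articles mentioning it at least once."""
    counts = Counter()
    for text in articles.values():
        counts.update(algorithm_sentences(text, patterns).keys())
    return counts


# Toy corpus, invented for illustration only.
articles = {
    "paper_a": "We train an SVM on bag-of-words features. The SVM outperforms the baseline.",
    "paper_b": "A conditional random field labels each token. We compare it with support vector machines.",
    "paper_c": "We describe a new corpus. No learning method is used.",
}

patterns = build_patterns(ALGORITHM_DICTIONARY)
counts = article_counts(articles, patterns)
for name, n in counts.most_common():
    print(f"{name}: mentioned in {n} article(s)")
```

Counting articles rather than raw mentions means an algorithm named many times in one paper still contributes only once, which matches the indicator stated in the abstract. Grouping the same counts by publication year would give a simple view of how influence changes over time.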
Related papers
- From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models [63.188607839223046]
This survey focuses on the benefits of scaling compute during inference.
We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation.
arXiv Detail & Related papers (2024-06-24T17:45:59Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- An Analysis of the Effects of Decoding Algorithms on Fairness in Open-Ended Language Generation [77.44921096644698]
We present a systematic analysis of the impact of decoding algorithms on LM fairness.
We analyze the trade-off between fairness, diversity and quality.
arXiv Detail & Related papers (2022-10-07T21:33:34Z)
- The CLRS Algorithmic Reasoning Benchmark [28.789225199559834]
Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms.
We propose the CLRS Algorithmic Reasoning Benchmark, covering classical algorithms from the Introduction to Algorithms textbook.
Our benchmark spans a variety of algorithmic reasoning procedures, including sorting, searching, dynamic programming, graph algorithms, string algorithms and geometric algorithms.
arXiv Detail & Related papers (2022-05-31T09:56:44Z)
- An Approach for Automatic Construction of an Algorithmic Knowledge Graph from Textual Resources [3.723553383515688]
We introduce an approach for automatically developing a knowledge graph for algorithmic problems from unstructured data.
An algorithm KG will give additional context and explainability to the algorithm metadata.
arXiv Detail & Related papers (2022-05-13T18:59:23Z)
- Deep Algorithm Unrolling for Biomedical Imaging [99.73317152134028]
In this chapter, we review biomedical applications and breakthroughs via leveraging algorithm unrolling.
We trace the origin of algorithm unrolling and provide a comprehensive tutorial on how to unroll iterative algorithms into deep networks.
We conclude the chapter by discussing open challenges, and suggesting future research directions.
arXiv Detail & Related papers (2021-08-15T01:06:26Z)
- Identifying Co-Adaptation of Algorithmic and Implementational Innovations in Deep Reinforcement Learning: A Taxonomy and Case Study of Inference-based Algorithms [15.338931971492288]
We focus on a series of inference-based actor-critic algorithms to decouple their algorithmic innovations and implementation decisions.
We identify substantial performance drops whenever implementation details are mismatched for algorithmic choices.
Results show which implementation details are co-adapted and co-evolved with algorithms.
arXiv Detail & Related papers (2021-03-31T17:55:20Z)
- Critical Analysis: Bat Algorithm based Investigation and Application on Several Domains [1.1802674324027231]
The idea of the algorithm was taken from the echolocation ability of bats.
The Bat Algorithm is discussed in depth in terms of its background, characteristics, and limitations.
arXiv Detail & Related papers (2021-01-18T19:25:12Z)
- A Novel Word Sense Disambiguation Approach Using WordNet Knowledge Graph [0.0]
This paper presents a knowledge-based word sense disambiguation algorithm, namely Sequential Contextual Similarity Matrix Multiplication (SCSMM).
The SCSMM algorithm combines semantic similarity, knowledge, and document context to respectively exploit the merits of local context.
The proposed algorithm outperformed all other algorithms when disambiguating nouns on the combined gold standard datasets.
arXiv Detail & Related papers (2021-01-08T06:47:32Z)
- Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach for the automatic extraction of domain-specific words called the hyperplane-based approach.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
arXiv Detail & Related papers (2020-11-18T17:42:32Z)
- A Survey of Embedding Space Alignment Methods for Language and Knowledge Graphs [77.34726150561087]
We survey the current research landscape on word, sentence and knowledge graph embedding algorithms.
We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research.
arXiv Detail & Related papers (2020-10-26T16:08:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.