Related papers: Using the Full-text Content of Academic Articles to Identify and Evaluate Algorithm Entities in the Domain of Natural Language Processing

URL: http://arxiv.org/abs/2010.10817v1
Date: Wed, 21 Oct 2020 08:24:18 GMT
Title: Using the Full-text Content of Academic Articles to Identify and Evaluate Algorithm Entities in the Domain of Natural Language Processing
Authors: Yuzhuo Wang, Chengzhi Zhang
Abstract summary: This article takes the field of natural language processing (NLP) as an example and identifies algorithms from academic papers in the field. A dictionary of algorithms is constructed by manually annotating the contents of papers, and sentences containing algorithms in the dictionary are extracted through dictionary-based matching. The number of articles mentioning an algorithm is used as an indicator to analyze the influence of that algorithm.
Score: 7.163189900803623
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the era of big data, the advancement, improvement, and application of algorithms in academic research have played an important role in promoting the development of different disciplines. Academic papers in various disciplines, especially computer science, contain a large number of algorithms. Identifying the algorithms from the full-text content of papers can determine popular or classical algorithms in a specific field and help scholars gain a comprehensive understanding of the algorithms and even the field. To this end, this article takes the field of natural language processing (NLP) as an example and identifies algorithms from academic papers in the field. A dictionary of algorithms is constructed by manually annotating the contents of papers, and sentences containing algorithms in the dictionary are extracted through dictionary-based matching. The number of articles mentioning an algorithm is used as an indicator to analyze the influence of that algorithm. Our results reveal the algorithm with the highest influence in NLP papers and show that classification algorithms represent the largest proportion among the high-impact algorithms. In addition, the evolution of the influence of algorithms reflects the changes in research tasks and topics in the field, and the changes in the influence of different algorithms show different trends. As a preliminary exploration, this paper conducts an analysis of the impact of algorithms mentioned in the academic text, and the results can be used as training data for the automatic extraction of large-scale algorithms in the future. The methodology in this paper is domain-independent and can be applied to other domains.
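The dictionary-based matching and article-count indicator described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`build_matcher`, `algorithm_sentences`, `article_counts`), the naive sentence splitter, and the case-insensitive whole-word matching are all assumptions made for the example, and the dictionary passed in is a toy stand-in for the manually annotated one.

```python
import re
from collections import Counter


def build_matcher(dictionary):
    """Compile one regex that matches any algorithm name in the dictionary.

    Hypothetical helper: longer names are tried first so that a multi-word
    name is preferred over a shorter name it contains.
    """
    names = sorted(dictionary, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter (assumption): break after ., ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def algorithm_sentences(text, matcher):
    """Return the sentences that mention at least one dictionary entry."""
    return [s for s in split_sentences(text) if matcher.search(s)]


def article_counts(articles, dictionary):
    """Count, for each algorithm, the number of articles mentioning it.

    Each article contributes at most 1 per algorithm, matching the
    "number of articles mentioning an algorithm" indicator.
    """
    matcher = build_matcher(dictionary)
    canonical = {name.lower(): name for name in dictionary}
    counts = Counter()
    for text in articles:
        mentioned = {canonical[m.group(0).lower()] for m in matcher.finditer(text)}
        counts.update(mentioned)
    return counts
```

Calling `article_counts(articles, dictionary)` on a list of full-text strings returns a `Counter` keyed by the dictionary's canonical names, so `counts.most_common()` gives a simple influence ranking under these assumptions.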

Related papers

Optimizing Text Search: A Novel Pattern Matching Algorithm Based on Ukkonen's Approach [7.975242816297842]
This study focuses on optimizing Suffix Trees through methods like Splitting and Ukkonen's Algorithm. A novel optimization combining Ukkonen's Algorithm with a new search technique is introduced, showing linear time and space efficiencies. Empirical tests confirm the theoretical advantages, highlighting the optimized Suffix Tree's effectiveness in tasks like pattern recognition in genomic sequences.
arXiv Detail & Related papers (2025-11-29T16:05:13Z)
Evolutionary Algorithms Approach For Search Based On Semantic Document Similarity [0.0]
We develop clustering, recommendation, and question-answering systems using various text representation techniques. We show that Universal Sentence Encoder (USE) vectors capture the semantic similarity of text. Transfer learning is then used to apply the Genetic Algorithm (GA) and Differential Evolution (DE) algorithms to search for and retrieve the top N relevant documents.
arXiv Detail & Related papers (2025-02-20T18:56:52Z)
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models [63.188607839223046]
This survey focuses on the benefits of scaling compute during inference. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation.
arXiv Detail & Related papers (2024-06-24T17:45:59Z)
A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper. Our dataset consists of 477 self-reported expertise scores provided by 58 researchers. For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
An Analysis of the Effects of Decoding Algorithms on Fairness in Open-Ended Language Generation [77.44921096644698]
We present a systematic analysis of the impact of decoding algorithms on LM fairness. We analyze the trade-off between fairness, diversity and quality.
arXiv Detail & Related papers (2022-10-07T21:33:34Z)
The CLRS Algorithmic Reasoning Benchmark [28.789225199559834]
Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms. We propose the CLRS Algorithmic Reasoning Benchmark, covering classical algorithms from the Introduction to Algorithms textbook. Our benchmark spans a variety of algorithmic reasoning procedures, including sorting, searching, dynamic programming, graph algorithms, string algorithms and geometric algorithms.
arXiv Detail & Related papers (2022-05-31T09:56:44Z)
An Approach for Automatic Construction of an Algorithmic Knowledge Graph from Textual Resources [3.723553383515688]
We introduce an approach for automatically developing a knowledge graph for algorithmic problems from unstructured data. An algorithm KG will give additional context and explainability to the algorithm metadata.
arXiv Detail & Related papers (2022-05-13T18:59:23Z)
Deep Algorithm Unrolling for Biomedical Imaging [99.73317152134028]
In this chapter, we review biomedical applications and breakthroughs via leveraging algorithm unrolling. We trace the origin of algorithm unrolling and provide a comprehensive tutorial on how to unroll iterative algorithms into deep networks. We conclude the chapter by discussing open challenges, and suggesting future research directions.
arXiv Detail & Related papers (2021-08-15T01:06:26Z)
Identifying Co-Adaptation of Algorithmic and Implementational Innovations in Deep Reinforcement Learning: A Taxonomy and Case Study of Inference-based Algorithms [15.338931971492288]
We focus on a series of inference-based actor-critic algorithms to decouple their algorithmic innovations and implementation decisions. We identify substantial performance drops whenever implementation details are mismatched for algorithmic choices. Results show which implementation details are co-adapted and co-evolved with algorithms.
arXiv Detail & Related papers (2021-03-31T17:55:20Z)
Critical Analysis: Bat Algorithm based Investigation and Application on Several Domains [1.1802674324027231]
The idea of the algorithm was taken from the echolocation ability of bats. The Bat Algorithm is examined in depth in terms of its background, characteristics, and limitations.
arXiv Detail & Related papers (2021-01-18T19:25:12Z)
A Novel Word Sense Disambiguation Approach Using WordNet Knowledge Graph [0.0]
This paper presents a knowledge-based word sense disambiguation algorithm, namely Sequential Contextual Similarity Matrix Multiplication (SCSMM). The SCSMM algorithm combines semantic similarity, knowledge, and document context to respectively exploit the merits of local context. The proposed algorithm outperformed all other algorithms when disambiguating nouns on the combined gold standard datasets.
arXiv Detail & Related papers (2021-01-08T06:47:32Z)
Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach for the automatic extraction of domain-specific words called the hyperplane-based approach. The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features. Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
arXiv Detail & Related papers (2020-11-18T17:42:32Z)
A Survey of Embedding Space Alignment Methods for Language and Knowledge Graphs [77.34726150561087]
We survey the current research landscape on word, sentence and knowledge graph embedding algorithms. We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research.
arXiv Detail & Related papers (2020-10-26T16:08:13Z)
A Brief Look at Generalization in Visual Meta-Reinforcement Learning [56.50123642237106]
We evaluate the generalization performance of meta-reinforcement learning algorithms. We find that these algorithms can display strong overfitting when they are evaluated on challenging tasks.
arXiv Detail & Related papers (2020-06-12T15:17:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.