Topic Space Trajectories: A case study on machine learning literature
- URL: http://arxiv.org/abs/2010.12294v3
- Date: Tue, 18 May 2021 12:09:47 GMT
- Title: Topic Space Trajectories: A case study on machine learning literature
- Authors: Bastian Sch\"afermeier and Gerd Stumme and Tom Hanika
- Abstract summary: We present topic space trajectories, a structure that allows for the comprehensible tracking of research topics.
We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues.
Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The annual number of publications at scientific venues, for example,
conferences and journals, is growing quickly. Hence, even for researchers it
becomes harder and harder to keep track of research topics and their progress.
In this task, researchers can be supported by automated publication analysis.
Yet, many such methods result in uninterpretable, purely numerical
representations. As an attempt to support human analysts, we present topic
space trajectories, a structure that allows for the comprehensible tracking of
research topics. We demonstrate how these trajectories can be interpreted based
on eight different analysis approaches. To obtain comprehensible results, we
employ non-negative matrix factorization as well as suitable visualization
techniques. We show the applicability of our approach on a publication corpus
spanning 50 years of machine learning research from 32 publication venues. Our
novel analysis method may be employed for paper classification, for the
prediction of future research topics, and for the recommendation of fitting
conferences and journals for submitting unpublished work.
Related papers
- Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and meta data.
One straightforward solution is to integrate statistical analysis and machine learning.
Numerous papers have been published on embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field.
arXiv Detail & Related papers (2024-06-16T14:49:19Z) - MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of ScientificAspects.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z) - A Comprehensive Study of Groundbreaking Machine Learning Research:
Analyzing highly cited and impactful publications across six decades [1.6442870218029522]
Machine learning (ML) has emerged as a prominent field of research in computer science and other related fields.
It is crucial to understand the landscape of highly cited publications to identify key trends, influential authors, and significant contributions made thus far.
arXiv Detail & Related papers (2023-08-01T21:43:22Z) - Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z) - Mapping Research Trajectories [0.0]
We propose a principled approach for emphmapping research trajectories, which is applicable to all kinds of scientific entities.
Our visualizations depict the research topics of entities over time in a straightforward interpr. manner.
In a practical demonstrator application, we exemplify the proposed approach on a publication corpus from machine learning.
arXiv Detail & Related papers (2022-04-25T13:32:39Z) - DRIFT: A Toolkit for Diachronic Analysis of Scientific Literature [0.7349727826230862]
We open source DRIFT, which allows researchers to track research trends and development over the years.
The analysis methods are collated from well-cited research works, with a few of our own methods added for good measure.
To demonstrate the utility and efficacy of our tool, we perform a case study on the cs.CL corpus of the arXiv repository and draw inferences from the analysis methods.
arXiv Detail & Related papers (2021-07-02T17:33:25Z) - CitationIE: Leveraging the Citation Graph for Scientific Information
Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z) - Evaluating the state-of-the-art in mapping research spaces: a Brazilian
case study [0.0]
Two recent works propose methods for creating research maps from scientists' publication records.
We evaluate these models' ability to predict whether a given entity will enter a new field.
We conduct a case study to showcase how these models can be used to characterize science dynamics in the context of Brazil.
arXiv Detail & Related papers (2021-04-07T18:14:41Z) - What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z) - Semantic and Relational Spaces in Science of Science: Deep Learning
Models for Article Vectorisation [4.178929174617172]
We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs)
Our results show that using NLP we can encode a semantic space of articles, while with GNN we are able to build a relational space where the social practices of a research community are also encoded.
arXiv Detail & Related papers (2020-11-05T14:57:41Z) - A Survey of Embedding Space Alignment Methods for Language and Knowledge
Graphs [77.34726150561087]
We survey the current research landscape on word, sentence and knowledge graph embedding algorithms.
We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research.
arXiv Detail & Related papers (2020-10-26T16:08:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.