Related papers: Topic Space Trajectories: A case study on machine learning literature

URL: http://arxiv.org/abs/2010.12294v3
Date: Tue, 18 May 2021 12:09:47 GMT
Title: Topic Space Trajectories: A case study on machine learning literature
Authors: Bastian Schäfermeier and Gerd Stumme and Tom Hanika
Abstract summary: We present topic space trajectories, a structure that allows for the comprehensible tracking of research topics. We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues. Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The annual number of publications at scientific venues, for example, conferences and journals, is growing quickly. Hence, even for researchers it becomes harder and harder to keep track of research topics and their progress. In this task, researchers can be supported by automated publication analysis. Yet, many such methods result in uninterpretable, purely numerical representations. As an attempt to support human analysts, we present topic space trajectories, a structure that allows for the comprehensible tracking of research topics. We demonstrate how these trajectories can be interpreted based on eight different analysis approaches. To obtain comprehensible results, we employ non-negative matrix factorization as well as suitable visualization techniques. We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues. Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work.
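The abstract names non-negative matrix factorization (NMF) as the basis for the trajectories but does not spell out the construction. The sketch below illustrates the general idea under stated assumptions. A document-term matrix is factorized with the standard Lee-Seung multiplicative updates. Each paper's topic weights are normalized into a distribution. The trajectory is then taken to be the sequence of per-year mean distributions. This per-year averaging is our assumption, not necessarily the authors' method, and the helper names `nmf` and `topic_trajectory` are hypothetical.

```python
import numpy as np

def nmf(X, k, iters=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix X (docs x terms) as W @ H using
    Lee-Seung multiplicative updates for the Frobenius loss.
    W: docs x k topic weights; H: k x terms topic-term weights."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        # Update H first, then update W with the new H (standard alternating order).
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def topic_trajectory(W, years):
    """Hypothetical trajectory: one point in topic space per year, computed as
    the mean normalized topic distribution of that year's documents."""
    P = W / (W.sum(axis=1, keepdims=True) + 1e-12)  # each row sums to 1
    years = np.asarray(years)
    ys = np.unique(years)  # sorted distinct years
    points = np.vstack([P[years == y].mean(axis=0) for y in ys])
    return ys, points

# Toy corpus: the 2000 papers use terms 0-1 and the 2001 papers use terms 2-3.
X = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 3, 2],
              [0, 0, 2, 3]], dtype=float)
W, H = nmf(X, k=2)
years, points = topic_trajectory(W, [2000, 2000, 2001, 2001])
print(years, points.round(2))
```

On this toy corpus, the two yearly points should sit near different topic axes, so the trajectory moves from one topic to the other. On a real corpus, the same points could be projected to 2D for the kind of visualization the abstract mentions.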

Related papers

Large-Scale Multidimensional Knowledge Profiling of Scientific Literature [46.15403461273178]
We compile a unified corpus of more than 100,000 papers from 22 major conferences between 2020 and 2025. Our analysis highlights several notable shifts, including the growth of safety, multimodal reasoning, and agent-oriented studies. These findings provide an evidence-based view of how AI research is evolving and offer a resource for understanding broader trends and identifying emerging directions.
arXiv Detail & Related papers (2026-01-21T16:47:05Z)
CS-PaperSum: A Large-Scale Dataset of AI-Generated Summaries for Scientific Papers [3.929864777332447]
CS-PaperSum is a large-scale dataset of 91,919 papers from 31 top-tier computer science conferences. Our dataset enables automated literature analysis, research trend forecasting, and AI-driven scientific discovery.
arXiv Detail & Related papers (2025-02-27T22:48:35Z)
Decoding MIE: A Novel Dataset Approach Using Topic Extraction and Affiliation Parsing [0.0]
This study introduces a novel dataset derived from the Medical Informatics Europe (MIE) Conference proceedings. We extracted and processed metadata and abstracts from 4,606 articles published in the "Studies in Health Technology and Informatics" journal series.
arXiv Detail & Related papers (2024-10-06T19:34:23Z)
The Quest for the Right Mediator: Surveying Mechanistic Interpretability Through the Lens of Causal Mediation Analysis [51.046457649151336]
We propose a perspective on interpretability research grounded in causal mediation analysis. We describe the history and current state of interpretability taxonomized according to the types of causal units (mediators) employed. We discuss the pros and cons of each mediator, providing insights as to when particular kinds of mediators and search methods are most appropriate.
arXiv Detail & Related papers (2024-08-02T17:51:42Z)
Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and metadata. One straightforward solution is to integrate statistical analysis and machine learning. Numerous papers have been published on embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field.
arXiv Detail & Related papers (2024-06-16T14:49:19Z)
MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows. MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years. We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work. ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them. We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
A Comprehensive Study of Groundbreaking Machine Learning Research: Analyzing highly cited and impactful publications across six decades [1.6442870218029522]
Machine learning (ML) has emerged as a prominent field of research in computer science and other related fields. It is crucial to understand the landscape of highly cited publications to identify key trends, influential authors, and significant contributions made thus far.
arXiv Detail & Related papers (2023-08-01T21:43:22Z)
Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature. We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
Mapping Research Trajectories [0.0]
We propose a principled approach for mapping research trajectories, which is applicable to all kinds of scientific entities. Our visualizations depict the research topics of entities over time in a straightforward, interpretable manner. In a practical demonstrator application, we exemplify the proposed approach on a publication corpus from machine learning.
arXiv Detail & Related papers (2022-04-25T13:32:39Z)
CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers. We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
Evaluating the state-of-the-art in mapping research spaces: a Brazilian case study [0.0]
Two recent works propose methods for creating research maps from scientists' publication records. We evaluate these models' ability to predict whether a given entity will enter a new field. We conduct a case study to showcase how these models can be used to characterize science dynamics in the context of Brazil.
arXiv Detail & Related papers (2021-04-07T18:14:41Z)
What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work. We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels. We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation [4.178929174617172]
We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs). Our results show that using NLP we can encode a semantic space of articles, while with GNNs we are able to build a relational space where the social practices of a research community are also encoded.
arXiv Detail & Related papers (2020-11-05T14:57:41Z)
A Survey of Embedding Space Alignment Methods for Language and Knowledge Graphs [77.34726150561087]
We survey the current research landscape on word, sentence and knowledge graph embedding algorithms. We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research.
arXiv Detail & Related papers (2020-10-26T16:08:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.