Analyzing the State of Computer Science Research with the DBLP Discovery
Dataset
- URL: http://arxiv.org/abs/2212.00629v1
- Date: Thu, 1 Dec 2022 16:27:42 GMT
- Title: Analyzing the State of Computer Science Research with the DBLP Discovery
Dataset
- Authors: Lennart Küll
- Abstract summary: We conduct a scientometric analysis to uncover the implicit patterns hidden in CS metadata.
We introduce the CS-Insights system, an interactive web application to analyze CS publications with various dashboards, filters, and visualizations.
Both D3 and CS-Insights are open-access, and CS-Insights can be easily adapted to other datasets in the future.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The number of scientific publications continues to rise exponentially,
especially in Computer Science (CS). However, current solutions to analyze
those publications restrict access behind a paywall, offer no features for
visual analysis, limit access to their data, only focus on niches or
sub-fields, and/or are not flexible and modular enough to be transferred to
other datasets. In this thesis, we conduct a scientometric analysis to uncover
the implicit patterns hidden in CS metadata and to determine the state of CS
research. Specifically, we investigate trends of the quantity, impact, and
topics for authors, venues, document types (conferences vs. journals), and
fields of study (compared to, e.g., medicine). To achieve this, we introduce the
CS-Insights system, an interactive web application to analyze CS publications
with various dashboards, filters, and visualizations. The data underlying this
system is the DBLP Discovery Dataset (D3), which contains metadata from 5
million CS publications. Both D3 and CS-Insights are open-access, and
CS-Insights can be easily adapted to other datasets in the future. The most
interesting findings of our scientometric analysis include that i) there has
been a stark increase in publications, authors, and venues in the last two
decades, ii) many authors only recently joined the field, iii) the most cited
authors and venues focus on computer vision and pattern recognition, while the
most productive prefer engineering-related topics, iv) the preference of
researchers to publish in conferences over journals dwindles, v) on average,
journal articles receive twice as many citations as conference papers,
but the contrast is much smaller for the most cited conferences and journals,
and vi) journals also get more citations in all other investigated fields of
study, while only CS and engineering publish more in conferences than journals.
Related papers
- A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence, and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z)
- Position: AI/ML Influencers Have a Place in the Academic Process [82.2069685579588]
We investigate the role of social media influencers in enhancing the visibility of machine learning research.
We have compiled a comprehensive dataset of over 8,000 papers, spanning tweets from December 2018 to October 2023.
Our statistical and causal inference analysis reveals a significant increase in citations for papers endorsed by these influencers.
arXiv Detail & Related papers (2024-01-24T20:05:49Z)
- A Comprehensive Study of Groundbreaking Machine Learning Research:
Analyzing highly cited and impactful publications across six decades [1.6442870218029522]
Machine learning (ML) has emerged as a prominent field of research in computer science and other related fields.
It is crucial to understand the landscape of highly cited publications to identify key trends, influential authors, and significant contributions made thus far.
arXiv Detail & Related papers (2023-08-01T21:43:22Z)
- The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z)
- Investigating Fairness Disparities in Peer Review: A Language Model
Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, author, and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- Citation Trajectory Prediction via Publication Influence Representation
Using Temporal Knowledge Graph [52.07771598974385]
Existing approaches mainly rely on mining temporal and graph data from academic articles.
Our framework is composed of three modules: difference-preserved graph embedding, fine-grained influence representation, and learning-based trajectory calculation.
Experiments are conducted on both the APS academic dataset and our contributed AIPatent dataset.
arXiv Detail & Related papers (2022-10-02T07:43:26Z)
- D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of
Computer Science Research [27.882505456528243]
DBLP is the largest open-access repository of scientific articles on computer science.
We retrieved more than 6 million publications from DBLP and extracted metadata.
D3 can be used to identify trends in research activity, productivity, focus, bias, accessibility, and impact of computer science research.
arXiv Detail & Related papers (2022-04-28T09:59:52Z)
- Industry and Academic Research in Computer Vision [5.634825161148484]
This work aims to study the dynamic between research in the industry and academia in computer vision.
The results are demonstrated on a set of top-5 vision conferences that are representative of the field.
arXiv Detail & Related papers (2021-07-10T20:09:52Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
- Topic Space Trajectories: A case study on machine learning literature [0.0]
We present topic space trajectories, a structure that allows for the comprehensible tracking of research topics.
We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues.
Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work.
arXiv Detail & Related papers (2020-10-23T10:53:42Z)
- Machine Identification of High Impact Research through Text and Image
Analysis [0.4737991126491218]
We present a system to automatically separate papers with a high from those with a low likelihood of gaining citations.
Our system uses both a visual classifier, useful for surmising a document's overall appearance, and a text classifier, for making content-informed decisions.
arXiv Detail & Related papers (2020-05-20T19:12:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.