How do software citation formats evolve over time? A longitudinal
analysis of R programming language packages
- URL: http://arxiv.org/abs/2307.09390v1
- Date: Mon, 17 Jul 2023 09:18:57 GMT
- Title: How do software citation formats evolve over time? A longitudinal
analysis of R programming language packages
- Authors: Yuzhuo Wang, Kai Li
- Abstract summary: This study compares and analyzes a longitudinal dataset of citation formats of all R packages collected in 2021 and 2022.
We investigate the different document types underlying the citations and what metadata elements in the citation formats changed over time.
- Score: 12.082972614614413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Under the data-driven research paradigm, research software has come to play
crucial roles in nearly every stage of scientific inquiry. Scholars are
advocating for the formal citation of software in academic publications,
treating it on par with traditional research outputs. However, software is
rarely cited consistently: one software entity can be cited as different
objects, and its citations can change over time. These issues are largely
overlooked in existing empirical research on software citation. To fill these
gaps, the present study compares and analyzes a longitudinal dataset of the
citation formats of all R packages, collected in 2021 and 2022, in order to
understand how R-language packages, important members of the open-source
software family, are cited and how those citations evolve over time. In
particular, we investigate the document types underlying the citations and
which metadata elements in the citation formats changed over time.
Furthermore, we offer an in-depth analysis of the disciplinarity of journal
articles cited as software (software papers). By undertaking this research, we
aim to contribute to a better understanding of the complexities associated with
software citation, shedding light on future software citation policies and
infrastructure.
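At its core, the longitudinal comparison amounts to diffing two yearly snapshots of each package's citation metadata. The Python sketch below is a hypothetical illustration of that idea, not the authors' actual pipeline: the field names mimic common entries in R CITATION/BibTeX records (bibtype, title, author, year, journal, doi), and the sample records and DOI are invented.

```python
# Hypothetical sketch: compare two snapshots of one package's citation metadata
# (e.g., parsed from its CITATION file in 2021 and 2022) and report which
# metadata elements were added, removed, or changed. All sample data is invented.

def diff_citation(snapshot_2021: dict, snapshot_2022: dict) -> dict:
    """Return per-field (old, new) pairs for fields that differ between snapshots."""
    changes = {}
    for field in sorted(set(snapshot_2021) | set(snapshot_2022)):
        old, new = snapshot_2021.get(field), snapshot_2022.get(field)
        if old != new:
            changes[field] = (old, new)
    return changes

if __name__ == "__main__":
    # Invented example: a package whose citation shifted from a manual to a journal article.
    cite_2021 = {"bibtype": "Manual", "title": "examplepkg: An Example Package",
                 "author": "A. Author", "year": "2021"}
    cite_2022 = {"bibtype": "Article", "title": "examplepkg: An Example Package for R",
                 "author": "A. Author", "year": "2022",
                 "journal": "Journal of Statistical Software", "doi": "10.xxxx/xxxxx"}
    for field, (old, new) in diff_citation(cite_2021, cite_2022).items():
        print(f"{field}: {old!r} -> {new!r}")
```

A per-field diff like this, applied across all packages, is one simple way to surface which metadata elements changed between the two snapshots.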
Related papers
- Don't mention it: An approach to assess challenges to using software
mentions for citation and discoverability research [0.3268055538225029]
We present an approach to assess the usability of software mention datasets for research on research software.
One dataset does not provide links to the mentioned software at all; the other does so in a way that can impede quantitative research.
The greatest challenge and underlying issue in working with software mention datasets is the still suboptimal practice of software citation.
arXiv Detail & Related papers (2024-02-22T14:51:17Z)
- The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z)
- CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z)
- Deep Graph Learning for Anomalous Citation Detection [55.81334139806342]
We propose a novel deep graph learning model, namely GLAD (Graph Learning for Anomaly Detection), to identify anomalies in citation networks.
Within the GLAD framework, we propose an algorithm called CPU (Citation PUrpose) to discover the purpose of citation based on citation texts.
arXiv Detail & Related papers (2022-02-23T09:05:28Z)
- Towards generating citation sentences for multiple references with intent control [86.53829532976303]
We build a novel generation model with the Fusion-in-Decoder approach to cope with multiple long inputs.
Experiments demonstrate that the proposed approaches provide much more comprehensive features for generating citation sentences.
arXiv Detail & Related papers (2021-12-02T15:32:24Z)
- Cross-Lingual Citations in English Papers: A Large-Scale Analysis of Prevalence, Usage, and Impact [0.0]
We present an analysis of cross-lingual citations based on over one million English papers.
Among our findings are an increasing rate of citations to publications written in Chinese.
To facilitate further research, we make our collected data and source code publicly available.
arXiv Detail & Related papers (2021-11-07T15:34:02Z)
- SoMeSci - A 5 Star Open Data Gold Standard Knowledge Graph of Software Mentions in Scientific Articles [1.335443972283229]
SoMeSci is a knowledge graph of software mentions in scientific articles.
It contains high-quality annotations (IRR: $\kappa = .82$) of 3756 software mentions in 1367 PubMed Central articles.
arXiv Detail & Related papers (2021-08-20T08:53:03Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific paper summarization by utilizing the citation graph.
We construct a novel scientific paper summarization dataset, Semantic Scholar Network (SSN), which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- Utilizing Citation Network Structure to Predict Citation Counts: A Deep Learning Approach [0.0]
This paper proposes an end-to-end deep learning network, DeepCCP, which incorporates the effect of information cascades to address the citation count prediction problem.
According to experiments on 6 real data sets, DeepCCP is superior to the state-of-the-art methods in terms of the accuracy of citation count prediction.
arXiv Detail & Related papers (2020-09-06T05:27:50Z)
- A Decade of In-text Citation Analysis based on Natural Language Processing and Machine Learning Techniques: An overview of empirical studies [3.474275085556876]
Information scientists have gone far beyond traditional bibliometrics by tapping into advancements in full-text data processing techniques.
This article aims to narratively review the studies on these developments.
Its primary focus is on publications that have used natural language processing and machine learning techniques to analyse citations.
arXiv Detail & Related papers (2020-08-29T17:27:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.