How many preprints have actually been printed and why: a case study of
computer science preprints on arXiv
- URL: http://arxiv.org/abs/2308.01899v1
- Date: Thu, 3 Aug 2023 17:56:16 GMT
- Title: How many preprints have actually been printed and why: a case study of
computer science preprints on arXiv
- Authors: Jialiang Lin, Yao Yu, Yu Zhou, Zhiyang Zhou, Xiaodong Shi
- Abstract summary: We quantify how many preprints have eventually been printed in peer-reviewed venues.
Among those published manuscripts, some are published under different titles and without an update to their preprints on arXiv.
In the field of computer science, published preprints feature adequate revisions, multiple authorship, detailed abstract and introduction, extensive and authoritative references and available source code.
- Score: 9.783989953810725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Preprints play an increasingly critical role in academic communities. There
are many reasons driving researchers to post their manuscripts to preprint
servers before formal submission to journals or conferences, but the use of
preprints has also sparked considerable controversy, especially surrounding the
claim of priority. In this paper, a case study of computer science preprints
submitted to arXiv from 2008 to 2017 is conducted to quantify how many
preprints have eventually been printed in peer-reviewed venues. Among those
published manuscripts, some are published under different titles and without an
update to their preprints on arXiv. In the case of these manuscripts, the
traditional fuzzy matching method is incapable of mapping the preprint to the
final published version. In view of this issue, we introduce a semantics-based
mapping method with the employment of Bidirectional Encoder Representations
from Transformers (BERT). With this new mapping method and a plurality of data
sources, we find that 66% of all sampled preprints are published under
unchanged titles and 11% are published under different titles and with other
modifications. A further analysis was then performed to investigate why these
preprints but not others were accepted for publication. Our comparison reveals
that in the field of computer science, published preprints feature adequate
revisions, multiple authorship, detailed abstract and introduction, extensive
and authoritative references and available source code.
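The limitation of fuzzy title matching described in the abstract can be illustrated with a minimal sketch. This is not the authors' method or code: the helper names (`normalize`, `title_similarity`, `best_match`), the 0.9 threshold, and the example titles are all hypothetical, and the character-level matcher from Python's standard `difflib` module stands in for whatever fuzzy matcher the paper used. The semantics-based BERT mapping is not reproduced here.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lower-case a title and replace punctuation with single spaces."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in title)
    return " ".join(cleaned.split())


def title_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(preprint_title, candidates, threshold=0.9):
    """Return (candidate, score) for the closest title, or None when no
    candidate reaches the (assumed) threshold."""
    if not candidates:
        return None
    scored = [(title_similarity(preprint_title, c), c) for c in candidates]
    score, title = max(scored, key=lambda pair: pair[0])
    return (title, score) if score >= threshold else None


# Hypothetical titles, for illustration only.
unchanged = best_match(
    "How Many Preprints Have Actually Been Printed and Why",
    ["how many preprints have actually been printed, and why?"],
)
retitled = best_match(
    "A Fast Solver for Sparse Systems",
    ["Efficient Iterative Methods in Large-Scale Linear Algebra"],
)
```

Under these assumptions, a title that differs only in case and punctuation is matched, while a paper published under a substantially different title falls below the threshold and goes unmatched. That is the gap a semantics-based mapping is meant to close.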
Related papers
- Toward Reproducibility of Digital Twin Research: Exemplified with the PiCar-X [49.44419860570116]
Digital twins are increasingly relevant in the Industrial Internet of Things and Industry 4.0.
The concept of digital twins lacks a unified definition and faces validation challenges.
This paper presents a reproducible laboratory experiment that demonstrates various digital twin concepts.
arXiv Detail & Related papers (2024-08-25T15:34:00Z)
- Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z)
- CausalCite: A Causal Formulation of Paper Citations [80.82622421055734]
CausalCite is a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers.
It is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings.
We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts.
arXiv Detail & Related papers (2023-11-05T23:09:39Z)
- Estimating the Causal Effect of Early ArXiving on Paper Acceptance [56.538813945721685]
We estimate the effect of arXiving a paper before the reviewing period (early arXiving) on its acceptance to the conference.
Our results suggest that early arXiving may have a small effect on a paper's chances of acceptance.
arXiv Detail & Related papers (2023-06-24T07:45:38Z)
- Contrastive Attention Networks for Attribution of Early Modern Print [23.344655278038392]
We develop machine learning techniques to identify unknown printers in early modern (c.1500--1800) English printed books.
Specifically, we focus on matching uniquely damaged character type-imprints in anonymously printed books to works with known printers.
arXiv Detail & Related papers (2023-06-12T19:57:11Z)
- Cracking Double-Blind Review: Authorship Attribution with Deep Learning [43.483063713471935]
We propose a transformer-based, neural-network architecture to attribute an anonymous manuscript to an author.
We leverage all research papers publicly available on arXiv amounting to over 2 million manuscripts.
Our method achieves an unprecedented authorship attribution accuracy, where up to 73% of papers are attributed correctly.
arXiv Detail & Related papers (2022-11-14T15:50:24Z)
- Scientometric engineering: Exploring citation dynamics via arXiv eprints [0.0]
We investigate the citation data of more than 1.5 million eprints on arXiv.
We find that the typical growth and obsolescence patterns vary across disciplines.
We derive a model consistent with the observed quantitative and temporal characteristics of citation growth and obsolescence.
arXiv Detail & Related papers (2021-06-09T12:38:44Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- Is preprint the future of science? A thirty year journey of online preprint services [7.063908865620109]
A preprint is a version of a scientific paper that is publicly distributed before formal peer review.
Since the launch of arXiv in 1991, preprints have been increasingly distributed over the Internet as opposed to paper copies.
arXiv Detail & Related papers (2021-02-17T23:08:01Z)
- Preprints as accelerator of scholarly communication: An empirical analysis in Mathematics [9.899221738408581]
We measure two effects associated with preprint publishing: publication delay and impact.
Articles with preprint versions are more likely to be mentioned on social media and have a shorter Altmetric attention delay.
arXiv Detail & Related papers (2020-11-24T07:32:35Z)