Hidden Citations Obscure True Impact in Science
- URL: http://arxiv.org/abs/2310.16181v2
- Date: Sat, 11 May 2024 19:51:45 GMT
- Title: Hidden Citations Obscure True Impact in Science
- Authors: Xiangyi Meng, Onur Varol, Albert-László Barabási
- Abstract summary: When a discovery becomes common knowledge, citations suffer from obliteration by incorporation.
Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations.
We show that the prevalence of hidden citations is not driven by citation counts, but by the degree of the discourse on the topic within the text of the manuscripts.
- Score: 1.5279567721070433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: References, the mechanism scientists rely on to signal previous knowledge, have lately turned into widely used and misused measures of scientific impact. Yet, when a discovery becomes common knowledge, citations suffer from obliteration by incorporation. This leads to the concept of hidden citation, representing a clear textual credit to a discovery without a reference to the publication embodying it. Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations. We find that for influential discoveries hidden citations outnumber citation counts, emerging regardless of publishing venue and discipline. We show that the prevalence of hidden citations is not driven by citation counts, but rather by the degree of the discourse on the topic within the text of the manuscripts, indicating that the more a discovery is discussed, the less visible it is to standard bibliometric analysis. Hidden citations indicate that bibliometric measures offer a limited perspective on quantifying the true impact of a discovery, raising the need to extract knowledge from the full text of the scientific corpus.
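To make the definition of a hidden citation concrete, here is a minimal sketch that separates explicit citations from hidden ones in a toy corpus. It is not the authors' method: the abstract describes unsupervised interpretable machine learning over full text, whereas this sketch substitutes fixed catchphrase matching purely for illustration. The `Paper` record, the `count_citations` helper, the identifier `widget1999`, and the phrase "widget theorem" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Toy full-text record; the fields are hypothetical, not the authors' schema."""
    paper_id: str
    text: str
    references: set = field(default_factory=set)


def count_citations(papers, target_id, catchphrases):
    """Count explicit and hidden citations to one discovery.

    explicit: the paper lists target_id among its references.
    hidden:   the paper names the discovery in its text but does not reference it.
    """
    explicit = 0
    hidden = 0
    for paper in papers:
        text = paper.text.lower()
        # Simple substring matching stands in for the paper's learned model.
        mentions = any(phrase.lower() in text for phrase in catchphrases)
        if target_id in paper.references:
            explicit += 1
        elif mentions:
            hidden += 1
    return explicit, hidden


papers = [
    Paper("p1", "We apply the widget theorem to sparse graphs.", {"widget1999"}),
    Paper("p2", "By the Widget Theorem, the bound follows."),
    Paper("p3", "Standard widget theorem arguments give the result."),
    Paper("p4", "We study unrelated dynamics.", {"other2001"}),
]

explicit, hidden = count_citations(papers, "widget1999", ["widget theorem"])
print(f"explicit={explicit} hidden={hidden}")
```

In this toy corpus the hidden count exceeds the explicit count, which mirrors the kind of gap the abstract reports for influential discoveries. A real pipeline would also need to handle paraphrases and ambiguous phrases, which fixed string matching cannot do.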
Related papers
- ALiiCE: Evaluating Positional Fine-grained Citation Generation [54.19617927314975]
We propose ALiiCE, the first automatic evaluation framework for fine-grained citation generation.
Our framework first parses the sentence claim into atomic claims via dependency analysis and then calculates citation quality at the atomic claim level.
We evaluate the positional fine-grained citation generation performance of several Large Language Models on two long-form QA datasets.
arXiv Detail & Related papers (2024-06-19T09:16:14Z)
- Uncited articles and their effect on the concentration of citations [0.0]
Empirical evidence shows that citations received by scholarly publications follow a pattern of preferential attachment, resulting in a power-law distribution.
Are citations becoming more concentrated in a small number of articles? Or have recent geopolitical and technical changes in science led to more decentralized distributions?
This article explores how reference-based and citation-based approaches, uncited articles, citation inflation, the expansion of bibliometric databases, disciplinary differences, and self-citations affect the evolution of citation concentration.
arXiv Detail & Related papers (2023-06-16T15:38:12Z)
- Detecting and analyzing missing citations to published scientific entities [5.811229506383401]
We design a dedicated method, Citation Recommendation for Published Scientific Entity (CRPSE), based on the co-occurrences between published scientific entities and in-text citations.
We conduct a statistical analysis on missing citations among papers published in prestigious computer science conferences in 2020.
On a median basis, the papers proposing these published scientific entities with missing citations were published 8 years ago.
arXiv Detail & Related papers (2022-10-18T18:08:20Z)
- Deep Graph Learning for Anomalous Citation Detection [55.81334139806342]
We propose a novel deep graph learning model, namely GLAD (Graph Learning for Anomaly Detection), to identify anomalies in citation networks.
Within the GLAD framework, we propose an algorithm called CPU (Citation PUrpose) to discover the purpose of citation based on citation texts.
arXiv Detail & Related papers (2022-02-23T09:05:28Z)
- Towards generating citation sentences for multiple references with intent control [86.53829532976303]
We build a novel generation model with the Fusion-in-Decoder approach to cope with multiple long inputs.
Experiments demonstrate that the proposed approaches provide much more comprehensive features for generating citation sentences.
arXiv Detail & Related papers (2021-12-02T15:32:24Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that an informational approach to representing the meaning of a text has offered a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z)
- Citations are not opinions: a corpus linguistics approach to understanding how citations are made [0.0]
A key issue in citation content analysis is looking for linguistic structures that characterize distinct classes of citations.
In this study, we start with a large sample of a pre-classified citation corpus, 2 million citations from each class of the scite Smart Citation dataset.
By generating comparison tables for each citation type, we present a number of interesting linguistic features that uniquely characterize citation type.
arXiv Detail & Related papers (2021-04-16T12:52:27Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset, Semantic Scholar Network (SSN), which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.