Density, asymmetry and citation dynamics in scientific literature
- URL: http://arxiv.org/abs/2506.23366v1
- Date: Sun, 29 Jun 2025 18:55:04 GMT
- Title: Density, asymmetry and citation dynamics in scientific literature
- Authors: Nathaniel Imel, Zachary Hafen,
- Abstract summary: We introduce two complementary metrics to characterize the local geometry of a publication's semantic neighborhood.<n>We tested the predictive relationship between these two metrics and its subsequent citation rate.<n>While the individual effects of $$rho on citation count are small and variable, incorporating density-based predictors consistently improves out-of-sample prediction.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific behavior is often characterized by a tension between building upon established knowledge and introducing novel ideas. Here, we investigate whether this tension is reflected in the relationship between the similarity of a scientific paper to previous research and its eventual citation rate. To operationalize similarity to previous research, we introduce two complementary metrics to characterize the local geometry of a publication's semantic neighborhood: (1) \emph{density} ($\rho$), defined as the ratio between a fixed number of previously-published papers and the minimum distance enclosing those papers in a semantic embedding space, and (2) asymmetry ($\alpha$), defined as the average directional difference between a paper and its nearest neighbors. We tested the predictive relationship between these two metrics and its subsequent citation rate using a Bayesian hierarchical regression approach, surveying $\sim 53,000$ publications across nine academic disciplines and five different document embeddings. While the individual effects of $\rho$ on citation count are small and variable, incorporating density-based predictors consistently improves out-of-sample prediction when added to baseline models. These results suggest that the density of a paper's surrounding scientific literature may carry modest but informative signals about its eventual impact. Meanwhile, we find no evidence that publication asymmetry improves model predictions of citation rates. Our work provides a scalable framework for linking document embeddings to scientometric outcomes and highlights new questions regarding the role that semantic similarity plays in shaping the dynamics of scientific reward.
Related papers
- In-depth Research Impact Summarization through Fine-Grained Temporal Citation Analysis [52.42612945266194]
We propose a new task: generating nuanced, expressive, and time-aware impact summaries.<n>We show that these summaries capture both praise (confirmation citations) and critique (correction citations) through the evolution of fine-grained citation intents.
arXiv Detail & Related papers (2025-05-20T19:11:06Z) - The Noisy Path from Source to Citation: Measuring How Scholars Engage with Past Research [20.649638393774048]
We introduce a computational pipeline to quantify citation fidelity at scale.<n>Using full texts of papers, the pipeline identifies citations in citing papers and the corresponding claims in cited papers.<n>Using a quasi-experiment, we establish the "telephone effect" - when citing papers have low fidelity to the original claim, future papers that cite the citing paper and the original have lower fidelity to the original.
arXiv Detail & Related papers (2025-02-27T22:47:03Z) - HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction [14.731720495144112]
We introduce the novel concept of core citation, which identifies the critical references that go beyond superficial mentions.
We propose $textbfHLM-Cite, a $textbfH$ybrid $textbfL$anguage $textbfM$odel workflow for citation prediction.
We evaluate HLM-Cite across 19 scientific fields, demonstrating a 17.6% performance improvement comparing SOTA methods.
arXiv Detail & Related papers (2024-10-10T10:46:06Z) - Seeing Unseen: Discover Novel Biomedical Concepts via
Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z) - Time to Cite: Modeling Citation Networks using the Dynamic Impact
Single-Event Embedding Model [0.33123773366516646]
citation networks are a prominent example of single-event dynamic networks.
We propose a novel likelihood function for the characterization of such single-event networks.
The Dynamic Impact Single-Event Embedding model (DISEE) reconciles static latent distance network embedding approaches with classical dynamic impact assessments.
arXiv Detail & Related papers (2024-02-28T22:59:26Z) - Predicting Scientific Impact Through Diffusion, Conformity, and Contribution Disentanglement [11.684776349325887]
Existing models typically rely on static graphs for citation count estimation.
We introduce a novel model, DPPDCC, which Disentangles the Potential impacts of Papers into Diffusion, Conformity, and Contribution values.
arXiv Detail & Related papers (2023-11-15T07:21:11Z) - CausalCite: A Causal Formulation of Paper Citations [80.82622421055734]
CausalCite is a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers.
It is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings.
We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts.
arXiv Detail & Related papers (2023-11-05T23:09:39Z) - Scientometric engineering: Exploring citation dynamics via arXiv eprints [0.0]
We investigate the citation data of more than 1.5 million eprints on arXiv.
We find that the typical growth and obsolescence patterns vary across disciplines.
We derive a model consistent with the observed quantitative and temporal characteristics of citation growth and obsolescence.
arXiv Detail & Related papers (2021-06-09T12:38:44Z) - Semantic Analysis for Automated Evaluation of the Potential Impact of
Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that an informational approach to representing the meaning of a text has offered a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z) - Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z) - What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.