The Noisy Path from Source to Citation: Measuring How Scholars Engage with Past Research
- URL: http://arxiv.org/abs/2502.20581v2
- Date: Wed, 05 Mar 2025 16:32:35 GMT
- Title: The Noisy Path from Source to Citation: Measuring How Scholars Engage with Past Research
- Authors: Hong Chen, Misha Teplitskiy, David Jurgens
- Abstract summary: We introduce a computational pipeline to quantify citation fidelity at scale. Using full texts of papers, the pipeline identifies citations in citing papers and the corresponding claims in cited papers. Using a quasi-experiment, we establish the "telephone effect": when citing papers have low fidelity to the original claim, future papers that cite both the citing paper and the original have lower fidelity to the original.
- Score: 20.649638393774048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Academic citations are widely used for evaluating research and tracing knowledge flows. Such uses typically rely on raw citation counts and neglect variability in citation types. In particular, citations can vary in their fidelity: original knowledge from cited studies may be paraphrased, summarized, or reinterpreted, possibly wrongly, leading to variation in how much information changes from cited to citing paper. In this study, we introduce a computational pipeline to quantify citation fidelity at scale. Using full texts of papers, the pipeline identifies citations in citing papers and the corresponding claims in cited papers, and applies supervised models to measure fidelity at the sentence level. Analyzing a large-scale multi-disciplinary dataset of approximately 13 million citation sentence pairs, we find that citation fidelity is higher when authors cite papers that are 1) more recent and intellectually close, 2) more accessible, and 3) written by a first author with a lower H-index and by a medium-sized author team. Using a quasi-experiment, we establish the "telephone effect": when citing papers have low fidelity to the original claim, future papers that cite both the citing paper and the original have lower fidelity to the original. Our work reveals systematic differences in citation fidelity, underscoring the limitations of analyses that rely on citation quantity alone and the potential for distortion of evidence.
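The paper's fidelity models are supervised and are not reproduced in this listing; as a minimal sketch of the underlying idea, a citing sentence can be scored against the cited claim with a simple lexical-overlap proxy (a toy stand-in, not the authors' method):

```python
import re

def tokenize(text):
    """Lowercase and extract word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def fidelity_proxy(cited_claim, citing_sentence):
    """Jaccard overlap between the cited claim and the citing sentence.

    A crude stand-in for the supervised sentence-level fidelity models
    described in the abstract: 1.0 = identical vocabulary, 0.0 = none shared.
    """
    a, b = tokenize(cited_claim), tokenize(citing_sentence)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

claim = "The drug reduced symptoms in 40% of patients."
faithful = "Prior work found the drug reduced symptoms in 40% of patients."
distorted = "The drug cures the disease."

print(fidelity_proxy(claim, faithful) > fidelity_proxy(claim, distorted))  # True
```

A lexical proxy like this would miss faithful paraphrases, which is precisely why the paper trains supervised models instead; the sketch only illustrates what a sentence-level fidelity score measures.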
Related papers
- HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction [14.731720495144112]
We introduce the novel concept of core citation, which identifies the critical references that go beyond superficial mentions.
We propose HLM-Cite, a Hybrid Language Model workflow for citation prediction.
We evaluate HLM-Cite across 19 scientific fields, demonstrating a 17.6% performance improvement over SOTA methods.
arXiv Detail & Related papers (2024-10-10T10:46:06Z)
- ALiiCE: Evaluating Positional Fine-grained Citation Generation [54.19617927314975]
We propose ALiiCE, the first automatic evaluation framework for fine-grained citation generation.
Our framework first parses the sentence claim into atomic claims via dependency analysis and then calculates citation quality at the atomic claim level.
We evaluate the positional fine-grained citation generation performance of several Large Language Models on two long-form QA datasets.
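ALiiCE's decomposition relies on dependency analysis; as a rough illustration of what atomic-claim splitting produces, here is a toy conjunction-based splitter (a hypothetical simplification, not ALiiCE's actual parser):

```python
import re

def split_atomic_claims(sentence):
    """Crude stand-in for dependency-based atomic-claim parsing:
    split a compound claim on semicolons and the coordinating
    conjunctions "and"/"but", keeping non-empty fragments."""
    parts = re.split(r";|,?\s+\band\b\s+|,?\s+\bbut\b\s+",
                     sentence.strip().rstrip("."))
    return [p.strip() for p in parts if p.strip()]

claims = split_atomic_claims(
    "The model improves accuracy, and it reduces latency; it also cuts cost."
)
print(claims)
```

Real dependency-based decomposition also handles relative clauses and shared subjects, which this token-level split cannot; the sketch only shows the granularity at which citation quality would then be scored.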
arXiv Detail & Related papers (2024-06-19T09:16:14Z)
- CausalCite: A Causal Formulation of Paper Citations [80.82622421055734]
CausalCite is a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers.
It is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings.
We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts.
arXiv Detail & Related papers (2023-11-05T23:09:39Z)
- Forgotten Knowledge: Examining the Citational Amnesia in NLP [63.13508571014673]
We examine how far back in time we tend to go to cite papers, how that has changed over time, and what factors correlate with this citational attention/amnesia.
We show that around 62% of cited papers are from the immediate five years prior to publication, whereas only about 17% are more than ten years old.
We show that the median age and age diversity of cited papers were steadily increasing from 1990 to 2014, but since then, the trend has reversed, and current NLP papers have an all-time low temporal citation diversity.
arXiv Detail & Related papers (2023-05-29T18:30:34Z)
- Deep Graph Learning for Anomalous Citation Detection [55.81334139806342]
We propose a novel deep graph learning model, namely GLAD (Graph Learning for Anomaly Detection), to identify anomalies in citation networks.
Within the GLAD framework, we propose an algorithm called CPU (Citation PUrpose) to discover the purpose of citation based on citation texts.
arXiv Detail & Related papers (2022-02-23T09:05:28Z)
- Towards generating citation sentences for multiple references with intent control [86.53829532976303]
We build a novel generation model with the Fusion-in-Decoder approach to cope with multiple long inputs.
Experiments demonstrate that the proposed approaches provide much more comprehensive features for generating citation sentences.
arXiv Detail & Related papers (2021-12-02T15:32:24Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- How are journals cited? Characterizing journal citations by type of citation [0.0]
We present initial results on the statistical characterization of citations to journals based on citation function.
We also present initial results of characterizing the ratio of supports and disputes received by a journal as a potential indicator of quality.
arXiv Detail & Related papers (2021-02-22T14:15:50Z)
- Virtual Citation Proximity (VCP): A Supervised Deep Learning Method to Relate Uncited Papers On Grounds of Citation Proximity [0.0]
This paper discusses the Virtual Citation Proximity (VCP) approach.
The actual distance between two citations in a document is used as ground truth.
This can be used to estimate how closely two documents would have been cited in proximity, even if they are not actually co-cited anywhere.
arXiv Detail & Related papers (2020-09-25T12:24:00Z)
- Examining Citations of Natural Language Processing Literature [31.87319293259599]
We show that only about 56% of the papers in AA are cited ten or more times.
CL Journal has the most cited papers, but its citation dominance has lessened in recent years.
Papers on sentiment classification, anaphora resolution, and entity recognition have the highest median citations.
arXiv Detail & Related papers (2020-05-02T20:01:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.