CausalCite: A Causal Formulation of Paper Citations
- URL: http://arxiv.org/abs/2311.02790v3
- Date: Mon, 27 May 2024 20:31:14 GMT
- Title: CausalCite: A Causal Formulation of Paper Citations
- Authors: Ishan Kumar, Zhijing Jin, Ehsan Mokhtarian, Siyuan Guo, Yuen Chen, Mrinmaya Sachan, Bernhard Schölkopf
- Abstract summary: CausalCite is a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers.
It is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings.
We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts.
- Score: 80.82622421055734
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Citation count of a paper is a commonly used proxy for evaluating the significance of a paper in the scientific community. Yet citation measures are widely criticized for failing to accurately reflect the true impact of a paper. Thus, we propose CausalCite, a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers. CausalCite is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings. TextMatch encodes each paper using text embeddings from large language models (LLMs), extracts similar samples by cosine similarity, and synthesizes a counterfactual sample as the weighted average of similar papers according to their similarity values. We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts on a previous dataset of 1K papers, (test-of-time) awards for past papers, and its stability across various subfields of AI. We also provide a set of findings that can serve as suggested ways for future researchers to use our metric for a better understanding of the quality of a paper. Our code is available at https://github.com/causalNLP/causal-cite.
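The abstract describes TextMatch's counterfactual step: embed each paper with an LLM, retrieve the most similar candidate papers by cosine similarity, and synthesize a counterfactual outcome as the similarity-weighted average of those matches. A minimal sketch of that step follows; the function name, signature, and `top_k` parameter are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def synthesize_counterfactual(treated_emb, candidate_embs, candidate_outcomes, top_k=5):
    """Illustrative sketch of the matching step described in the abstract:
    select the candidates most similar to the treated paper's embedding
    (cosine similarity) and average their outcomes, weighted by similarity."""
    # Cosine similarity between the treated paper and each candidate
    norms = np.linalg.norm(candidate_embs, axis=1) * np.linalg.norm(treated_emb)
    sims = candidate_embs @ treated_emb / norms
    # Keep the top-k most similar candidates
    idx = np.argsort(sims)[-top_k:]
    weights = sims[idx] / sims[idx].sum()
    # Counterfactual outcome: similarity-weighted average over matched papers
    return float(weights @ candidate_outcomes[idx])
```

In the paper's framing, the gap between a paper's observed follow-up impact and this synthesized counterfactual estimates its causal contribution.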
Related papers
- Decade-long Utilization Patterns of ICSE Technical Papers and Associated Artifacts [0.0]
We collect data on usage attributes from papers and their artifacts, conduct a statistical assessment to identify differences, and analyze the top five papers in each attribute category.
There is a significant difference between paper citations and the usage of associated artifacts.
We provide a thorough overview of ICSE's accepted papers from the last decade, emphasizing the intricate relationship between research papers and their artifacts.
arXiv Detail & Related papers (2024-04-08T19:29:15Z) - Fusion of the Power from Citations: Enhance your Influence by Integrating Information from References [3.607567777043649]
This study formulates a prediction problem: identifying whether a paper will increase its authors' scholarly influence.
Using this framework, scholars can assess whether their papers are likely to improve their future influence.
arXiv Detail & Related papers (2023-10-27T19:51:44Z) - Chain-of-Factors Paper-Reviewer Matching [32.86512592730291]
We propose a unified model for paper-reviewer matching that jointly considers semantic, topic, and citation factors.
We demonstrate the effectiveness of our proposed Chain-of-Factors model in comparison with state-of-the-art paper-reviewer matching methods and scientific pre-trained language models.
arXiv Detail & Related papers (2023-10-23T01:29:18Z) - Estimating the Causal Effect of Early ArXiving on Paper Acceptance [56.538813945721685]
We estimate the effect of arXiving a paper before the reviewing period (early arXiving) on its acceptance to the conference.
Our results suggest that early arXiving may have a small effect on a paper's chances of acceptance.
arXiv Detail & Related papers (2023-06-24T07:45:38Z) - Forgotten Knowledge: Examining the Citational Amnesia in NLP [63.13508571014673]
How far back in time do we tend to go to cite papers? How has that changed over time, and what factors correlate with this citational attention/amnesia?
We show that around 62% of cited papers are from the immediate five years prior to publication, whereas only about 17% are more than ten years old.
We show that the median age and age diversity of cited papers were steadily increasing from 1990 to 2014, but since then, the trend has reversed, and current NLP papers have an all-time low temporal citation diversity.
arXiv Detail & Related papers (2023-05-29T18:30:34Z) - CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z) - Predicting Long-Term Citations from Short-Term Linguistic Influence [20.78217545537925]
A standard measure of the influence of a research paper is the number of times it is cited.
We propose a novel method to quantify linguistic influence in timestamped document collections.
arXiv Detail & Related papers (2022-10-24T22:03:26Z) - Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that this informational approach to representing the meaning of a text offers a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z) - Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific paper summarization by utilizing the citation graph.
We construct a novel scientific paper summarization dataset, Semantic Scholar Network (SSN), which contains 141K research papers from different domains.
Our model achieves competitive performance compared with pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.