CausalCite: A Causal Formulation of Paper Citations
- URL: http://arxiv.org/abs/2311.02790v3
- Date: Mon, 27 May 2024 20:31:14 GMT
- Title: CausalCite: A Causal Formulation of Paper Citations
- Authors: Ishan Kumar, Zhijing Jin, Ehsan Mokhtarian, Siyuan Guo, Yuen Chen, Mrinmaya Sachan, Bernhard Schölkopf
- Abstract summary: CausalCite is a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers.
It is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings.
We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts.
- Score: 80.82622421055734
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Citation count of a paper is a commonly used proxy for evaluating the significance of a paper in the scientific community. Yet citation measures are widely criticized for failing to accurately reflect the true impact of a paper. Thus, we propose CausalCite, a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers. CausalCite is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings. TextMatch encodes each paper using text embeddings from large language models (LLMs), extracts similar samples by cosine similarity, and synthesizes a counterfactual sample as the weighted average of similar papers according to their similarity values. We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts on a previous dataset of 1K papers, (test-of-time) awards for past papers, and its stability across various subfields of AI. We also provide a set of findings that can serve as suggested ways for future researchers to use our metric for a better understanding of the quality of a paper. Our code is available at https://github.com/causalNLP/causal-cite.
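The abstract describes TextMatch's counterfactual step: embed each paper with an LLM, retrieve the most similar candidate papers by cosine similarity, and synthesize a counterfactual outcome as the similarity-weighted average of those matches. A minimal sketch of that step follows; the function name, signature, and `top_k` parameter are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def synthesize_counterfactual(treated_emb, candidate_embs, candidate_outcomes, top_k=5):
    """Illustrative sketch of the matching step described in the abstract:
    select the candidates most similar to the treated paper's embedding
    (cosine similarity) and average their outcomes, weighted by similarity."""
    # Cosine similarity between the treated paper and each candidate
    norms = np.linalg.norm(candidate_embs, axis=1) * np.linalg.norm(treated_emb)
    sims = candidate_embs @ treated_emb / norms
    # Keep the top-k most similar candidates
    idx = np.argsort(sims)[-top_k:]
    weights = sims[idx] / sims[idx].sum()
    # Counterfactual outcome: similarity-weighted average over matched papers
    return float(weights @ candidate_outcomes[idx])
```

In the paper's framing, the gap between a paper's observed follow-up impact and this synthesized counterfactual estimates its causal contribution.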
Related papers
- Decade-long Utilization Patterns of ICSE Technical Papers and Associated Artifacts [0.0]
We collect data on usage attributes from papers and their artifacts, conduct a statistical assessment to identify differences, and analyze the top five papers in each attribute category.
There is a significant difference between paper citations and the usage of associated artifacts.
We provide a thorough overview of ICSE's accepted papers from the last decade, emphasizing the intricate relationship between research papers and their artifacts.
arXiv Detail & Related papers (2024-04-08T19:29:15Z) - Fusion of the Power from Citations: Enhance your Influence by Integrating Information from References [3.607567777043649]
This study formulates a prediction problem: identifying whether a paper will increase its authors' scholarly influence.
Using this framework, scholars can assess whether their papers are likely to improve their future influence.
arXiv Detail & Related papers (2023-10-27T19:51:44Z) - Chain-of-Factors Paper-Reviewer Matching [32.86512592730291]
We propose a unified model for paper-reviewer matching that jointly considers semantic, topic, and citation factors.
We demonstrate the effectiveness of our proposed Chain-of-Factors model in comparison with state-of-the-art paper-reviewer matching methods and scientific pre-trained language models.
arXiv Detail & Related papers (2023-10-23T01:29:18Z) - Estimating the Causal Effect of Early ArXiving on Paper Acceptance [56.538813945721685]
We estimate the effect of arXiving a paper before the reviewing period (early arXiving) on its acceptance to the conference.
Our results suggest that early arXiving may have a small effect on a paper's chances of acceptance.
arXiv Detail & Related papers (2023-06-24T07:45:38Z) - Forgotten Knowledge: Examining the Citational Amnesia in NLP [63.13508571014673]
How far back in time do we tend to go to cite papers? How has that changed over time, and what factors correlate with this citational attention/amnesia?
We show that around 62% of cited papers are from the immediate five years prior to publication, whereas only about 17% are more than ten years old.
We show that the median age and age diversity of cited papers were steadily increasing from 1990 to 2014, but since then, the trend has reversed, and current NLP papers have an all-time low temporal citation diversity.
arXiv Detail & Related papers (2023-05-29T18:30:34Z) - CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z) - Predicting Long-Term Citations from Short-Term Linguistic Influence [20.78217545537925]
A standard measure of the influence of a research paper is the number of times it is cited.
We propose a novel method to quantify linguistic influence in timestamped document collections.
arXiv Detail & Related papers (2022-10-24T22:03:26Z) - Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that this informational approach to representing the meaning of a text offers a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z) - Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific paper summarization by utilizing the citation graph.
We construct a novel scientific paper summarization dataset, Semantic Scholar Network (SSN), which contains 141K research papers from different domains.
Our model achieves competitive performance compared with pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.