Predicting Long-Term Citations from Short-Term Linguistic Influence
- URL: http://arxiv.org/abs/2210.13628v1
- Date: Mon, 24 Oct 2022 22:03:26 GMT
- Title: Predicting Long-Term Citations from Short-Term Linguistic Influence
- Authors: Sandeep Soni and David Bamman and Jacob Eisenstein
- Abstract summary: A standard measure of the influence of a research paper is the number of times it is cited.
We propose a novel method to quantify linguistic influence in timestamped document collections.
- Score: 20.78217545537925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A standard measure of the influence of a research paper is the number of
times it is cited. However, papers may be cited for many reasons, and citation
count offers limited information about the extent to which a paper affected the
content of subsequent publications. We therefore propose a novel method to
quantify linguistic influence in timestamped document collections. There are
two main steps: first, identify lexical and semantic changes using contextual
embeddings and word frequencies; second, aggregate information about these
changes into per-document influence scores by estimating a high-dimensional
Hawkes process with a low-rank parameter matrix. We show that this measure of
linguistic influence is predictive of $\textit{future}$ citations: the estimate
of linguistic influence from the two years after a paper's publication is
correlated with and predictive of its citation count in the following three
years. This is demonstrated using an online evaluation with incremental
temporal training/test splits, in comparison with a strong baseline that
includes predictors for initial citation counts, topics, and lexical features.
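The "incremental temporal training/test splits" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `incremental_temporal_splits`, the toy paper IDs, and the rule "train on everything published strictly before the test year" are assumptions made for illustration. The actual protocol would also have to account for the two-year influence window and the three-year citation window described above.

```python
from typing import Dict, Iterator, List, Tuple


def incremental_temporal_splits(
    paper_years: Dict[str, int],
    first_test_year: int,
) -> Iterator[Tuple[int, List[str], List[str]]]:
    """Yield (test_year, train_ids, test_ids) with an expanding training window.

    Assumption (not from the paper): each paper has a single publication
    year, and a model tested on year Y is trained only on papers published
    strictly before Y, so no future information leaks into training.
    """
    last_year = max(paper_years.values())
    for test_year in range(first_test_year, last_year + 1):
        train_ids = sorted(p for p, y in paper_years.items() if y < test_year)
        test_ids = sorted(p for p, y in paper_years.items() if y == test_year)
        # Skip years that have nothing to train on or nothing to test on.
        if train_ids and test_ids:
            yield test_year, train_ids, test_ids


# Hypothetical toy collection: paper ID -> publication year.
papers = {"a": 2001, "b": 2001, "c": 2002, "d": 2003, "e": 2003}
splits = list(incremental_temporal_splits(papers, first_test_year=2002))
for test_year, train_ids, test_ids in splits:
    print(test_year, train_ids, test_ids)
```

Each successive split reuses all earlier papers for training, which mimics how a predictor would be applied online as new publications arrive.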
Related papers
- CiMaTe: Citation Count Prediction Effectively Leveraging the Main Text [14.279848166377667]
Main text is an important factor for citation count prediction, but it is difficult to handle in machine learning models because the main text is typically very long.
We propose a BERT-based citation count prediction model, called CiMaTe, that leverages the main text by explicitly capturing a paper's sectional structure.
arXiv Detail & Related papers (2024-10-06T08:39:13Z)
- ALiiCE: Evaluating Positional Fine-grained Citation Generation [54.19617927314975]
We propose ALiiCE, the first automatic evaluation framework for fine-grained citation generation.
Our framework first parses the sentence claim into atomic claims via dependency analysis and then calculates citation quality at the atomic claim level.
We evaluate the positional fine-grained citation generation performance of several Large Language Models on two long-form QA datasets.
arXiv Detail & Related papers (2024-06-19T09:16:14Z)
- CausalCite: A Causal Formulation of Paper Citations [80.82622421055734]
CausalCite is a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers.
It is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings.
We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts.
arXiv Detail & Related papers (2023-11-05T23:09:39Z)
- Estimating the Causal Effect of Early ArXiving on Paper Acceptance [56.538813945721685]
We estimate the effect of arXiving a paper before the reviewing period (early arXiving) on its acceptance to the conference.
Our results suggest that early arXiving may have a small effect on a paper's chances of acceptance.
arXiv Detail & Related papers (2023-06-24T07:45:38Z)
- CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z)
- Towards generating citation sentences for multiple references with intent control [86.53829532976303]
We build a novel generation model with the Fusion-in-Decoder approach to cope with multiple long inputs.
Experiments demonstrate that the proposed approaches provide much more comprehensive features for generating citation sentences.
arXiv Detail & Related papers (2021-12-02T15:32:24Z)
- Cross-Lingual Citations in English Papers: A Large-Scale Analysis of Prevalence, Usage, and Impact [0.0]
We present an analysis of cross-lingual citations based on over one million English papers.
Among our findings is an increasing rate of citations to publications written in Chinese.
To facilitate further research, we make our collected data and source code publicly available.
arXiv Detail & Related papers (2021-11-07T15:34:02Z)
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that an informational approach to representing the meaning of a text offers a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z)
- How are journals cited? characterizing journal citations by type of citation [0.0]
We present initial results on the statistical characterization of citations to journals based on citation function.
We also present initial results of characterizing the ratio of supports and disputes received by a journal as a potential indicator of quality.
arXiv Detail & Related papers (2021-02-22T14:15:50Z)
- Longitudinal Citation Prediction using Temporal Graph Neural Networks [27.589741169713825]
We introduce the task of sequence citation prediction.
The goal is to accurately predict the trajectory of the number of citations a scholarly work receives over time.
arXiv Detail & Related papers (2020-12-10T15:25:16Z)