Citations are not opinions: a corpus linguistics approach to
understanding how citations are made
- URL: http://arxiv.org/abs/2104.08087v1
- Date: Fri, 16 Apr 2021 12:52:27 GMT
- Title: Citations are not opinions: a corpus linguistics approach to
understanding how citations are made
- Authors: Domenic Rosati
- Abstract summary: A key issue in citation content analysis is looking for linguistic structures that characterize distinct classes of citations.
In this study, we start with a large sample of a pre-classified citation corpus, 2 million citations from each class of the scite Smart Citation dataset.
By generating comparison tables for each citation type, we present a number of interesting linguistic features that uniquely characterize citation type.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Citation content analysis seeks to understand citations based on the language
used during the making of a citation. A key issue in citation content analysis
is looking for linguistic structures that characterize distinct classes of
citations for the purposes of understanding the intent and function of a
citation. Previous works have focused on modeling linguistic features first and
drawn conclusions on the language structures unique to each class of citation
function based on the performance of a classification task or inter-annotator
agreement. In this study, we start with a large sample of a pre-classified
citation corpus, 2 million citations from each class of the scite Smart
Citation dataset (supporting, disputing, and mentioning citations), and analyze
its corpus linguistics in order to reveal the unique and statistically
significant language structures belonging to each type of citation. By
generating comparison tables for each citation type, we present a number of
interesting linguistic features that uniquely characterize citation type. What
we find is that within citation collocates, there is very low correlation
between citation type and sentiment. Additionally, we find that the
subjectivity of citation collocates across classes is very low. These findings
suggest that the sentiment of collocates is not a predictor of citation
function and that due to their low subjectivity, an opinion-expressing mode of
understanding citations, implicit in previous citation sentiment analysis
literature, is inappropriate. Instead, we suggest that citations can be better
understood as claims-making devices where the citation type can be explained by
understanding how two claims are being compared. By presenting this approach,
we hope to inspire similar corpus linguistic studies on citations that derive a
more robust theory of citation from an empirical basis using citation corpora.
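
The abstract does not say which statistic sits behind the comparison tables. As a minimal sketch only, assuming a Dunning log-likelihood keyness score (a common choice in corpus linguistics, not confirmed as the author's method), the idea of finding words that characterize one citation class relative to another can be illustrated as follows. The function names `log_likelihood` and `keyness_table` and the two toy token lists are hypothetical and invented for illustration; they are not drawn from the scite dataset.

```python
import math
from collections import Counter


def log_likelihood(a, b, total_a, total_b):
    """Dunning's log-likelihood (G2) for a word seen `a` times in a
    target corpus of `total_a` tokens and `b` times in a reference
    corpus of `total_b` tokens."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2.0 * g2


def keyness_table(target_tokens, reference_tokens, top_n=5):
    """Rank words that are relatively more frequent in the target
    corpus than in the reference corpus, highest G2 first."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    total_t = sum(target.values())
    total_r = sum(reference.values())
    rows = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words over-represented in the target corpus.
        if a / total_t > b / total_r:
            rows.append((word, a, b, log_likelihood(a, b, total_t, total_r)))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top_n]


# Toy token lists, invented for illustration (assumption, not real data).
supporting = "our results are consistent with previous findings in agreement with".split()
mentioning = "previous work has studied this problem using several methods".split()

for word, a, b, g2 in keyness_table(supporting, mentioning):
    print(f"{word:12s} target={a} reference={b} G2={g2:.3f}")
```

On a real corpus the same table would be built per citation class (supporting, disputing, mentioning) against the remaining classes, and the G2 values compared with a chi-square critical value to judge statistical significance.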
Related papers
- ALiiCE: Evaluating Positional Fine-grained Citation Generation [54.19617927314975]
We propose ALiiCE, the first automatic evaluation framework for fine-grained citation generation.
Our framework first parses the sentence claim into atomic claims via dependency analysis and then calculates citation quality at the atomic claim level.
We evaluate the positional fine-grained citation generation performance of several Large Language Models on two long-form QA datasets.
Date: 2024-06-19T09:16:14Z
- Contextualizing Generated Citation Texts [11.531517736126657]
We propose a simple modification to the citation text generation task.
The generation target is not only the citation itself, but the entire context window, including the target citation.
Date: 2024-02-28T05:24:21Z
- Hidden Citations Obscure True Impact in Science [1.5279567721070433]
When a discovery becomes common knowledge, citations suffer from obliteration by incorporation.
Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations.
We show that the prevalence of hidden citations is not driven by citation counts, but by the degree of the discourse on the topic within the text of the manuscripts.
Date: 2023-10-24T20:58:07Z
- CiteCaseLAW: Citation Worthiness Detection in Caselaw for Legal Assistive Writing [44.75251805925605]
We introduce a labeled dataset of 178M sentences for citation-worthiness detection in the legal domain from the Caselaw Access Project (CAP).
The performance of various deep learning models was examined on this novel dataset.
The domain-specific pre-trained model tends to outperform other models, with an 88% F1-score for the citation-worthiness detection task.
Date: 2023-05-03T04:20:56Z
- Deep Graph Learning for Anomalous Citation Detection [55.81334139806342]
We propose a novel deep graph learning model, namely GLAD (Graph Learning for Anomaly Detection), to identify anomalies in citation networks.
Within the GLAD framework, we propose an algorithm called CPU (Citation PUrpose) to discover the purpose of citation based on citation texts.
Date: 2022-02-23T09:05:28Z
- Towards generating citation sentences for multiple references with intent control [86.53829532976303]
We build a novel generation model with the Fusion-in-Decoder approach to cope with multiple long inputs.
Experiments demonstrate that the proposed approaches provide much more comprehensive features for generating citation sentences.
Date: 2021-12-02T15:32:24Z
- Cross-Lingual Citations in English Papers: A Large-Scale Analysis of Prevalence, Usage, and Impact [0.0]
We present an analysis of cross-lingual citations based on over one million English papers.
Among our findings are an increasing rate of citations to publications written in Chinese.
To facilitate further research, we make our collected data and source code publicly available.
Date: 2021-11-07T15:34:02Z
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that an informational approach to representing the meaning of a text has offered a way to effectively predict the scientific impact of research papers.
Date: 2021-04-26T20:37:13Z
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
Date: 2021-04-07T11:13:35Z
- How are journals cited? characterizing journal citations by type of citation [0.0]
We present initial results on the statistical characterization of citations to journals based on citation function.
We also present initial results of characterizing the ratio of supports and disputes received by a journal as a potential indicator of quality.
Date: 2021-02-22T14:15:50Z
This list is automatically generated from the titles and abstracts of the papers on this site.