Cross-Lingual Citations in English Papers: A Large-Scale Analysis of
Prevalence, Usage, and Impact
- URL: http://arxiv.org/abs/2111.05097v2
- Date: Wed, 10 Nov 2021 08:48:05 GMT
- Title: Cross-Lingual Citations in English Papers: A Large-Scale Analysis of
Prevalence, Usage, and Impact
- Authors: Tarek Saier, Michael Färber, Tornike Tsereteli
- Abstract summary: We present an analysis of cross-lingual citations based on over one million English papers.
Among our findings are an increasing rate of citations to publications written in Chinese.
To facilitate further research, we make our collected data and source code publicly available.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Citation information in scholarly data is an important source of insight into
the reception of publications and the scholarly discourse. Outcomes of citation
analyses and the applicability of citation based machine learning approaches
heavily depend on the completeness of such data. One particular shortcoming of
scholarly data nowadays is that non-English publications are often not included
in data sets, or that language metadata is not available. Because of this,
citations between publications of differing languages (cross-lingual citations)
have only been studied to a very limited degree. In this paper, we present an
analysis of cross-lingual citations based on over one million English papers,
spanning three scientific disciplines and a time span of three decades. Our
investigation covers differences between cited languages and disciplines,
trends over time, and the usage characteristics as well as impact of
cross-lingual citations. Among our findings are an increasing rate of citations
to publications written in Chinese, citations being primarily to local
non-English languages, and consistency in citation intent between cross- and
monolingual citations. To facilitate further research, we make our collected
data and source code publicly available.
Related papers
- Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia [49.80565462746646]
We introduce the InfoGap method -- an efficient and reliable approach to locating information gaps and inconsistencies in articles at the fact level.
We evaluate InfoGap by analyzing LGBT people's portrayals, across 2.7K biography pages on English, Russian, and French Wikipedias.
arXiv Detail & Related papers (2024-10-05T20:40:49Z)
- Context-Enhanced Language Models for Generating Multi-Paper Citations [35.80247519023821]
We propose a method that leverages Large Language Models (LLMs) to generate multi-citation sentences.
Our approach involves a single source paper and a collection of target papers, culminating in a coherent paragraph containing multi-sentence citation text.
arXiv Detail & Related papers (2024-04-22T04:30:36Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Lost in Translation -- Multilingual Misinformation and its Evolution [52.07628580627591]
This paper investigates the prevalence and dynamics of multilingual misinformation through an analysis of over 250,000 unique fact-checks spanning 95 languages.
We find that while the majority of misinformation claims are only fact-checked once, 11.7%, corresponding to more than 21,000 claims, are checked multiple times.
Using fact-checks as a proxy for the spread of misinformation, we find 33% of repeated claims cross linguistic boundaries.
arXiv Detail & Related papers (2023-10-27T12:21:55Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Predicting Long-Term Citations from Short-Term Linguistic Influence [20.78217545537925]
A standard measure of the influence of a research paper is the number of times it is cited.
We propose a novel method to quantify linguistic influence in timestamped document collections.
arXiv Detail & Related papers (2022-10-24T22:03:26Z)
- Deep Graph Learning for Anomalous Citation Detection [55.81334139806342]
We propose a novel deep graph learning model, namely GLAD (Graph Learning for Anomaly Detection), to identify anomalies in citation networks.
Within the GLAD framework, we propose an algorithm called CPU (Citation PUrpose) to discover the purpose of citation based on citation texts.
arXiv Detail & Related papers (2022-02-23T09:05:28Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- How are journals cited? characterizing journal citations by type of citation [0.0]
We present initial results on the statistical characterization of citations to journals based on citation function.
We also present initial results of characterizing the ratio of supports and disputes received by a journal as a potential indicator of quality.
arXiv Detail & Related papers (2021-02-22T14:15:50Z)
- Characterizing References from Different Disciplines: A Perspective of Citation Content Analysis [7.171503036026183]
This work takes articles in PLoS as the data and characterizes the references from different disciplines based on Citation Content Analysis (CCA).
Although most references come from Natural Science, Humanities and Social Sciences play important roles in the Introduction and Background sections of the articles.
arXiv Detail & Related papers (2021-01-19T13:30:00Z)
- A Decade of In-text Citation Analysis based on Natural Language Processing and Machine Learning Techniques: An overview of empirical studies [3.474275085556876]
Information scientists have gone far beyond traditional bibliometrics by tapping into advancements in full-text data processing techniques.
This article aims to narratively review the studies on these developments.
Its primary focus is on publications that have used natural language processing and machine learning techniques to analyse citations.
arXiv Detail & Related papers (2020-08-29T17:27:08Z)