Detecting and analyzing missing citations to published scientific entities
- URL: http://arxiv.org/abs/2210.10073v1
- Date: Tue, 18 Oct 2022 18:08:20 GMT
- Title: Detecting and analyzing missing citations to published scientific entities
- Authors: Jialiang Lin, Yao Yu, Jiaxin Song, Xiaodong Shi
- Abstract summary: We design a method, Citation Recommendation for Published Scientific Entity (CRPSE), based on the co-occurrences between published scientific entities and in-text citations.
We conduct a statistical analysis on missing citations among papers published in prestigious computer science conferences in 2020.
On a median basis, the papers proposing these published scientific entities with missing citations were published 8 years ago.
- Score: 5.811229506383401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proper citation is of great importance in academic writing for it enables
knowledge accumulation and maintains academic integrity. However, citing
properly is not an easy task. For published scientific entities, the
ever-growing academic publications and over-familiarity of terms easily lead to
missing citations. To deal with this situation, we design a method,
Citation Recommendation for Published Scientific Entity (CRPSE), based on the
co-occurrences between published scientific entities and in-text citations in
the same sentences of earlier papers. Experimental outcomes show the
effectiveness of our method in recommending the source papers for published
scientific entities. We further conduct a statistical analysis on missing
citations among papers published in prestigious computer science conferences in
2020. In the 12,278 papers collected, 475 published scientific entities of
computer science and mathematics are found to have missing citations. Many
entities mentioned without citations are found to be well-accepted research
results. On a median basis, the papers proposing these published scientific
entities with missing citations were published 8 years ago, which can be
considered the time frame for a published scientific entity to develop into a
well-accepted concept. For published scientific entities, we appeal for
accurate and full citation of their source papers as required by academic
standards.
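The co-occurrence idea behind CRPSE can be illustrated with a minimal sketch. This is not the authors' implementation: the bracketed citation-key format, the naive substring matching of entity names, the function names `build_cooccurrence` and `recommend`, and the toy sentences are all assumptions made for illustration. The sketch counts how often each cited work appears in the same sentence as an entity and recommends the most frequent one as the likely source paper.

```python
import re
from collections import Counter, defaultdict

# Assumed citation format for this sketch: keys in square brackets,
# separated by semicolons, e.g. "[Kingma2015; Srivastava2014]".
CITATION = re.compile(r"\[([^\]]+)\]")


def build_cooccurrence(sentences, entities):
    """For each entity, count the citations sharing a sentence with it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Collect every distinct citation key found in this sentence.
        refs = {
            key.strip()
            for group in CITATION.findall(sentence)
            for key in group.split(";")
            if key.strip()
        }
        if not refs:
            continue  # sentences without citations carry no evidence
        lowered = sentence.lower()
        for entity in entities:
            # Naive case-insensitive substring match (an assumption;
            # a real system would need proper entity recognition).
            if entity.lower() in lowered:
                counts[entity].update(refs)
    return counts


def recommend(counts, entity, top_k=1):
    """Return the top_k citation keys that co-occur most with the entity."""
    return [ref for ref, _ in counts.get(entity, Counter()).most_common(top_k)]


# Toy corpus with illustrative citation keys.
sentences = [
    "We optimize with Adam [Kingma2015].",
    "Adam [Kingma2015] and dropout [Srivastava2014] are used.",
    "Dropout [Srivastava2014] reduces overfitting.",
    "We train with Adam.",  # entity mentioned without a citation
]
counts = build_cooccurrence(sentences, ["Adam", "dropout"])
print(recommend(counts, "Adam"))     # ['Kingma2015']
print(recommend(counts, "dropout"))  # ['Srivastava2014']
```

In this toy setting, the last sentence mentions an entity without citing it, and the counts gathered from the other sentences suggest which source paper is missing.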
Related papers
- Decoding Knowledge Claims: The Evaluation of Scientific Publication Contributions through Semantic Analysis [0.3374875022248865]
This paper proposes the use of Relaxed Word Mover's Distance (RWMD), a semantic text similarity measure, to evaluate the novelty of scientific papers.
We compare RWMD results across three groups: 1) H-Index-related papers, 2) scientometric studies, and 3) unrelated papers, aiming to discern redundant literature and hype from genuine innovations.
arXiv Detail & Related papers (2024-07-26T10:28:59Z)
- Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z)
- Hidden Citations Obscure True Impact in Science [1.5279567721070433]
When a discovery becomes common knowledge, citations suffer from obliteration by incorporation.
Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations.
We show that the prevalence of hidden citations is not driven by citation counts, but by the degree of the discourse on the topic within the text of the manuscripts.
arXiv Detail & Related papers (2023-10-24T20:58:07Z)
- ChatGPT cites the most-cited articles and journals, relying solely on Google Scholar's citation counts. As a result, AI may amplify the Matthew Effect in environmental science [0.0]
ChatGPT tends to cite highly-cited publications in environmental science.
Google Scholar citations play a significant role as a predictor for mentioning a study in GPT-generated content.
arXiv Detail & Related papers (2023-04-13T19:29:49Z)
- Modeling Information Change in Science Communication with Semantically Matched Paraphrases [50.67030449927206]
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
arXiv Detail & Related papers (2022-10-24T07:44:38Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- A Measure of Research Taste [91.3755431537592]
We present a citation-based measure that rewards both productivity and taste.
The presented measure, CAP, balances the impact of publications and their quantity.
We analyze the characteristics of CAP for highly-cited researchers in biology, computer science, economics, and physics.
arXiv Detail & Related papers (2021-05-17T18:01:47Z)
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that an informational approach to representing the meaning of a text has offered a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- Utilizing Citation Network Structure to Predict Citation Counts: A Deep Learning Approach [0.0]
This paper proposes an end-to-end deep learning network, DeepCCP, which combines the effect of information cascade and looks at the citation counts prediction problem.
According to experiments on 6 real data sets, DeepCCP is superior to the state-of-the-art methods in terms of the accuracy of citation count prediction.
arXiv Detail & Related papers (2020-09-06T05:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.