Decoding Knowledge Claims: The Evaluation of Scientific Publication Contributions through Semantic Analysis
- URL: http://arxiv.org/abs/2407.18646v2
- Date: Sat, 31 Aug 2024 19:26:21 GMT
- Title: Decoding Knowledge Claims: The Evaluation of Scientific Publication Contributions through Semantic Analysis
- Authors: Luca D'Aniello, Nicolas Robinson-Garcia, Massimo Aria, Corrado Cuccurullo
- Abstract summary: This paper proposes the use of Relaxed Word Mover's Distance (RWMD), a semantic text similarity measure, to evaluate the novelty of scientific papers.
We compare RWMD results across three groups: 1) H-Index-related papers, 2) scientometric studies, and 3) unrelated papers, aiming to discern redundant literature and hype from genuine innovations.
- Score: 0.3374875022248865
- Abstract: The surge in scientific publications challenges the use of publication counts as a measure of scientific progress, calling for alternative metrics that emphasize the quality and novelty of scientific contributions rather than sheer quantity. This paper proposes the use of Relaxed Word Mover's Distance (RWMD), a semantic text similarity measure, to evaluate the novelty of scientific papers. We hypothesize that RWMD can more effectively gauge the growth of scientific knowledge. To test this assumption, we apply RWMD to evaluate seminal papers, with Hirsch's H-Index paper as a primary case study. We compare RWMD results across three groups: 1) H-Index-related papers, 2) scientometric studies, and 3) unrelated papers, aiming to discern redundant literature and hype from genuine innovations. Findings suggest that emphasizing knowledge claims offers deeper insight into scientific contributions, marking RWMD as a promising alternative to traditional citation metrics that better tracks significant scientific breakthroughs.
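To make the core measure concrete, below is a minimal sketch of RWMD as defined by Kusner et al. (2015), not the authors' own implementation: each side of the optimal-transport problem is relaxed in turn, so every word sends all of its mass to its nearest word in the other document, and the larger of the two one-sided costs is kept as a lower bound on the full Word Mover's Distance. The `embeddings` mapping (token to pre-trained word vector) and all function names here are illustrative assumptions.

```python
import numpy as np

def nbow(tokens, vocab):
    """Normalized bag-of-words weights over a shared vocabulary."""
    w = np.zeros(len(vocab))
    for t in tokens:
        w[vocab[t]] += 1.0
    return w / w.sum()

def rwmd(tokens_a, tokens_b, embeddings):
    """Relaxed Word Mover's Distance (lower bound on WMD).

    Drops one of WMD's two transport constraints at a time, so each
    word moves all of its mass to the nearest word in the other
    document; the larger one-sided cost is returned.
    """
    # Shared vocabulary over both documents, with a fixed index per word.
    vocab = {t: i for i, t in enumerate(sorted(set(tokens_a) | set(tokens_b)))}
    X = np.stack([embeddings[t] for t in vocab])          # (V, d) word vectors
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise Euclidean distances

    da, db = nbow(tokens_a, vocab), nbow(tokens_b, vocab)
    in_a, in_b = da > 0, db > 0  # words actually present in each document

    # One-sided relaxations: nearest-neighbor cost weighted by word mass.
    cost_ab = np.dot(da[in_a], D[np.ix_(in_a, in_b)].min(axis=1))
    cost_ba = np.dot(db[in_b], D[np.ix_(in_b, in_a)].min(axis=1))
    return max(cost_ab, cost_ba)
```

Under the paper's setup, one would compute `rwmd` between the H-Index paper and each paper in the three comparison groups and inspect how the score distributions separate: lower distances within the H-Index-related group would signal semantic redundancy rather than novelty.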
Related papers
- SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions [52.35520385083425]
We present SciDMT, an enhanced and expanded corpus for scientific mention detection.
The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly labeled mentions annotated as in-text spans, and 2) an evaluation set of 100 scientific articles manually annotated for evaluation purposes.
arXiv Detail & Related papers (2024-06-20T22:03:21Z)
- PaperQA: Retrieval-Augmented Generative Agent for Scientific Research [41.9628176602676]
We present PaperQA, a RAG agent for answering questions over the scientific literature.
PaperQA is an agent that performs information retrieval across full-text scientific articles, assesses the relevance of sources and passages, and uses RAG to provide answers.
We also introduce LitQA, a more complex benchmark that requires retrieval and synthesis of information from full-text scientific papers across the literature.
arXiv Detail & Related papers (2023-12-08T18:50:20Z)
- Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences [3.9985385067438344]
A strong hypothesis is a best guess based on existing evidence and informed by a comprehensive view of relevant literature.
With the exponential increase in the number of scientific articles published annually, manual aggregation and synthesis of evidence related to a given hypothesis is a challenge.
We share a novel dataset for the task of scientific hypothesis evidencing using community-driven annotations of studies in the social sciences.
arXiv Detail & Related papers (2023-09-07T04:15:17Z)
- MIReAD: Simple Method for Learning High-quality Representations from Scientific Documents [77.34726150561087]
We propose MIReAD, a simple method that learns high-quality representations of scientific papers.
We train MIReAD on more than 500,000 PubMed and arXiv abstracts across over 2,000 journal classes.
arXiv Detail & Related papers (2023-05-07T03:29:55Z)
- Modeling Information Change in Science Communication with Semantically Matched Paraphrases [50.67030449927206]
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
arXiv Detail & Related papers (2022-10-24T07:44:38Z)
- Detecting and analyzing missing citations to published scientific entities [5.811229506383401]
We design a method, Citation Recommendation for Published Scientific Entity (CRPSE), based on co-occurrences between published scientific entities and in-text citations.
We conduct a statistical analysis on missing citations among papers published in prestigious computer science conferences in 2020.
The papers proposing these published scientific entities with missing citations were published a median of 8 years earlier.
arXiv Detail & Related papers (2022-10-18T18:08:20Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that this informational approach to representing the meaning of a text offers an effective way to predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific paper summarization by utilizing the citation graph.
We construct a novel scientific paper summarization dataset, Semantic Scholar Network (SSN), which contains 141K research papers in different domains.
Our model achieves competitive performance compared with pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- Measuring prominence of scientific work in online news as a proxy for impact [15.772621977756058]
We present a new corpus of newspaper articles linked to the scientific papers that they describe.
We find that Impact Case studies submitted to the UK Research Excellence Framework (REF) 2014 that refer to scientific papers mentioned in newspaper articles were awarded a higher score.
This supports our hypothesis that linguistic prominence in news can be used to suggest the wider non-academic impact of scientific work.
arXiv Detail & Related papers (2020-07-28T19:52:21Z)
- Attention: to Better Stand on the Shoulders of Giants [34.5017808610466]
This paper develops an attention mechanism for long-term scientific impact prediction.
It validates the method on a real, large-scale citation dataset.
arXiv Detail & Related papers (2020-05-27T00:25:51Z)