SciEv: Finding Scientific Evidence Papers for Scientific News
- URL: http://arxiv.org/abs/2205.00126v1
- Date: Sat, 30 Apr 2022 01:43:23 GMT
- Title: SciEv: Finding Scientific Evidence Papers for Scientific News
- Authors: Md Reshad Ul Hoque, Jiang Li, Jian Wu
- Abstract summary: We propose a system called SciEv that searches for scientific evidence papers given a scientific news article.
The key feature of SciEv is that it uses domain knowledge entities (DKEs) to find candidates in the first stage.
To evaluate our system, we compiled a pilot dataset consisting of 100 manually curated (news, paper) pairs from ScienceAlert and similar websites.
- Score: 5.6164173936437045
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the past decade, many scientific news media that report scientific
breakthroughs and discoveries emerged, bringing science and technology closer
to the general public. However, not all scientific news articles cite proper
sources, such as the original scientific papers. A portion of scientific news
articles contain misinterpreted, exaggerated, or distorted information that
deviates from facts asserted in the original papers. Manually identifying
proper citations is laborious and costly. Therefore, it is necessary to
automatically search for pertinent scientific papers that could be used as
evidence for a given piece of scientific news. We propose a system called SciEv
that searches for scientific evidence papers given a scientific news article.
The system employs a 2-stage query paradigm with the first stage retrieving
candidate papers and the second stage reranking them. The key feature of SciEv
is that it uses domain knowledge entities (DKEs) to find candidates in the first
stage, which proved to be more effective than regular keyphrases. In the
reranking stage, we explore different document representations for news
articles and candidate papers. To evaluate our system, we compiled a pilot
dataset consisting of 100 manually curated (news, paper) pairs from ScienceAlert
and similar websites. To the best of our knowledge, this is the first dataset of this
kind. Our experiments indicate that the transformer model performs the best for
DKE extraction. The system achieves P@1=50%, P@5=71%, and P@10=74% when it
uses a TFIDF-based text representation. The transformer-based re-ranker
achieves comparable performance but takes twice as long. In future work, we will
collect more data and evaluate the system's user experience.
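The 2-stage paradigm described above can be illustrated with a minimal sketch. This is not the authors' implementation: the transformer-based DKE extractor is omitted (entity sets are assumed to be given), the corpus is hypothetical, and the reranker is reduced to plain TF-IDF cosine similarity. A P@k helper shows how the reported metrics are computed.

```python
# Sketch of a 2-stage evidence-paper search: stage 1 retrieves candidates
# by overlap of domain knowledge entities (DKEs); stage 2 reranks them
# with TF-IDF cosine similarity. All names here are illustrative.
from collections import Counter
import math

def retrieve_candidates(news_entities, papers, k=100):
    """Stage 1: score each paper by how many DKEs it shares with the news article."""
    scores = {pid: len(news_entities & ents) for pid, ents in papers.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

def tfidf_vectors(docs):
    """Build TF-IDF vectors for a small corpus of tokenized documents."""
    n = len(docs)
    df = Counter(t for doc in docs.values() for t in set(doc))
    return {
        did: {t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
        for did, doc in docs.items()
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def precision_at_k(ranked, relevant, k):
    """P@k: fraction of the top-k ranked items that are relevant."""
    return sum(1 for pid in ranked[:k] if pid in relevant) / k
```

In this sketch, reranking a news article against its stage-1 candidates amounts to sorting candidate IDs by `cosine(news_vec, paper_vec)`; swapping the TF-IDF vectors for transformer embeddings would give the slower but comparable reranker mentioned above.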
Related papers
- SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions [52.35520385083425]
We present SciDMT, an enhanced and expanded corpus for scientific mention detection.
The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly labeled mentions in the form of in-text spans, and 2) an evaluation set comprising 100 scientific articles manually annotated for evaluation purposes.
arXiv Detail & Related papers (2024-06-20T22:03:21Z) - Can Large Language Models Detect Misinformation in Scientific News Reporting? [1.0344642971058586]
This paper investigates whether it is possible to use large language models (LLMs) to detect misinformation in scientific reporting.
We first present a new labeled dataset SciNews, containing 2.4k scientific news stories drawn from trusted and untrustworthy sources.
We identify dimensions of scientific validity in science news articles and explore how this can be integrated into the automated detection of scientific misinformation.
arXiv Detail & Related papers (2024-02-22T04:07:00Z) - The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z) - SciFact-Open: Towards open-domain scientific claim verification [61.288725621156864]
We present SciFact-Open, a new test collection designed to evaluate the performance of scientific claim verification systems.
We collect evidence for scientific claims by pooling and annotating the top predictions of four state-of-the-art scientific claim verification models.
We find that systems developed on smaller corpora struggle to generalize to SciFact-Open, exhibiting performance drops of at least 15 F1.
arXiv Detail & Related papers (2022-10-25T05:45:00Z) - Modeling Information Change in Science Communication with Semantically Matched Paraphrases [50.67030449927206]
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
arXiv Detail & Related papers (2022-10-24T07:44:38Z) - SciClops: Detecting and Contextualizing Scientific Claims for Assisting Manual Fact-Checking [7.507186058512835]
This paper describes SciClops, a method to help combat online scientific misinformation.
SciClops involves three main steps to process scientific claims found in online news articles and social media postings.
It effectively assists non-expert fact-checkers in the verification of complex scientific claims, outperforming commercial fact-checking systems.
arXiv Detail & Related papers (2021-10-25T16:35:58Z) - Semi-Supervised Exaggeration Detection of Health Science Press Releases [23.930041685595775]
Recent studies have demonstrated a tendency of news media to misrepresent scientific papers by exaggerating their findings.
We present a formalization of and study into the problem of exaggeration detection in science communication.
We introduce MT-PET, a multi-task version of Pattern Exploiting Training (PET), which leverages knowledge from complementary cloze-style QA tasks to improve few-shot learning.
arXiv Detail & Related papers (2021-08-30T19:32:20Z) - CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z) - Fact or Fiction: Verifying Scientific Claims [53.29101835904273]
We introduce scientific claim verification, a new task to select abstracts from the research literature containing evidence that SUPPORTS or REFUTES a given scientific claim.
We construct SciFact, a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts annotated with labels and rationales.
We show that our system is able to verify claims related to COVID-19 by identifying evidence from the CORD-19 corpus.
arXiv Detail & Related papers (2020-04-30T17:22:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.