Modeling Information Change in Science Communication with Semantically Matched Paraphrases
- URL: http://arxiv.org/abs/2210.13001v1
- Date: Mon, 24 Oct 2022 07:44:38 GMT
- Title: Modeling Information Change in Science Communication with Semantically Matched Paraphrases
- Authors: Dustin Wright and Jiaxin Pei and David Jurgens and Isabelle Augenstein
- Abstract summary: SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whether the media faithfully communicate scientific information has long been
a core issue for the science community. Automatically identifying paraphrased
scientific findings could enable large-scale tracking and analysis of
information changes in the science communication process, but this requires
systems to understand the similarity between scientific information across
multiple domains. To this end, we present the SCIENTIFIC PARAPHRASE AND
INFORMATION CHANGE DATASET (SPICED), the first paraphrase dataset of scientific
findings annotated for degree of information change. SPICED contains 6,000
scientific finding pairs extracted from news stories, social media discussions,
and full texts of original papers. We demonstrate that SPICED poses a
challenging task and that models trained on SPICED improve downstream
performance on evidence retrieval for fact checking of real-world scientific
claims. Finally, we show that models trained on SPICED can reveal large-scale
trends in the degrees to which people and organizations faithfully communicate
new scientific findings. Data, code, and pre-trained models are available at
http://www.copenlu.com/publication/2022_emnlp_wright/.
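SPICED pairs a scientific finding with a reported version of it and annotates the degree of information change; the released models learn this matching signal. As a rough, hypothetical illustration of the underlying similarity-scoring task (not the paper's method, which uses trained neural models), the sketch below scores a finding against a news paraphrase with a simple bag-of-words cosine baseline; the example sentences are invented:

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words token counts (crude lexical proxy)."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[w] * tb[w] for w in ta)
    norm = (math.sqrt(sum(v * v for v in ta.values()))
            * math.sqrt(sum(v * v for v in tb.values())))
    return dot / norm if norm else 0.0

# Invented finding/report pair, loosely mimicking a SPICED-style instance.
finding = "The drug reduced tumor growth in mice by 40 percent"
news = "New drug cuts tumor growth nearly in half, study in mice finds"
score = cosine_similarity(finding, news)
print(f"similarity: {score:.2f}")
```

A lexical baseline like this misses exactly the cases SPICED targets (paraphrases that change meaning while reusing words, or faithful reports with little lexical overlap), which is why the paper trains semantic models on annotated pairs instead.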
Related papers
- SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions [52.35520385083425]
We present SciDMT, an enhanced and expanded corpus for scientific mention detection.
The corpus consists of two components: 1) the SciDMT main corpus, which includes 48,000 scientific articles with over 1.8 million weakly labeled mention annotations in the form of in-text spans, and 2) an evaluation set of 100 scientific articles manually annotated for evaluation purposes.
arXiv Detail & Related papers (2024-06-20T22:03:21Z)
- A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
Large language models (LLMs) have revolutionized the way text and other modalities of data are handled.
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
arXiv Detail & Related papers (2024-06-16T08:03:24Z)
- LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery [141.39722070734737]
We propose to enhance the knowledge-driven, abstract reasoning abilities of Large Language Models with the computational strength of simulations.
We introduce Scientific Generative Agent (SGA), a bilevel optimization framework.
We conduct experiments to demonstrate our framework's efficacy in law discovery and molecular design.
arXiv Detail & Related papers (2024-05-16T03:04:10Z)
- Can Large Language Models Detect Misinformation in Scientific News Reporting? [1.0344642971058586]
This paper investigates whether it is possible to use large language models (LLMs) to detect misinformation in scientific reporting.
We first present SciNews, a new labeled dataset containing 2.4k scientific news stories drawn from trustworthy and untrustworthy sources.
We identify dimensions of scientific validity in science news articles and explore how this can be integrated into the automated detection of scientific misinformation.
arXiv Detail & Related papers (2024-02-22T04:07:00Z)
- Understanding Fine-grained Distortions in Reports of Scientific Findings [46.96512578511154]
Distorted science communication harms individuals and society as it can lead to unhealthy behavior change and decrease trust in scientific institutions.
Given the rapidly increasing volume of science communication in recent years, a fine-grained understanding of how findings from scientific publications are reported to the general public is crucial.
arXiv Detail & Related papers (2024-02-19T19:00:01Z)
- SciTweets -- A Dataset and Annotation Framework for Detecting Scientific Online Discourse [2.3371548697609303]
Scientific topics, claims and resources are increasingly debated as part of online discourse.
This has led to both significant societal impact and increased interest in scientific online discourse from various disciplines.
Research across disciplines currently suffers from a lack of robust definitions of the various forms of science-relatedness.
arXiv Detail & Related papers (2022-06-15T08:14:55Z)
- A Computational Inflection for Scientific Discovery [48.176406062568674]
We stand at the foot of a significant inflection in the trajectory of scientific discovery.
As society continues on its fast-paced digital transformation, so does humankind's collective scientific knowledge.
Computer science is poised to ignite a revolution in the scientific process itself.
arXiv Detail & Related papers (2022-05-04T11:36:54Z)
- Semi-Supervised Exaggeration Detection of Health Science Press Releases [23.930041685595775]
Recent studies have demonstrated a tendency of news media to misrepresent scientific papers by exaggerating their findings.
We present a formalization of and study into the problem of exaggeration detection in science communication.
We introduce MT-PET, a multi-task version of Pattern Exploiting Training (PET), which leverages knowledge from complementary cloze-style QA tasks to improve few-shot learning.
arXiv Detail & Related papers (2021-08-30T19:32:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.