Semi-Supervised Exaggeration Detection of Health Science Press Releases
- URL: http://arxiv.org/abs/2108.13493v1
- Date: Mon, 30 Aug 2021 19:32:20 GMT
- Title: Semi-Supervised Exaggeration Detection of Health Science Press Releases
- Authors: Dustin Wright and Isabelle Augenstein
- Abstract summary: Recent studies have demonstrated a tendency of news media to misrepresent scientific papers by exaggerating their findings.
We present a formalization of and study into the problem of exaggeration detection in science communication.
We introduce MT-PET, a multi-task version of Pattern Exploiting Training (PET), which leverages knowledge from complementary cloze-style QA tasks to improve few-shot learning.
- Score: 23.930041685595775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Public trust in science depends on honest and factual communication of
scientific papers. However, recent studies have demonstrated a tendency of news
media to misrepresent scientific papers by exaggerating their findings. Given
this, we present a formalization of and study into the problem of exaggeration
detection in science communication. While there are an abundance of scientific
papers and popular media articles written about them, very rarely do the
articles include a direct link to the original paper, making data collection
challenging. We address this by curating a set of labeled press
release/abstract pairs from existing expert annotated studies on exaggeration
in press releases of scientific papers suitable for benchmarking the
performance of machine learning models on the task. Using limited data from
this and previous studies on exaggeration detection in science, we introduce
MT-PET, a multi-task version of Pattern Exploiting Training (PET), which
leverages knowledge from complementary cloze-style QA tasks to improve few-shot
learning. We demonstrate that MT-PET outperforms PET and supervised learning
both when data is limited, as well as when there is an abundance of data for
the main task.
Related papers
- Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007]
This paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations.
We compare 6 480 models fine-tuned from state-of-the-art visual backbones, and find that the sampling and modeling choices significantly affect the accuracy of the causal estimate.
Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones.
arXiv Detail & Related papers (2024-05-27T13:26:34Z) - PST-Bench: Tracing and Benchmarking the Source of Publications [39.250042251037144]
We study the problem of paper source tracing (PST) and construct a high-quality and ever-increasing dataset PST-Bench in computer science.
Based on PST-Bench, we reveal several intriguing discoveries, such as the differing evolution patterns across various topics.
arXiv Detail & Related papers (2024-02-25T06:56:43Z) - Can Large Language Models Detect Misinformation in Scientific News
Reporting? [1.0344642971058586]
This paper investigates whether it is possible to use large language models (LLMs) to detect misinformation in scientific reporting.
We first present a new labeled dataset SciNews, containing 2.4k scientific news stories drawn from trusted and untrustworthy sources.
We identify dimensions of scientific validity in science news articles and explore how this can be integrated into the automated detection of scientific misinformation.
arXiv Detail & Related papers (2024-02-22T04:07:00Z) - PaperQA: Retrieval-Augmented Generative Agent for Scientific Research [41.9628176602676]
We present PaperQA, a RAG agent for answering questions over the scientific literature.
PaperQA is an agent that performs information retrieval across full-text scientific articles, assesses the relevance of sources and passages, and uses RAG to provide answers.
We also introduce LitQA, a more complex benchmark that requires retrieval and synthesis of information from full-text scientific papers across the literature.
arXiv Detail & Related papers (2023-12-08T18:50:20Z) - Pre-training Multi-task Contrastive Learning Models for Scientific
Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z) - ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Harnessing the Power of Text-image Contrastive Models for Automatic
Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore the constrastive learning in the domain of misinformation identification.
Our model shows the superior performance of non-matched image-text pair detection when the training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z) - Modeling Information Change in Science Communication with Semantically
Matched Paraphrases [50.67030449927206]
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
arXiv Detail & Related papers (2022-10-24T07:44:38Z) - Large-Scale Data Mining of Rapid Residue Detection Assay Data From HTML
and PDF Documents: Improving Data Access and Visualization for Veterinarians [3.055086390437118]
Extra-label drug use in food animal medicine is authorized by the US Animal Medicinal Drug Use Clarification Act (AMDUCA)
Occasionally there is a paucity of scientific data on which to base a withdrawal interval or a large number of animals being treated, driving the need to test for drug residues.
Rapid assay commercial farm-side tests are essential for monitoring drug residues in animal products to protect human health.
arXiv Detail & Related papers (2021-12-02T03:39:25Z) - CitationIE: Leveraging the Citation Graph for Scientific Information
Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.