CoVERT: A Corpus of Fact-checked Biomedical COVID-19 Tweets
- URL: http://arxiv.org/abs/2204.12164v1
- Date: Tue, 26 Apr 2022 09:05:03 GMT
- Title: CoVERT: A Corpus of Fact-checked Biomedical COVID-19 Tweets
- Authors: Isabelle Mohr, Amelie Wührl, and Roman Klinger
- Abstract summary: CoVERT is a fact-checked corpus of tweets with a focus on biomedicine and COVID-19-related (mis)information.
We employ a novel crowdsourcing methodology to annotate all tweets with fact-checking labels and supporting evidence, which crowdworkers search for online.
We use the retrieved evidence extracts as part of a fact-checking pipeline, finding that the real-world evidence is more useful than the knowledge indirectly available in pretrained language models.
- Score: 10.536415845097661
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Over the course of the COVID-19 pandemic, large volumes of biomedical
information concerning this new disease have been published on social media.
Some of this information can pose a real danger to people's health,
particularly when false information is shared, for instance recommendations on
how to treat diseases without professional medical advice. Therefore, automatic
fact-checking resources and systems developed specifically for the medical
domain are crucial. While existing fact-checking resources cover
COVID-19-related information in news or quantify the amount of misinformation
in tweets, there is no dataset providing fact-checked COVID-19-related Twitter
posts with detailed annotations for biomedical entities, relations and relevant
evidence. We contribute CoVERT, a fact-checked corpus of tweets with a focus on
the domain of biomedicine and COVID-19-related (mis)information. The corpus
consists of 300 tweets, each annotated with medical named entities and
relations. We employ a novel crowdsourcing methodology to annotate all tweets
with fact-checking labels and supporting evidence, which crowdworkers search
for online. This methodology results in moderate inter-annotator agreement.
Furthermore, we use the retrieved evidence extracts as part of a fact-checking
pipeline, finding that the real-world evidence is more useful than the
knowledge indirectly available in pretrained language models.
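As an illustration of what "moderate inter-annotator agreement" can mean in practice, the sketch below computes Cohen's kappa for two annotators. It is not taken from the paper: the abstract does not name the agreement metric used, and the function name `cohen_kappa`, the label strings, and the toy annotations are all hypothetical. On the commonly cited Landis and Koch scale, kappa values between 0.41 and 0.60 are read as moderate agreement.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration; the paper's actual metric
    is not stated in the abstract.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy fact-checking verdicts for six tweets (made-up data, assumed label set).
annotator_1 = ["SUPPORTS", "REFUTES", "SUPPORTS", "NOT ENOUGH INFO", "REFUTES", "SUPPORTS"]
annotator_2 = ["SUPPORTS", "REFUTES", "REFUTES", "NOT ENOUGH INFO", "SUPPORTS", "SUPPORTS"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

For these toy labels the two annotators agree on 4 of 6 tweets, and kappa comes out at 5/11 (about 0.45), which would fall in the moderate band.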
Related papers
- AMIR: Automated MisInformation Rebuttal -- A COVID-19 Vaccination Datasets based Recommendation System [0.05461938536945722]
This work explored how existing information obtained from social media can be harnessed to facilitate automated rebuttal of misinformation at scale.
It leverages two publicly available datasets, FaCov (fact-checked articles) and misleading (social media Twitter) data on COVID-19 Vaccination.
arXiv Detail & Related papers (2023-10-29T13:07:33Z)
- METS-CoV: A Dataset of Medical Entity and Targeted Sentiment on COVID-19 Related Tweets [13.35986397208115]
This paper releases METS-CoV, a dataset containing medical entities and targeted sentiments from COVID-19-related tweets.
To the best of our knowledge, METS-CoV is the first dataset to collect medical entities and corresponding sentiments of COVID-19-related tweets.
arXiv Detail & Related papers (2022-09-28T01:55:14Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- "COVID-19 was a FIFA conspiracy #curropt": An Investigation into the Viral Spread of COVID-19 Misinformation [60.268682953952506]
We estimate the extent to which misinformation has influenced the course of the COVID-19 pandemic using natural language processing models.
We provide a strategy to combat social media posts that are likely to cause widespread harm.
arXiv Detail & Related papers (2022-06-12T19:41:01Z)
- Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims [0.6927055673104934]
We publish a feature-rich dataset of 317k medical news articles/blogs and 3.5k fact-checked claims.
It also contains 573 manually and more than 51k automatically labelled mappings between claims and articles.
The dataset enables a number of additional tasks related to medical misinformation, such as misinformation characterisation studies or studies of misinformation diffusion between sources.
arXiv Detail & Related papers (2022-04-26T13:18:27Z)
- When Accuracy Meets Privacy: Two-Stage Federated Transfer Learning Framework in Classification of Medical Images on Limited Data: A COVID-19 Case Study [77.34726150561087]
COVID-19 pandemic has spread rapidly and caused a shortage of global medical resources.
CNN has been widely utilized and verified in analyzing medical images.
arXiv Detail & Related papers (2022-03-24T02:09:41Z)
- Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) only using the fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z)
- Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature [67.4680600632232]
Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck.
We propose a general approach for vertical search based on domain-specific pretraining.
Our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search.
arXiv Detail & Related papers (2021-06-25T01:02:55Z)
- Claim Detection in Biomedical Twitter Posts [11.335643770130238]
False information on biomedical topics can be particularly dangerous.
We aim to fill this research gap and annotate a corpus of 1200 tweets for implicit and explicit biomedical claims.
We develop baseline models which detect tweets that contain a claim automatically.
arXiv Detail & Related papers (2021-04-23T14:45:31Z)
- Misinformation Has High Perplexity [55.47422012881148]
We propose to leverage the perplexity to debunk false claims in an unsupervised manner.
First, we extract reliable evidence from scientific and news sources according to sentence similarity to the claims.
Second, we prime a language model with the extracted evidence and finally evaluate the correctness of given claims based on the perplexity scores at debunking time.
arXiv Detail & Related papers (2020-06-08T15:13:44Z)
- An Exploratory Study of COVID-19 Misinformation on Twitter [5.070542698701158]
During the COVID-19 pandemic, social media has become a home ground for misinformation.
We have conducted an exploratory study into the propagation, authors and content of misinformation on Twitter around the topic of COVID-19.
arXiv Detail & Related papers (2020-05-12T12:07:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.