MM-COVID: A Multilingual and Multimodal Data Repository for Combating
COVID-19 Disinformation
- URL: http://arxiv.org/abs/2011.04088v2
- Date: Mon, 23 Nov 2020 06:04:23 GMT
- Title: MM-COVID: A Multilingual and Multimodal Data Repository for Combating
COVID-19 Disinformation
- Authors: Yichuan Li, Bohan Jiang, Kai Shu, Huan Liu
- Abstract summary: This dataset provides multilingual fake news and the relevant social context.
We collect 3981 pieces of fake news content and 7192 pieces of trustworthy information in six languages: English, Spanish, Portuguese, Hindi, French and Italian.
- Score: 37.52398946169075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The COVID-19 epidemic is considered a global health crisis for the whole
society and the greatest challenge mankind has faced since World War Two.
Unfortunately, fake news about COVID-19 is spreading as fast as the virus
itself. Incorrect health measures, anxiety, and hate speech will have
bad consequences for people's physical and mental health around
the world. To help better combat COVID-19 fake news, we propose a new
fake news detection dataset, MM-COVID (Multilingual and Multidimensional COVID-19
Fake News Data Repository). This dataset provides multilingual fake news
and the relevant social context. We collect 3981 pieces of fake news content
and 7192 pieces of trustworthy information in six languages: English, Spanish,
Portuguese, Hindi, French and Italian. We present a detailed and
exploratory analysis of MM-COVID from different perspectives and demonstrate
the utility of MM-COVID in several potential applications for multilingual
and social media studies of COVID-19 fake news.
Related papers
- "COVID-19 was a FIFA conspiracy #curropt": An Investigation into the
Viral Spread of COVID-19 Misinformation [60.268682953952506]
We estimate the extent to which misinformation has influenced the course of the COVID-19 pandemic using natural language processing models.
We provide a strategy to combat social media posts that are likely to cause widespread harm.
arXiv Detail & Related papers (2022-06-12T19:41:01Z)
- Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only the fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z)
- COVID-19 and Big Data: Multi-faceted Analysis for Spatio-temporal
Understanding of the Pandemic with Social Media Conversations [4.07452542897703]
Social media platforms have served as a vehicle for the global conversation about COVID-19.
We present a framework for analysis, mining, and tracking the critical content and characteristics of social media conversations around the pandemic.
arXiv Detail & Related papers (2021-04-22T00:45:50Z)
- Transformer based Automatic COVID-19 Fake News Detection System [9.23545668304066]
Misinformation is especially prevalent in the ongoing coronavirus disease (COVID-19) pandemic.
We report a methodology to analyze the reliability of information shared on social media pertaining to the COVID-19 pandemic.
Our system obtained an F1-score of 0.9855 on the test set and ranked 5th among 160 teams.
arXiv Detail & Related papers (2021-01-01T06:49:27Z)
- A Multilingual Neural Machine Translation Model for Biomedical Data [84.17747489525794]
We release a multilingual neural machine translation model, which can be used to translate text in the biomedical domain.
The model can translate from 5 languages (French, German, Italian, Korean and Spanish) into English.
It is trained with large amounts of generic and biomedical data, using domain tags.
arXiv Detail & Related papers (2020-08-06T21:26:43Z)
- A System for Worldwide COVID-19 Information Aggregation [92.60866520230803]
We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics.
A neural machine translation module translates articles in other languages into Japanese and English.
A BERT-based topic classifier trained on our article-topic pair dataset helps users efficiently find the information they are interested in.
arXiv Detail & Related papers (2020-07-28T01:33:54Z)
- TICO-19: the Translation Initiative for Covid-19 [112.5601530395345]
The Translation Initiative for COvid-19 (TICO-19) has made test and development data available to AI and MT researchers in 35 different languages.
The same data is translated into all of the languages represented, meaning that testing or development can be done for any pairing of languages in the set.
arXiv Detail & Related papers (2020-07-03T16:26:17Z)
- CoAID: COVID-19 Healthcare Misinformation Dataset [12.768221316730674]
CoAID includes 4,251 news items, 296,000 related user engagements, 926 social platform posts about COVID-19, and ground truth labels.
arXiv Detail & Related papers (2020-05-22T19:08:14Z)
- Fighting the COVID-19 Infodemic: Modeling the Perspective of
Journalists, Fact-Checkers, Social Media Platforms, Policy Makers, and the
Society [37.9389191670008]
Fighting the COVID-19 infodemic has been declared one of the most important focus areas of the World Health Organization.
We release a large dataset of 16K manually annotated tweets for fine-grained disinformation analysis.
arXiv Detail & Related papers (2020-04-30T18:04:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.