A System for Worldwide COVID-19 Information Aggregation
- URL: http://arxiv.org/abs/2008.01523v2
- Date: Sun, 11 Oct 2020 05:36:36 GMT
- Title: A System for Worldwide COVID-19 Information Aggregation
- Authors: Akiko Aizawa, Frederic Bergeron, Junjie Chen, Fei Cheng, Katsuhiko
Hayashi, Kentaro Inui, Hiroyoshi Ito, Daisuke Kawahara, Masaru Kitsuregawa,
Hirokazu Kiyomaru, Masaki Kobayashi, Takashi Kodama, Sadao Kurohashi,
Qianying Liu, Masaki Matsubara, Yusuke Miyao, Atsuyuki Morishima, Yugo
Murawaki, Kazumasa Omura, Haiyue Song, Eiichiro Sumita, Shinji Suzuki, Ribeka
Tanaka, Yu Tanaka, Masashi Toyoda, Nobuhiro Ueda, Honai Ueoka, Masao Utiyama,
Ying Zhong
- Abstract summary: We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics.
A neural machine translation module translates articles in other languages into Japanese and English.
A BERT-based topic classifier trained on our article-topic pair dataset helps users efficiently find the information they are interested in.
- Score: 92.60866520230803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The global pandemic of COVID-19 has made the public pay close attention to
related news, covering various domains, such as sanitation, treatment, and
effects on education. Meanwhile, the COVID-19 situation differs greatly among
countries (e.g., in policies and the development of the epidemic), and thus
citizens would be interested in news from foreign countries. We build a system
for worldwide COVID-19 information aggregation containing reliable articles
from 10 regions in 7 languages sorted by topics. Our reliable COVID-19 related
website dataset collected through crowdsourcing ensures the quality of the
articles. A neural machine translation module translates articles in other
languages into Japanese and English. A BERT-based topic classifier trained on
our article-topic pair dataset helps users efficiently find the information
they are interested in by sorting articles into different categories.
Related papers
- Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages [62.730361829175415]
MIRACL is a multilingual dataset we have built for the WSDM 2023 Cup challenge.
It focuses on ad hoc retrieval across 18 different languages.
Our goal is to spur research that will improve retrieval across a continuum of languages.
arXiv Detail & Related papers (2022-10-18T16:47:18Z)
- Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z)
- Facebook AI WMT21 News Translation Task Submission [23.69817809546458]
We describe Facebook's multilingual model submission to the WMT2021 shared task on news translation.
We participate in 14 language directions: English to and from Czech, German, Hausa, Icelandic, Japanese, Russian, and Chinese.
We utilize data from all available sources to create high quality bilingual and multilingual baselines.
arXiv Detail & Related papers (2021-08-06T18:26:38Z)
- Topic Modeling and Progression of American Digital News Media During the Onset of the COVID-19 Pandemic [2.798697306330988]
Currently, the world is in the midst of a severe global pandemic, which has affected all aspects of people's lives.
There is a deluge of COVID-related digital media articles published in the United States, due to the disparate effects of the pandemic.
We develop a Natural Language Processing pipeline that is capable of automatically distilling various digital articles into manageable pieces of information.
arXiv Detail & Related papers (2021-05-25T14:27:47Z)
- NewsEdits: A Dataset of Revision Histories for News Articles (Technical Report: Data Processing) [89.77347919191774]
NewsEdits is the first publicly available dataset of news article revision histories.
It contains 1,278,804 articles with 4,609,430 versions from over 22 English- and French-language newspaper sources.
arXiv Detail & Related papers (2021-04-19T21:15:30Z)
- MM-COVID: A Multilingual and Multimodal Data Repository for Combating COVID-19 Disinformation [37.52398946169075]
This dataset provides multilingual fake news and the relevant social context.
We collect 3981 pieces of fake news content and 7192 pieces of trustworthy information in six languages: English, Spanish, Portuguese, Hindi, French, and Italian.
arXiv Detail & Related papers (2020-11-08T21:42:03Z)
- A Multilingual Neural Machine Translation Model for Biomedical Data [84.17747489525794]
We release a multilingual neural machine translation model, which can be used to translate text in the biomedical domain.
The model can translate from 5 languages (French, German, Italian, Korean and Spanish) into English.
It is trained with large amounts of generic and biomedical data, using domain tags.
arXiv Detail & Related papers (2020-08-06T21:26:43Z)
- TICO-19: the Translation Initiative for Covid-19 [112.5601530395345]
The Translation Initiative for COvid-19 (TICO-19) has made test and development data available to AI and MT researchers in 35 different languages.
The same data is translated into all of the languages represented, meaning that testing or development can be done for any pairing of languages in the set.
arXiv Detail & Related papers (2020-07-03T16:26:17Z)
- Classification Aware Neural Topic Model and its Application on a New COVID-19 Disinformation Corpus [2.492887522265771]
The explosion of disinformation following the COVID-19 pandemic has overloaded fact-checkers and media worldwide.
To help tackle this, we developed computational methods to categorise COVID-19 disinformation.
arXiv Detail & Related papers (2020-06-05T10:32:18Z)