SPEED++: A Multilingual Event Extraction Framework for Epidemic Prediction and Preparedness
- URL: http://arxiv.org/abs/2410.18393v1
- Date: Thu, 24 Oct 2024 03:03:54 GMT
- Title: SPEED++: A Multilingual Event Extraction Framework for Epidemic Prediction and Preparedness
- Authors: Tanmay Parekh, Jeffrey Kwan, Jiarui Yu, Sparsh Johri, Hyosang Ahn, Sreya Muppalla, Kai-Wei Chang, Wei Wang, Nanyun Peng
- Abstract summary: We introduce the first multilingual Event Extraction framework for extracting epidemic event information for a wide range of diseases and languages.
Annotating data in every language is infeasible; thus we develop zero-shot cross-lingual cross-disease models.
Our framework can provide epidemic warnings for COVID-19 in its earliest stages in Dec 2019 from Chinese Weibo posts without any training in Chinese.
- Score: 73.73883111570458
- License:
- Abstract: Social media is often the first place where communities discuss the latest societal trends. Prior works have utilized this platform to extract epidemic-related information (e.g. infections, preventive measures) to provide early warnings for epidemic prediction. However, these works only focused on English posts, while epidemics can occur anywhere in the world, and early discussions are often in the local, non-English languages. In this work, we introduce the first multilingual Event Extraction (EE) framework SPEED++ for extracting epidemic event information for a wide range of diseases and languages. To this end, we extend a previous epidemic ontology with 20 argument roles; and curate our multilingual EE dataset SPEED++ comprising 5.1K tweets in four languages for four diseases. Annotating data in every language is infeasible; thus we develop zero-shot cross-lingual cross-disease models (i.e., training only on English COVID data) utilizing multilingual pre-training and show their efficacy in extracting epidemic-related events for 65 diverse languages across different diseases. Experiments demonstrate that our framework can provide epidemic warnings for COVID-19 in its earliest stages in Dec 2019 (3 weeks before global discussions) from Chinese Weibo posts without any training in Chinese. Furthermore, we exploit our framework's argument extraction capabilities to aggregate community epidemic discussions like symptoms and cure measures, aiding misinformation detection and public attention monitoring. Overall, we lay a strong foundation for multilingual epidemic preparedness.
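The abstract says the framework raises early epidemic warnings from extracted events, but it does not give the alerting rule. The related SPEED entry in this list mentions reporting sharp increases in extracted events. The Python sketch below shows one simple way such a rule could work. It is an illustration only and not the authors' method. It assumes a hypothetical list of `(date, event_type)` pairs emitted by some event extractor. The function names `weekly_counts` and `flag_spikes`, the `"infect"` label, and the thresholds `window`, `factor`, and `min_count` are all illustrative choices.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_counts(events):
    """Count extracted events per week, keyed by the Monday that starts each week."""
    counts = Counter()
    for day, _event_type in events:
        monday = day - timedelta(days=day.weekday())
        counts[monday] += 1
    return counts


def flag_spikes(counts, window=3, factor=2.0, min_count=5):
    """Return weeks whose count is at least `factor` times the mean of the
    previous `window` weeks (weeks with no events count as zero)."""
    if not counts:
        return []
    # Build a gap-free list of weeks so quiet weeks still enter the baseline.
    weeks = []
    week, last = min(counts), max(counts)
    while week <= last:
        weeks.append(week)
        week += timedelta(weeks=1)
    alerts = []
    for i in range(window, len(weeks)):
        # Counter returns 0 for weeks that have no recorded events.
        baseline = sum(counts[w] for w in weeks[i - window:i]) / window
        current = counts[weeks[i]]
        # min_count suppresses alerts on tiny absolute numbers;
        # max(baseline, 1.0) avoids flagging every week after a zero baseline.
        if current >= min_count and current >= factor * max(baseline, 1.0):
            alerts.append(weeks[i])
    return alerts


# Hypothetical data: one "infect" event per week in November 2019,
# then a burst of eight events in the week starting 2 December 2019.
events = [(date(2019, 11, 4) + timedelta(weeks=k), "infect") for k in range(4)]
events += [(date(2019, 12, 3), "infect")] * 8
print(flag_spikes(weekly_counts(events)))  # [datetime.date(2019, 12, 2)]
```

A trailing-mean baseline is about the simplest spike detector available. A real deployment would likely need thresholds tuned per language and platform, or a statistical test in place of a fixed ratio.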
Related papers
- Multilingual Fine-Grained News Headline Hallucination Detection [40.62136051552646]
We introduce the first multilingual, fine-grained news headline hallucination detection dataset.
This dataset contains over 11 thousand pairs in 5 languages, each annotated with detailed hallucination types by experts.
We propose two novel techniques, language-dependent demonstration selection and coarse-to-fine prompting, to boost the few-shot hallucination detection performance.
Date: 2024-07-22T18:37:53Z
- Event Detection from Social Media for Epidemic Prediction [76.90779562626541]
We develop a framework to extract and analyze epidemic-related events from social media posts.
Experiments show that event detection (ED) models trained on the COVID-based SPEED dataset can effectively detect epidemic events for three unseen epidemics.
We show that reporting sharp increases in the extracted events by our framework can provide warnings 4-9 weeks earlier than the WHO epidemic declaration for Monkeypox.
Date: 2024-04-02T06:31:17Z
- Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias [38.26934474189853]
Med-UniC (Unifying Cross-Lingual Medical Vision-Language Pre-Training) is designed to integrate multimodal medical data from English and Spanish.
Med-UniC reaches superior performance across 5 medical image tasks and 10 datasets encompassing over 30 diseases.
Date: 2023-05-31T14:28:19Z
- COLD: A Benchmark for Chinese Offensive Language Detection [54.60909500459201]
We use COLDataset, a Chinese offensive language dataset with 37k annotated sentences.
We also propose COLDetector to study the output offensiveness of popular Chinese language models.
Our resources and analyses are intended to help detoxify the Chinese online communities and evaluate the safety performance of generative language models.
Date: 2022-01-16T11:47:23Z
- Sentiment and Emotion Classification of Epidemic Related Bilingual data from Social Media [1.7109522466982476]
The study exploits bilingual (Urdu and English) data from Twitter and news websites related to the dengue epidemic in Pakistan.
Date: 2021-05-04T12:51:18Z
- TICO-19: the Translation Initiative for Covid-19 [112.5601530395345]
The Translation Initiative for COvid-19 (TICO-19) has made test and development data available to AI and MT researchers in 35 different languages.
The same data is translated into all of the languages represented, meaning that testing or development can be done for any pairing of languages in the set.
Date: 2020-07-03T16:26:17Z
- Cross-lingual Transfer Learning for COVID-19 Outbreak Alignment [90.12602012910465]
We train on Italy's early COVID-19 outbreak through Twitter and transfer to several other countries.
Our experiments show strong results with up to 0.85 Spearman correlation in cross-country predictions.
Date: 2020-06-05T02:04:25Z
- Tracking, exploring and analyzing recent developments in German-language online press in the face of the coronavirus crisis: cOWIDplus Analysis and cOWIDplus Viewer [62.997667081978825]
The coronavirus pandemic may be the largest crisis the world has had to face since World War II.
Unsurprisingly, it is also affecting language, our primary communication tool.
We present three inter-connected resources that are designed to capture and illustrate these effects on a subset of the German language.
Date: 2020-05-27T12:21:36Z