TICO-19: the Translation Initiative for Covid-19
- URL: http://arxiv.org/abs/2007.01788v2
- Date: Mon, 6 Jul 2020 14:13:51 GMT
- Title: TICO-19: the Translation Initiative for Covid-19
- Authors: Antonios Anastasopoulos, Alessandro Cattelan, Zi-Yi Dou, Marcello
Federico, Christian Federman, Dmitriy Genzel, Francisco Guzmán, Junjie Hu,
Macduff Hughes, Philipp Koehn, Rosie Lazar, Will Lewis, Graham Neubig,
Mengmeng Niu, Alp Öktem, Eric Paquin, Grace Tang, and Sylwia Tur
- Abstract summary: The Translation Initiative for COvid-19 (TICO-19) has made test and development data available to AI and MT researchers in 35 different languages.
The same data is translated into all of the languages represented, meaning that testing or development can be done for any pairing of languages in the set.
- Score: 112.5601530395345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The COVID-19 pandemic is the worst pandemic to strike the world in over a
century. Crucial to stemming the tide of the SARS-CoV-2 virus is communicating
to vulnerable populations the means by which they can protect themselves. To
this end, the collaborators forming the Translation Initiative for COvid-19
(TICO-19) have made test and development data available to AI and MT
researchers in 35 different languages in order to foster the development of
tools and resources for improving access to information about COVID-19 in these
languages. In addition to 9 high-resourced, "pivot" languages, the team is
targeting 26 lesser resourced languages, in particular languages of Africa,
South Asia and South-East Asia, whose populations may be the most vulnerable to
the spread of the virus. The same data is translated into all of the languages
represented, meaning that testing or development can be done for any pairing of
languages in the set. Further, the team is converting the test and development
data into translation memories (TMXs) that can be used by localizers from and
to any of the languages.
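The "any pairing" property described in the abstract follows from the data being multi-parallel: every language holds the same sentences in the same order. A minimal sketch of that idea, assuming a hypothetical in-memory layout (the `corpus` dictionary, the language codes, and the example sentences are illustrative placeholders, not TICO-19 data or its actual file format):

```python
from itertools import permutations

# Hypothetical multi-parallel data: the same sentences, index-aligned
# across languages. Placeholder content, not taken from TICO-19.
corpus = {
    "en": ["Wash your hands.", "Stay at home."],
    "fr": ["Lavez-vous les mains.", "Restez chez vous."],
    "es": ["Lávese las manos.", "Quédese en casa."],
}


def directed_pairs(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(sorted(languages), 2))


def bitext(corpus, src, tgt):
    """Zip two index-aligned sentence lists into (source, target) tuples."""
    if len(corpus[src]) != len(corpus[tgt]):
        raise ValueError("sentence lists are not aligned")
    return list(zip(corpus[src], corpus[tgt]))


pairs = directed_pairs(corpus)
print(len(pairs))                      # 3 languages give 3 * 2 directed pairs
print(bitext(corpus, "es", "fr")[0])   # a pair that does not involve English
```

Under the same assumption, a set of 35 languages gives 35 × 34 = 1190 directed pairs, each of which can serve as a test or development set without any additional translation work.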
Related papers
- SPEED++: A Multilingual Event Extraction Framework for Epidemic Prediction and Preparedness [73.73883111570458]
We introduce the first multilingual Event Extraction framework for extracting epidemic event information for a wide range of diseases and languages.
Annotating data in every language is infeasible; thus we develop zero-shot cross-lingual cross-disease models.
Our framework can provide epidemic warnings for COVID-19 in its earliest stages in Dec 2019 from Chinese Weibo posts without any training in Chinese.
arXiv Detail & Related papers (2024-10-24T03:03:54Z)
- DN at SemEval-2023 Task 12: Low-Resource Language Text Classification via Multilingual Pretrained Language Model Fine-tuning [0.0]
Most existing models and datasets for sentiment analysis are developed for high-resource languages, such as English and Chinese.
The AfriSenti-SemEval 2023 Shared Task 12 aims to fill this gap by evaluating sentiment analysis models on low-resource African languages.
We present our solution to the shared task, in which we employed different multilingual XLM-R models with a classification head trained on various data.
arXiv Detail & Related papers (2023-05-04T07:28:45Z)
- MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition [55.95128479289923]
African languages are spoken by over a billion people, but are underrepresented in NLP research and development.
We create the largest human-annotated NER dataset for 20 African languages.
We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points.
arXiv Detail & Related papers (2022-10-22T08:53:14Z)
- Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages [62.730361829175415]
MIRACL is a multilingual dataset we have built for the WSDM 2023 Cup challenge.
It focuses on ad hoc retrieval across 18 different languages.
Our goal is to spur research that will improve retrieval across a continuum of languages.
arXiv Detail & Related papers (2022-10-18T16:47:18Z)
- COVID-19 Named Entity Recognition for Vietnamese [6.17059264011429]
We present the first manually-annotated COVID-19 domain-specific dataset for Vietnamese.
Our dataset is annotated for the named entity recognition task with newly-defined entity types.
Our dataset also contains the largest number of entities compared to existing Vietnamese NER datasets.
arXiv Detail & Related papers (2021-04-08T16:35:34Z)
- A System for Worldwide COVID-19 Information Aggregation [92.60866520230803]
We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics.
A neural machine translation module translates articles in other languages into Japanese and English.
A BERT-based topic classifier trained on our article-topic pair dataset helps users find the information they are interested in efficiently.
arXiv Detail & Related papers (2020-07-28T01:33:54Z)
- Cross-lingual Transfer Learning for COVID-19 Outbreak Alignment [90.12602012910465]
We train on Italy's early COVID-19 outbreak through Twitter and transfer to several other countries.
Our experiments show strong results with up to 0.85 Spearman correlation in cross-country predictions.
arXiv Detail & Related papers (2020-06-05T02:04:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.