TyDi QA: A Benchmark for Information-Seeking Question Answering in
Typologically Diverse Languages
- URL: http://arxiv.org/abs/2003.05002v1
- Date: Tue, 10 Mar 2020 21:11:53 GMT
- Title: TyDi QA: A Benchmark for Information-Seeking Question Answering in
Typologically Diverse Languages
- Authors: Jonathan H. Clark, Eunsol Choi, Michael Collins, Dan Garrette, Tom
Kwiatkowski, Vitaly Nikolaev, and Jennimaria Palomaki
- Abstract summary: TyDi QA is a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs.
We present a quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena.
- Score: 27.588857710802113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Confidently making progress on multilingual modeling requires challenging,
trustworthy evaluations. We present TyDi QA, a question answering dataset
covering 11 typologically diverse languages with 204K question-answer pairs.
The languages of TyDi QA are diverse with regard to their typology (the set of
linguistic features each language expresses), such that we expect models
performing well on this set to generalize across a large number of the world's
languages. We present a quantitative analysis of the data quality and
example-level qualitative linguistic analyses of observed language phenomena
that would not be found in English-only corpora. To provide a realistic
information-seeking task and avoid priming effects, questions are written by
people who want to know the answer but do not yet know it, and the data
is collected directly in each language without the use of translation.
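For readers who want to inspect the data, here is a minimal loading sketch, assuming the Hugging Face "datasets" mirror of TyDi QA (the "tydiqa" hub name and its "secondary_task" Gold Passage config are conventions of that mirror, not of the paper's own release):

```python
# Minimal sketch: inspect TyDi QA's Gold Passage ("secondary") task.
# Assumes the dataset is mirrored on the Hugging Face hub as "tydiqa";
# the paper's own release ships plain JSON Lines files instead.
from datasets import load_dataset

# "secondary_task" is the extractive Gold Passage formulation; the
# "primary_task" config carries passage-selection and minimal-answer labels.
tydiqa = load_dataset("tydiqa", "secondary_task")

example = tydiqa["train"][0]
print(example["question"])         # written by a real information seeker
print(example["answers"]["text"])  # gold answer span(s) in the passage
```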
Related papers
- CaLMQA: Exploring culturally specific long-form question answering across 23 languages [58.18984409715615]
CaLMQA is a collection of 1.5K culturally specific questions spanning 23 languages and 51 culturally agnostic questions translated from English into 22 other languages.
We collect naturally-occurring questions from community web forums and hire native speakers to write questions to cover under-studied languages such as Fijian and Kirundi.
Our dataset contains diverse, complex questions that reflect cultural topics (e.g. traditions, laws, news) and the language usage of native speakers.
arXiv Detail & Related papers (2024-06-25T17:45:26Z)
- CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark [68.21939124278065]
CVQA is a culturally diverse multilingual Visual Question Answering benchmark designed to cover a rich set of languages and cultures.
CVQA includes culturally-driven images and questions from across 30 countries on four continents, covering 31 languages with 13 scripts, providing a total of 10k questions.
We benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models.
arXiv Detail & Related papers (2024-06-10T01:59:00Z)
- Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), originally designed for multiple-choice question answering (MCQA), for extractive question answering (EQA).
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
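As a concrete illustration of why this repurposing needs annotation rather than string matching, a naive sketch of recasting one MCQA item as EQA (the field names follow Belebele's published schema and should be treated as assumptions here):

```python
# Naive illustration of recasting a Belebele MCQA item as extractive QA.
# Field names (flores_passage, question, mc_answer1..4, correct_answer_num)
# follow Belebele's published schema; treat them as assumptions here.
def mcqa_to_eqa(item: dict) -> dict | None:
    passage = item["flores_passage"]
    gold = item[f"mc_answer{item['correct_answer_num']}"]
    start = passage.find(gold)
    if start == -1:
        # The correct choice is rarely a verbatim span of the passage,
        # which is why human annotation, not string matching, is needed.
        return None
    return {"context": passage,
            "question": item["question"],
            "answers": {"text": [gold], "answer_start": [start]}}
```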
arXiv Detail & Related papers (2024-04-26T11:46:05Z)
- The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants [80.4837840962273]
We present Belebele, a dataset spanning 122 language variants.
This dataset enables the evaluation of text models in high-, medium-, and low-resource languages.
arXiv Detail & Related papers (2023-08-31T17:43:08Z)
- Language Embeddings Sometimes Contain Typological Generalizations [0.0]
We train neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1295 languages.
The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features.
We conclude that some generalizations are surprisingly close to traditional features from linguistic typology, but that most models, as well as those of previous work, do not appear to have made linguistically meaningful generalizations.
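As an illustration of that comparison step, a sketch with synthetic stand-in data (the paper's actual embeddings, predictors, and typological databases are not reproduced here) of testing whether a typological feature is recoverable from language vectors:

```python
# Sketch: is a binary typological feature (e.g., a WALS-style property)
# predictable from learned language vectors? Synthetic stand-in data only;
# the paper trains on Bible translations in 1295 languages and compares
# against real typological databases.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
lang_vectors = rng.normal(size=(1295, 100))  # stand-in language embeddings
feature = rng.integers(0, 2, size=1295)      # stand-in typological feature

# Cross-validated accuracy far above chance would suggest the embeddings
# encode the feature; chance level suggests no meaningful generalization.
clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, lang_vectors, feature, cv=5).mean())
```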
arXiv Detail & Related papers (2023-01-19T15:09:59Z)
- TyDiP: A Dataset for Politeness Classification in Nine Typologically Diverse Languages [33.540256516320326]
We study politeness phenomena in nine typologically diverse languages.
We create TyDiP, a dataset containing three-way politeness annotations for 500 examples in each language.
arXiv Detail & Related papers (2022-11-29T18:58:15Z)
- Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation [0.04199844472131922]
We present and apply a GUI-assisted framework that allows us to easily probe a massive number of languages.
Most of the regularities revealed in the mBERT model are typical of Western European languages.
Our framework can be integrated with the existing probing toolboxes, model cards, and leaderboards.
arXiv Detail & Related papers (2022-10-24T13:41:17Z)
- Investigating Information Inconsistency in Multilingual Open-Domain Question Answering [18.23417521199809]
We analyze the behavior of multilingual open-domain question answering models with a focus on retrieval bias.
We speculate that the content differences in documents across languages might reflect cultural divergences and/or social biases.
arXiv Detail & Related papers (2022-05-25T02:58:54Z)
- Delving Deeper into Cross-lingual Visual Question Answering [115.16614806717341]
We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance.
We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers.
arXiv Detail & Related papers (2022-02-15T18:22:18Z)
- MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering [6.452012363895865]
MKQA supplies the widest range of languages to date for evaluating question answering.
We benchmark a variety of state-of-the-art methods and baselines for generative and extractive question answering.
Results indicate this dataset is challenging even in English, but especially in low-resource languages.
arXiv Detail & Related papers (2020-07-30T03:33:46Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
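For context, the operationalization rests on a standard identity; here is a sketch in the paper's spirit, with notation assumed rather than quoted:

```latex
% Probing as mutual-information estimation (sketch; notation assumed).
% T = linguistic property (e.g., a POS tag), R = contextual representation.
\begin{align}
  \mathrm{I}(T; R) &= \mathrm{H}(T) - \mathrm{H}(T \mid R) \\
  \mathrm{H}(T \mid R) &\leq \mathrm{H}_{q}(T \mid R)
                        = -\,\mathbb{E}\bigl[\log q(T \mid R)\bigr]
\end{align}
% A probe q's cross-entropy upper-bounds H(T|R), so every probe yields a
% lower bound on I(T;R); better probes only tighten the estimate.
```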
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all of its information) and is not responsible for any consequences of its use.