CaLMQA: Exploring culturally specific long-form question answering across 23 languages
- URL: http://arxiv.org/abs/2406.17761v2
- Date: Wed, 3 Jul 2024 16:33:55 GMT
- Title: CaLMQA: Exploring culturally specific long-form question answering across 23 languages
- Authors: Shane Arora, Marzena Karpinska, Hung-Ting Chen, Ipsita Bhattacharjee, Mohit Iyyer, Eunsol Choi
- Abstract summary: CaLMQA is a collection of 1.5K culturally specific questions spanning 23 languages and 51 culturally agnostic questions translated from English into 22 other languages.
We collect naturally-occurring questions from community web forums and hire native speakers to write questions to cover under-studied languages such as Fijian and Kirundi.
Our dataset contains diverse, complex questions that reflect cultural topics (e.g. traditions, laws, news) and the language usage of native speakers.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are used for long-form question answering (LFQA), which requires them to generate paragraph-length answers to complex questions. While LFQA has been well-studied in English, this research has not been extended to other languages. To bridge this gap, we introduce CaLMQA, a collection of 1.5K complex culturally specific questions spanning 23 languages and 51 culturally agnostic questions translated from English into 22 other languages. We define culturally specific questions as those uniquely or more likely to be asked by people from cultures associated with the question's language. We collect naturally-occurring questions from community web forums and hire native speakers to write questions to cover under-resourced, rarely-studied languages such as Fijian and Kirundi. Our dataset contains diverse, complex questions that reflect cultural topics (e.g. traditions, laws, news) and the language usage of native speakers. We automatically evaluate a suite of open- and closed-source models on CaLMQA by detecting incorrect language and token repetitions in answers, and observe that the quality of LLM-generated answers degrades significantly for some low-resource languages. Lastly, we perform human evaluation on a subset of models and languages. Manual evaluation reveals that model performance is significantly worse for culturally specific questions than for culturally agnostic questions. Our findings highlight the need for further research in non-English LFQA and provide an evaluation framework.
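The automatic evaluation described above flags two failure modes in generated answers: responses in the wrong language and degenerate token repetition. The paper does not specify its exact tooling, so the sketch below is illustrative only: it uses a crude Unicode-script heuristic in place of a real language-identification model, and a repeated word-n-gram ratio as a repetition signal. Both function names and thresholds are assumptions, not CaLMQA's implementation.

```python
import unicodedata
from collections import Counter

def repetition_score(text: str, n: int = 5) -> float:
    """Fraction of word n-grams that are duplicates.
    High values suggest degenerate, repetitive generation."""
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values() if c > 1)
    return repeated / len(ngrams)

def dominant_script(text: str) -> str:
    """Crude proxy for language identification: return the most
    frequent Unicode script among alphabetic characters. A real
    pipeline would use a trained language-ID model instead."""
    scripts = Counter()
    for ch in text:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "LATIN SMALL LETTER A"
            name = unicodedata.name(ch, "")
            if name:
                scripts[name.split()[0]] += 1
    return scripts.most_common(1)[0][0] if scripts else "UNKNOWN"
```

For example, an answer that loops on the same phrase yields a high `repetition_score`, and a Russian question answered in Latin script would be caught by comparing `dominant_script` of question and answer. The 5-gram window and script heuristic are arbitrary choices for illustration.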
Related papers
- OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context
We present OMoS-QA, a dataset of German and English questions paired with trustworthy documents and manually annotated answers.
Questions are automatically generated with an open-source large language model (LLM) and answer sentences are selected by crowd workers.
We find high precision and low-to-mid recall in selecting answer sentences, which is a favorable trade-off to avoid misleading users.
arXiv Detail & Related papers (2024-07-22T15:40:17Z) - CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
CVQA is a culturally diverse multilingual Visual Question Answering benchmark designed to cover a rich set of languages and cultures.
CVQA includes culturally-driven images and questions from across 30 countries on four continents, covering 31 languages with 13 scripts, providing a total of 10k questions.
We benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models.
arXiv Detail & Related papers (2024-06-10T01:59:00Z) - Can a Multichoice Dataset be Repurposed for Extractive Question Answering?
We repurpose the Belebele dataset (Bandarkar et al., 2023), originally designed for multiple-choice question answering (MCQA), for extractive question answering (EQA).
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z) - MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
MLaKE is a benchmark that assesses the adaptability of knowledge editing methods across five languages.
MLaKE aggregates fact chains from Wikipedia across languages and generates questions in both free-form and multiple-choice.
We evaluate the multilingual knowledge editing generalization capabilities of existing methods on MLaKE.
arXiv Detail & Related papers (2024-04-07T15:23:28Z) - ELQA: A Corpus of Metalinguistic Questions and Answers about English
Collected from two online forums, the >70k questions cover wide-ranging topics including grammar, meaning, fluency, and etymology.
Unlike most NLP datasets, this corpus is metalinguistic -- it consists of language about language.
arXiv Detail & Related papers (2022-05-01T04:29:50Z) - Cross-Lingual GenQA: A Language-Agnostic Generative Question Answering Approach for Open-Domain Question Answering
Open-Retrieval Generative Question Answering (GenQA) has been shown to deliver high-quality, natural-sounding answers in English.
We present the first generalization of the GenQA approach for the multilingual environment.
arXiv Detail & Related papers (2021-10-14T04:36:29Z) - XOR QA: Cross-lingual Open-Retrieval Question Answering
This work extends open-retrieval question answering to a cross-lingual setting.
We construct a large-scale dataset built on questions lacking same-language answers.
arXiv Detail & Related papers (2020-10-22T16:47:17Z) - TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
TyDi QA is a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs.
We present a quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena.
arXiv Detail & Related papers (2020-03-10T21:11:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.