BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles
- URL: http://arxiv.org/abs/2109.11087v1
- Date: Thu, 23 Sep 2021 00:46:47 GMT
- Title: BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles
- Authors: Yunxiang Zhang, Xiaojun Wan
- Abstract summary: We introduce BiRdQA, a bilingual multiple-choice question answering dataset with 6614 English riddles and 8751 Chinese riddles.
Existing monolingual and multilingual QA models fail to perform well on our dataset, indicating that there is a long way to go before machines can beat humans at solving tricky riddles.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A riddle is a question or statement with a double or veiled meaning,
followed by an unexpected answer. Solving riddles is a challenging task for both
machines and humans, testing the capability to understand figurative, creative
natural language and to reason with commonsense knowledge. We introduce BiRdQA, a
bilingual multiple-choice question answering dataset with 6614 English riddles
and 8751 Chinese riddles. For each riddle-answer pair, we provide four
distractors with additional information from Wikipedia. The distractors are
automatically generated at scale with minimal bias. Existing monolingual and
multilingual QA models fail to perform well on our dataset, indicating that
there is a long way to go before machines can beat humans at solving tricky
riddles. The dataset has been released to the community.
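The abstract describes a five-way multiple-choice format (one gold answer plus four distractors per riddle). The sketch below illustrates that format and a simple accuracy metric; the field names and examples are illustrative assumptions, not the released BiRdQA schema.

```python
# Hypothetical sketch of a BiRdQA-style multiple-choice item: each riddle
# pairs one gold answer with four automatically generated distractors.
# Field names are assumptions for illustration, not the official schema.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RiddleItem:
    riddle: str          # the riddle text
    options: List[str]   # 5 candidate answers (1 gold + 4 distractors)
    label: int           # index of the gold answer in `options`

def accuracy(items: List[RiddleItem],
             predict: Callable[[RiddleItem], int]) -> float:
    """Fraction of items where the predicted option index matches the gold label."""
    correct = sum(predict(item) == item.label for item in items)
    return correct / len(items)

# Toy example using a classic English riddle (not taken from the dataset).
item = RiddleItem(
    riddle="What has keys but can't open locks?",
    options=["map", "piano", "clock", "river", "candle"],
    label=1,
)

always_first = lambda it: 0    # trivial baseline: always pick option 0
oracle = lambda it: it.label   # upper bound: always pick the gold answer

print(accuracy([item], always_first))  # 0.0
print(accuracy([item], oracle))        # 1.0
```

With five balanced options, a random-guess baseline sits at 20% accuracy, which is the reference point against which the paper's QA-model results can be read.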
Related papers
- Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA).
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z) - WikiWhy: Answering and Explaining Cause-and-Effect Questions [62.60993594814305]
We introduce WikiWhy, a QA dataset built around explaining why an answer is true in natural language.
WikiWhy contains over 9,000 "why" question-answer-rationale triples, grounded on Wikipedia facts across a diverse set of topics.
GPT-3 baselines achieve only 38.7% human-evaluated correctness in the end-to-end answer & explain condition.
arXiv Detail & Related papers (2022-10-21T17:59:03Z) - CC-Riddle: A Question Answering Dataset of Chinese Character Riddles [51.41044750575767]
The Chinese character riddle is a unique form of cultural entertainment specific to the Chinese language.
We construct a Chinese Character riddle dataset named CC-Riddle.
arXiv Detail & Related papers (2022-06-28T06:23:13Z) - A Puzzle-Based Dataset for Natural Language Inference [0.9594432031144714]
The dataset contains logical puzzles in natural language from three domains: comparing puzzles, knights and knaves, and zebra puzzles.
Each puzzle is associated with the entire set of atomic questions that can be generated based on the relations and individuals occurring in the text.
arXiv Detail & Related papers (2021-12-10T18:53:06Z) - RiddleSense: Answering Riddle Questions as Commonsense Reasoning [35.574564653690594]
RiddleSense is a novel multiple-choice question answering challenge for benchmarking higher-order commonsense reasoning models.
RiddleSense is the first large dataset for riddle-style commonsense question answering, where the distractors are crowdsourced from human annotators.
We systematically evaluate a wide range of reasoning models over it and point out that there is a large gap between the best supervised model and human performance.
arXiv Detail & Related papers (2021-01-02T05:28:15Z) - XOR QA: Cross-lingual Open-Retrieval Question Answering [75.20578121267411]
This work extends open-retrieval question answering to a cross-lingual setting.
We construct a large-scale dataset built on questions lacking same-language answers.
arXiv Detail & Related papers (2020-10-22T16:47:17Z) - MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering [6.452012363895865]
This dataset supplies the widest range of languages to date for evaluating question answering.
We benchmark a variety of state-of-the-art methods and baselines for generative and extractive question answering.
Results indicate this dataset is challenging even in English, but especially in low-resource languages.
arXiv Detail & Related papers (2020-07-30T03:33:46Z) - Knowledgeable Dialogue Reading Comprehension on Key Turns [84.1784903043884]
Multi-choice machine reading comprehension (MRC) requires models to choose the correct answer from candidate options given a passage and a question.
Our research focuses on dialogue-based MRC, where the passages are multi-turn dialogues.
This setting faces two challenges: the answer selection decision is made without the support of latently helpful commonsense, and the multi-turn context may hide considerable irrelevant information.
arXiv Detail & Related papers (2020-04-29T07:04:43Z)