RiddleSense: Answering Riddle Questions as Commonsense Reasoning
- URL: http://arxiv.org/abs/2101.00376v1
- Date: Sat, 2 Jan 2021 05:28:15 GMT
- Title: RiddleSense: Answering Riddle Questions as Commonsense Reasoning
- Authors: Bill Yuchen Lin, Ziyi Wu, Yichi Yang, Dong-Ho Lee, Xiang Ren
- Abstract summary: RiddleSense is a novel multiple-choice question answering challenge for benchmarking higher-order commonsense reasoning models.
RiddleSense is the first large dataset for riddle-style commonsense question answering, where the distractors are crowdsourced from human annotators.
We systematically evaluate a wide range of reasoning models over it and point out that there is a large gap between the best supervised model and human performance.
- Score: 35.574564653690594
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A riddle is a mystifying, puzzling question about everyday concepts. For
example, the riddle "I have five fingers but I am not alive. What am I?" asks
about the concept of a glove. Solving riddles is a challenging cognitive
process for humans, in that it requires complex commonsense reasoning abilities
and an understanding of figurative language. However, there are currently no
commonsense reasoning datasets that test these abilities. We propose
RiddleSense, a novel multiple-choice question answering challenge for
benchmarking higher-order commonsense reasoning models, which is the first
large dataset for riddle-style commonsense question answering, where the
distractors are crowdsourced from human annotators. We systematically evaluate
a wide range of reasoning models over it and point out that there is a large
gap between the best supervised model and human performance, pointing to
interesting future research for higher-order commonsense reasoning and
computational creativity.
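The task format described in the abstract (a riddle, one correct answer, and human-written distractors) can be sketched as a small data structure plus the usual multiple-choice accuracy metric. This is a minimal Python sketch, not the RiddleSense release format. The class `RiddleItem`, the helpers `predict_from_scores` and `accuracy`, the five-choice layout, the distractors and the model scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RiddleItem:
    """One multiple-choice riddle: question text, candidate answers, gold index."""
    question: str
    choices: List[str]
    answer_index: int


def predict_from_scores(scores: List[float]) -> int:
    """Pick the choice with the highest model score (ties go to the earliest)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(items: List[RiddleItem], predictions: List[int]) -> float:
    """Fraction of items whose predicted index matches the gold index."""
    if len(items) != len(predictions):
        raise ValueError("expected exactly one prediction per item")
    if not items:
        return 0.0
    correct = sum(pred == item.answer_index for item, pred in zip(items, predictions))
    return correct / len(items)


# The riddle is the one quoted in the abstract; the four distractors are
# hypothetical, written here only for illustration.
glove = RiddleItem(
    question="I have five fingers but I am not alive. What am I?",
    choices=["hand", "glove", "rake", "fork", "starfish"],
    answer_index=1,
)

# Hypothetical per-choice scores from some model; any scorer could supply these.
scores = [0.30, 0.45, 0.05, 0.10, 0.10]
pred = predict_from_scores(scores)
print(pred, accuracy([glove], [pred]))  # 1 1.0
```

Because the distractors are written by humans to be tempting (here "hand" also has five fingers), a model must score the correct concept above them rather than merely recognize related words. Accuracy over the full set is then compared against human performance.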
Related papers
- Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models [25.732397636695882]
We introduce TruthQuest, a benchmark for suppositional reasoning based on the principles of knights and knaves puzzles.
Evaluations show that large language models like Llama 3 and Mixtral-8x7B have significant difficulty solving these tasks.
(2024-06-18T12:24:22Z)
- Missed Connections: Lateral Thinking Puzzles for Large Language Models [2.1374208474242815]
The Connections puzzle published each day by the New York Times tasks players with dividing a bank of sixteen words into four groups of four words that each relate to a common theme.
We investigate the capacity for automated AI systems to play Connections and explore the game's potential as an automated benchmark for abstract reasoning.
(2024-04-17T20:31:05Z)
- Implicit Chain of Thought Reasoning via Knowledge Distillation [58.80851216530288]
Instead of explicitly producing the chain of thought reasoning steps, we use the language model's internal hidden states to perform implicit reasoning.
We find that this approach enables solving tasks previously not solvable without explicit chain-of-thought, at a speed comparable to no chain-of-thought.
(2023-11-02T17:59:49Z)
- Open-ended Commonsense Reasoning with Unrestricted Answer Scope [47.14397700770702]
Open-ended Commonsense Reasoning is defined as solving a commonsense question without providing 1) a short list of answer candidates and 2) a pre-defined answer scope.
In this work, we leverage pre-trained language models to iteratively retrieve reasoning paths on the external knowledge base.
The reasoning paths can help to identify the most precise answer to the commonsense question.
(2023-10-18T02:45:54Z)
- CC-Riddle: A Question Answering Dataset of Chinese Character Riddles [51.41044750575767]
The Chinese character riddle is a unique form of cultural entertainment specific to the Chinese language.
We construct a Chinese Character riddle dataset named CC-Riddle.
(2022-06-28T06:23:13Z)
- BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles [82.63394952538292]
We introduce BiRdQA, a bilingual multiple-choice question answering dataset with 6614 English riddles and 8751 Chinese riddles.
Existing monolingual and multilingual QA models fail to perform well on our dataset, indicating that there is a long way to go before machines can beat humans at solving tricky riddles.
(2021-09-23T00:46:47Z)
- Differentiable Open-Ended Commonsense Reasoning [80.94997942571838]
We study open-ended commonsense reasoning (OpenCSR) using as a resource only a corpus of commonsense facts written in natural language.
As an approach to OpenCSR, we propose DrFact, an efficient Differentiable model for multi-hop Reasoning over knowledge Facts.
(2020-10-24T10:07:00Z)
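The knights-and-knaves puzzles behind the suppositional-reasoning benchmark listed above (TruthQuest) have mechanically checkable answers. Suppose each possible assignment of roles, then keep only those assignments under which every knight's claim is true and every knave's claim is false. The Python sketch below is a hypothetical brute-force checker written for illustration, not the benchmark's own code. The function `solve`, the encoding of claims as lambdas, and the example puzzle are assumptions.

```python
from itertools import product
from typing import Callable, Dict, List

# True = knight (always tells the truth), False = knave (always lies).
Assignment = Dict[str, bool]
Claim = Callable[[Assignment], bool]


def solve(people: List[str], claims: Dict[str, Claim]) -> List[Assignment]:
    """Return every knight/knave assignment consistent with all claims."""
    solutions = []
    for roles in product([True, False], repeat=len(people)):
        world = dict(zip(people, roles))  # suppose this assignment holds
        # A speaker's role must equal the truth value of what they said:
        # knights make true claims, knaves make false ones.
        if all(world[speaker] == claim(world) for speaker, claim in claims.items()):
            solutions.append(world)
    return solutions


# Classic puzzle: A says "We are both knaves." B says nothing.
claims = {"A": lambda w: not w["A"] and not w["B"]}
print(solve(["A", "B"], claims))  # [{'A': False, 'B': True}]
```

Enumeration visits 2^n assignments for n people, so it only suits small puzzles. That is enough to verify a model's answer, and a unique surviving assignment confirms the puzzle is well posed.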