A Puzzle-Based Dataset for Natural Language Inference
- URL: http://arxiv.org/abs/2112.05742v1
- Date: Fri, 10 Dec 2021 18:53:06 GMT
- Title: A Puzzle-Based Dataset for Natural Language Inference
- Authors: Roxana Szomiu and Adrian Groza
- Abstract summary: The dataset contains logical puzzles in natural language from three domains: comparing puzzles, knights and knaves, and zebra puzzles.
Each puzzle is associated with the entire set of atomic questions that can be generated based on the relations and individuals occurring in the text.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We provide here a dataset for tasks related to natural language understanding
and natural language inference. The dataset contains logical puzzles in natural
language from three domains: comparing puzzles, knights and knaves, and zebra
puzzles. Each puzzle is associated with the entire set of atomic questions that
can be generated based on the relations and individuals occurring in the text.
For each question we provide the correct answer: entailment, contradiction or
ambiguity. The answer's correctness is verified against theorem provers. Good
puzzles have two properties: (i) each piece of information is necessary and
(ii) no unnecessary information is provided. These properties make puzzles
interesting candidates for machine comprehension tasks.
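As a rough illustration of the structure described above, a single dataset entry pairs a puzzle with its atomic questions and their verified labels. This is a minimal sketch: the field names, the puzzle text, and the questions are hypothetical, not the dataset's actual schema.

```python
# Hypothetical sketch of one entry; field names and puzzle text are
# illustrative, not taken from the released dataset.
entry = {
    "domain": "knights and knaves",
    "puzzle": "A says: 'B is a knave.' B says: 'A and I are of the same kind.'",
    "questions": [
        # Each atomic question carries one of the three labels, which the
        # original work verifies with theorem provers.
        {"question": "Is A a knight?", "label": "entailment"},
        {"question": "Is B a knight?", "label": "contradiction"},
    ],
}

LABELS = {"entailment", "contradiction", "ambiguity"}

def validate(entry):
    """Check that every atomic question carries an admissible label."""
    return all(q["label"] in LABELS for q in entry["questions"])

print(validate(entry))  # True
```

For this example puzzle the labels are consistent: if A were a knave, B would be a knight whose statement is false, a contradiction; so A must be a knight and B a knave.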
Related papers
- Space3D-Bench: Spatial 3D Question Answering Benchmark
We present Space3D-Bench - a collection of 1000 general spatial questions and answers related to scenes of the Replica dataset.
We provide an assessment system that grades natural language responses based on predefined ground-truth answers.
Finally, we introduce a baseline called RAG3D-Chat integrating the world understanding of foundation models with rich context retrieval.
arXiv Detail & Related papers (2024-08-29T16:05:22Z) - Missed Connections: Lateral Thinking Puzzles for Large Language Models
The Connections puzzle published each day by the New York Times tasks players with dividing a bank of sixteen words into four groups of four words that each relate to a common theme.
We investigate the capacity for automated AI systems to play Connections and explore the game's potential as an automated benchmark for abstract reasoning.
arXiv Detail & Related papers (2024-04-17T20:31:05Z) - Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning
This paper introduces the novel task of multimodal puzzle solving, framed within the context of visual question-answering.
We present a new dataset, AlgoVQA, designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles.
arXiv Detail & Related papers (2024-03-06T17:15:04Z) - Complex Reading Comprehension Through Question Decomposition [48.256818683923626]
We propose a novel learning approach that helps language models better understand difficult multi-hop questions.
Our model first learns to decompose each multi-hop question into several sub-questions by a trainable question decomposer.
We leverage a reading comprehension model to predict the answer in a sequence-to-sequence manner.
arXiv Detail & Related papers (2022-11-07T02:54:04Z) - Down and Across: Introducing Crossword-Solving as a New NLP Benchmark
We release the specification of a corpus of crossword puzzles collected from the New York Times daily crossword spanning 25 years.
These puzzles include a diverse set of clues: historic, factual, word meaning, synonyms/antonyms, fill-in-the-blank, abbreviations, prefixes/suffixes, wordplay, and cross-lingual.
arXiv Detail & Related papers (2022-05-20T21:16:44Z) - Natural language understanding for logical games
We developed a system able to automatically solve logical puzzles in natural language.
Our solution is composed of two modules, including an inference module.
We also empower our software agent with the capability to provide Yes/No answers to natural language questions related to each puzzle.
arXiv Detail & Related papers (2021-10-01T17:36:14Z) - BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles
We introduce BiRdQA, a bilingual multiple-choice question answering dataset with 6614 English riddles and 8751 Chinese riddles.
Existing monolingual and multilingual QA models fail to perform well on our dataset, indicating that there is a long way to go before machines can beat humans at solving tricky riddles.
arXiv Detail & Related papers (2021-09-23T00:46:47Z) - Programming Puzzles
We release an open-source dataset of Python Programming Puzzles (P3)
The puzzles are objective in that each one is specified entirely by the source code of its verifier $f$.
They do not require an answer key or input/output examples, nor do they depend on natural language understanding.
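To make the verifier idea concrete, here is a minimal sketch in the spirit of P3. The function name and the specific puzzle are illustrative assumptions: the point is that the puzzle is fully specified by the verifier's source code, and any input it accepts counts as a solution.

```python
# Illustrative verifier-style puzzle: the puzzle IS the source code of f,
# and any x with f(x) == True is a valid answer. No answer key or
# input/output examples are needed; f itself decides.
def f(x: int) -> bool:
    """Find an integer greater than 100 whose square is 1 more than a multiple of 5."""
    return x > 100 and (x * x - 1) % 5 == 0

candidate = 101  # one possible solution among many
print(f(candidate))  # True
```

Because correctness is checked by executing the verifier, such puzzles sidestep natural language understanding entirely.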
arXiv Detail & Related papers (2021-06-10T14:37:28Z) - A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z) - PuzzLing Machines: A Challenge on Learning From Small Data
We introduce a challenge on learning from small data, PuzzLing Machines, which consists of Rosetta Stone puzzles from Linguistic Olympiads for high school students.
Our challenge contains around 100 puzzles covering a wide range of linguistic phenomena from 81 languages.
We show that both simple statistical algorithms and state-of-the-art deep neural models perform inadequately on this challenge, as expected.
arXiv Detail & Related papers (2020-04-27T20:34:26Z) - VQA-LOL: Visual Question Answering under the Lens of Logic
We investigate whether visual question answering systems trained to answer a question about an image, are able to answer the logical composition of multiple such questions.
We construct an augmentation of the VQA dataset as a benchmark, with questions containing logical compositions and linguistic transformations.
We propose our Lens of Logic (LOL) model which uses question-attention and logic-attention to understand logical connectives in the question, and a novel Fréchet-Compatibility Loss.
arXiv Detail & Related papers (2020-02-19T17:57:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.