PuzzLing Machines: A Challenge on Learning From Small Data
- URL: http://arxiv.org/abs/2004.13161v1
- Date: Mon, 27 Apr 2020 20:34:26 GMT
- Title: PuzzLing Machines: A Challenge on Learning From Small Data
- Authors: Gözde Gül Şahin, Yova Kementchedjhieva, Phillip Rust, Iryna Gurevych
- Abstract summary: We introduce a challenge on learning from small data, PuzzLing Machines, which consists of Rosetta Stone puzzles from Linguistic Olympiads for high school students.
Our challenge contains around 100 puzzles covering a wide range of linguistic phenomena from 81 languages.
We show that both simple statistical algorithms and state-of-the-art deep neural models perform inadequately on this challenge, as expected.
- Score: 64.513459448362
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep neural models have repeatedly proved excellent at memorizing surface
patterns from large datasets for various ML and NLP benchmarks. They struggle
to achieve human-like thinking, however, because they lack the skill of
iterative reasoning upon knowledge. To expose this problem in a new light, we
introduce a challenge on learning from small data, PuzzLing Machines, which
consists of Rosetta Stone puzzles from Linguistic Olympiads for high school
students. These puzzles are carefully designed to contain only the minimal
amount of parallel text necessary to deduce the form of unseen expressions.
Solving them does not require external information (e.g., knowledge bases,
visual signals) or linguistic expertise, but meta-linguistic awareness and
deductive skills. Our challenge contains around 100 puzzles covering a wide
range of linguistic phenomena from 81 languages. We show that both simple
statistical algorithms and state-of-the-art deep neural models perform
inadequately on this challenge, as expected. We hope that this benchmark,
available at https://ukplab.github.io/PuzzLing-Machines/, inspires further
efforts towards a new paradigm in NLP---one that is grounded in human-like
reasoning and understanding.
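To make the task format concrete, here is a hypothetical toy sketch of a Rosetta-Stone-style puzzle, assuming an invented mini-language and a single plural suffix rule; it is not an actual PuzzLing puzzle, only an illustration of deducing an unseen form from minimal parallel text:

```python
# Toy Rosetta Stone puzzle: a few parallel pairs are given, and the solver
# must deduce the translation of an unseen expression from patterns alone.
# The language, words, and suffix "-ka" are invented for illustration.

training_pairs = [
    ("balo", "dog"),
    ("balo-ka", "dogs"),
    ("miru", "cat"),
]

def build_lexicon(pairs, plural_suffix="-ka"):
    """Map stems to glosses, stripping the (hypothesized) plural suffix."""
    lexicon = {}
    for src, tgt in pairs:
        if src.endswith(plural_suffix):
            lexicon[src[: -len(plural_suffix)]] = tgt.rstrip("s")
        else:
            lexicon[src] = tgt
    return lexicon

def translate(word, lexicon, plural_suffix="-ka"):
    """Translate a possibly unseen form via stem lookup plus the suffix rule."""
    if word.endswith(plural_suffix):
        return lexicon[word[: -len(plural_suffix)]] + "s"
    return lexicon[word]

lexicon = build_lexicon(training_pairs)
print(translate("miru-ka", lexicon))  # unseen form, deduced as "cats"
```

Real Olympiad puzzles demand far richer generalizations (agreement, morphophonology, word order), which is why rote pattern matching falls short.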
Related papers
- On Memorization of Large Language Models in Logical Reasoning [70.94164038947078]
Large language models (LLMs) achieve good performance on challenging reasoning benchmarks, yet could also make basic reasoning mistakes.
One hypothesis is that the increasingly high and nearly saturated performance could be due to the memorization of similar problems.
We show that fine-tuning leads to heavy memorization, but it also consistently improves generalization performance.
arXiv Detail & Related papers (2024-10-30T15:31:54Z)
- modeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models [23.105555180223487]
modeLing is a novel benchmark of Linguistics Olympiad-style puzzles which tests few-shot reasoning in AI systems.
We evaluate several large open source language models and GPT on our benchmark.
arXiv Detail & Related papers (2024-06-24T18:00:59Z)
- Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning [24.386388107656334]
This paper introduces the novel task of multimodal puzzle solving, framed within the context of visual question-answering.
We present a new dataset, AlgoVQA, designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles.
arXiv Detail & Related papers (2024-03-06T17:15:04Z)
- Are Deep Neural Networks SMARTer than Second Graders? [85.60342335636341]
We evaluate the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed for children in the 6--8 age group.
Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and its solution requires a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning.
Experiments reveal that while powerful deep models offer reasonable performance on puzzles in a supervised setting, they are no better than random accuracy when analyzed for generalization.
arXiv Detail & Related papers (2022-12-20T04:33:32Z)
- ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses a great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z)
- Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability [30.01308882849197]
We propose a new methodology for creating challenging algorithmic reasoning datasets.
The key idea is to draw insights from empirical sampling of hard propositional SAT problems and from complexity-theoretic studies of language.
We find that current transformers, given sufficient training data, are surprisingly robust at solving the resulting NLSat problems.
arXiv Detail & Related papers (2021-12-16T17:47:20Z)
- Understanding and Enhancing the Use of Context for Machine Translation [2.367786892039871]
This thesis focuses on understanding how context can benefit neural models and on designing augmentation models to exploit it.
To translate from a source language to a target language, a neural model has to understand the meaning of constituents in the provided context.
Looking more in-depth into the role of context and the impact of data on learning models is essential to advance the NLP field.
arXiv Detail & Related papers (2021-02-20T20:19:27Z)
- Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning [95.18337034090648]
We propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model, an And-Or Graph (AOG).
These visual arithmetic problems are in the form of geometric figures.
We benchmark the MNS dataset using four predominant neural network models as baselines in this visual reasoning task.
arXiv Detail & Related papers (2020-04-25T17:14:58Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.