Are Deep Neural Networks SMARTer than Second Graders?
- URL: http://arxiv.org/abs/2212.09993v6
- Date: Mon, 11 Sep 2023 13:58:44 GMT
- Title: Are Deep Neural Networks SMARTer than Second Graders?
- Authors: Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Kevin A. Smith, Joshua B.
Tenenbaum
- Abstract summary: We evaluate the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed for children in the 6--8 age group.
Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and its solution requires a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning.
Experiments reveal that while powerful deep models offer reasonable performance on puzzles in a supervised setting, they perform no better than chance when analyzed for generalization.
- Score: 85.60342335636341
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent times have witnessed an increasing number of applications of deep
neural networks towards solving tasks that require superior cognitive
abilities, e.g., playing Go, generating art, and powering ChatGPT. Such dramatic
progress raises the question: how generalizable are neural networks in solving
problems that demand broad skills? To answer this question, we propose SMART: a
Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101
dataset, for evaluating the abstraction, deduction, and generalization
abilities of neural networks in solving visuo-linguistic puzzles designed
specifically for children in the 6--8 age group. Our dataset consists of 101
unique puzzles; each puzzle comprises a picture and a question, and its
solution requires a mix of several elementary skills, including arithmetic,
algebra, and spatial reasoning, among others. To scale our dataset towards
training deep neural networks, we programmatically generate entirely new
instances for each puzzle, while retaining their solution algorithm. To
benchmark performances on SMART-101, we propose a vision and language
meta-learning model using varied state-of-the-art backbones. Our experiments
reveal that while powerful deep models offer reasonable performance on puzzles
in a supervised setting, they perform no better than chance when analyzed
for generalization. We also evaluate the recent ChatGPT and other large
language models on a subset of SMART-101 and find that while these models show
convincing reasoning abilities, the answers are often incorrect.
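To make the instance-generation idea concrete: each SMART-101 puzzle keeps its solution algorithm fixed while its surface content is resampled. The sketch below is a hypothetical illustration of that recipe on an invented arithmetic template; it is not an actual SMART-101 puzzle or the authors' generator.

```python
import random

def make_instance(rng: random.Random) -> dict:
    """Resample one (invented) puzzle template: new numbers and answer
    options each time, while the solution algorithm (subtraction) is fixed."""
    apples = rng.randint(10, 99)
    oranges = rng.randint(1, apples - 1)
    answer = apples - oranges  # the invariant solution algorithm

    # Multiple-choice options: the true answer plus four distinct distractors.
    options = {answer}
    while len(options) < 5:
        candidate = answer + rng.randint(-9, 9)
        if candidate > 0:
            options.add(candidate)

    question = (f"A basket holds {apples} apples and {oranges} oranges. "
                "How many more apples than oranges are there?")
    return {"question": question, "options": sorted(options), "answer": answer}

rng = random.Random(0)
instances = [make_instance(rng) for _ in range(1000)]  # scale toward training
```

A model trained on generated instances of some puzzles and tested on instances of held-out puzzles measures exactly the generalization gap the abstract describes.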
Related papers
- Neural networks for abstraction and reasoning: Towards broad
generalization in machines [3.165509887826658]
We look at novel approaches for solving the Abstraction & Reasoning Corpus (ARC).
We adapt the DreamCoder neurosymbolic reasoning solver to ARC.
We present the Perceptual Abstraction and Reasoning Language (PeARL), which allows DreamCoder to solve ARC tasks.
We publish the arckit Python library to make future research on ARC easier; a usage sketch follows.
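A minimal usage sketch of arckit, following its documented entry points (exact names may differ across versions; treat the calls as an assumption, not a spec):

```python
# Minimal arckit usage sketch; load_data() and Task.show() follow the
# library's README, but verify against the installed version.
import arckit

train_set, eval_set = arckit.load_data()  # ARC training / evaluation splits
task = train_set[0]                       # one task: demonstration grid pairs
task.show()                               # render the grids
```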
arXiv Detail & Related papers (2024-02-05T20:48:57Z)
- Bridging Logic and Learning: A Neural-Symbolic Approach for Enhanced
Reasoning in Neural Models (ASPER) [0.13053649021965597]
This paper introduces an approach for improving the performance of neural models on reasoning tasks.
It achieves this by integrating Answer Set Programming solvers with domain-specific expertise.
The model shows a significant improvement in solving Sudoku puzzles using only 12 puzzles for training and testing.
arXiv Detail & Related papers (2023-12-18T19:06:00Z)
- The Clock and the Pizza: Two Stories in Mechanistic Explanation of
Neural Networks [59.26515696183751]
We show that algorithm discovery in neural networks is sometimes more complex than expected: even simple learning problems can admit a surprising diversity of solutions.
arXiv Detail & Related papers (2023-06-30T17:59:13Z)
- Pointer Value Retrieval: A new benchmark for understanding the limits of
neural network generalization [40.21297628440919]
We introduce a novel benchmark, Pointer Value Retrieval (PVR), a family of tasks that probes the limits of neural network generalization.
PVR tasks can consist of visual as well as symbolic inputs, each with varying levels of difficulty.
We demonstrate that this task structure provides a rich testbed for understanding generalization; a toy generator is sketched below.
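The symbolic form of a PVR example is simple enough to reconstruct; the generator below is an illustrative sketch of the task structure (a leading pointer digit selects which of the following values is the label), not the authors' code.

```python
import random

def pvr_instance(rng: random.Random, n_values: int = 10) -> tuple[list[int], int]:
    """One symbolic Pointer Value Retrieval example: the first entry is a
    pointer p, and the label is the value at position p of the rest."""
    pointer = rng.randrange(n_values)
    values = [rng.randrange(10) for _ in range(n_values)]
    return [pointer] + values, values[pointer]

rng = random.Random(0)
x, y = pvr_instance(rng)  # the model must learn to index, not to memorize
```

In the paper, difficulty is then scaled by making the label a more complex function of the retrieved values rather than the value itself.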
arXiv Detail & Related papers (2021-07-27T03:50:31Z)
- Thinking Deeply with Recurrence: Generalizing from Easy to Hard
Sequential Reasoning Problems [51.132938969015825]
We observe that recurrent networks have the uncanny ability to closely emulate the behavior of non-recurrent deep models.
We show that recurrent networks trained to solve simple mazes with few recurrent steps can solve much more complex problems simply by performing additional recurrences during inference, as the sketch below illustrates.
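That train-short, test-long recipe can be sketched with a weight-tied block whose iteration count is a free parameter at inference (an illustrative PyTorch module, not the paper's architecture):

```python
import torch
import torch.nn as nn

class RecurrentSolver(nn.Module):
    """A single weight-tied block applied a variable number of times."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor, iters: int) -> torch.Tensor:
        for _ in range(iters):
            x = self.block(x)  # same weights every step: recurrence, not depth
        return x

net = RecurrentSolver()
maze = torch.randn(1, 64, 9, 9)  # stand-in for an encoded easy maze
out_train = net(maze, iters=5)   # few recurrent steps during training
out_test = net(maze, iters=50)   # extra recurrences on harder inputs
```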
arXiv Detail & Related papers (2021-02-22T14:09:20Z)
- SMART: A Situation Model for Algebra Story Problems via Attributed
Grammar [74.1315776256292]
We introduce the concept of a situation model, which originates in psychology research as a representation of the mental states of humans in problem-solving.
We show that the proposed model outperforms all previous neural solvers by a large margin while preserving much better interpretability.
arXiv Detail & Related papers (2020-12-27T21:03:40Z)
- Characterizing the Weight Space for Different Learning Models [0.0]
Deep Learning has become one of the primary research areas in developing intelligent machines.
This paper attempts to characterize the solution space of a deep neural network in terms of three different subsets.
We show that adversarial attacks are generally less successful against Associative Memory Models than Deep Neural Networks.
arXiv Detail & Related papers (2020-06-04T09:30:29Z)
- PuzzLing Machines: A Challenge on Learning From Small Data [64.513459448362]
We introduce a challenge on learning from small data, PuzzLing Machines, which consists of Rosetta Stone puzzles from Linguistic Olympiads for high school students.
Our challenge contains around 100 puzzles covering a wide range of linguistic phenomena from 81 languages.
We show that both simple statistical algorithms and state-of-the-art deep neural models perform inadequately on this challenge, as expected.
arXiv Detail & Related papers (2020-04-27T20:34:26Z)
- Machine Number Sense: A Dataset of Visual Arithmetic Problems for
Abstract and Relational Reasoning [95.18337034090648]
We propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated with a grammar model, the And-Or Graph (AOG).
These visual arithmetic problems take the form of geometric figures.
We benchmark the MNS dataset using four predominant neural network models as baselines on this visual reasoning task; a toy grammar sampler is sketched below.
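The grammar-based generation can be illustrated with a toy And-Or grammar over symbolic arithmetic (a simplified sketch; the paper's AOG additionally renders problems as geometric figures):

```python
import random

# Toy And-Or grammar: an "and" node expands to a fixed sequence of
# children; an "or" node samples one alternative sequence.
GRAMMAR = {
    "problem": ("and", ["expr", "=", "?"]),
    "expr": ("or", [["num", "op", "num"], ["num", "op", "num", "op", "num"]]),
    "op": ("or", [["+"], ["-"]]),
}

def sample(symbol: str, rng: random.Random) -> list[str]:
    if symbol == "num":
        return [str(rng.randint(1, 9))]  # numeric terminal
    if symbol not in GRAMMAR:
        return [symbol]                  # literal terminal: =, ?, +, -
    kind, children = GRAMMAR[symbol]
    seq = rng.choice(children) if kind == "or" else children
    return [tok for child in seq for tok in sample(child, rng)]

rng = random.Random(0)
print(" ".join(sample("problem", rng)))  # e.g. "3 + 7 - 2 = ?"
```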
arXiv Detail & Related papers (2020-04-25T17:14:58Z)