A Minimalist Dataset for Systematic Generalization of Perception,
Syntax, and Semantics
- URL: http://arxiv.org/abs/2103.01403v3
- Date: Tue, 18 Apr 2023 07:54:24 GMT
- Title: A Minimalist Dataset for Systematic Generalization of Perception,
Syntax, and Semantics
- Authors: Qing Li, Siyuan Huang, Yining Hong, Yixin Zhu, Ying Nian Wu, Song-Chun
Zhu
- Abstract summary: We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts.
In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images.
We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
- Score: 131.93113552146195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inspired by humans' exceptional ability to master arithmetic and generalize
to new problems, we present a new dataset, Handwritten arithmetic with INTegers
(HINT), to examine machines' capability of learning generalizable concepts at
three levels: perception, syntax, and semantics. In HINT, machines are tasked
with learning how concepts are perceived from raw signals such as images (i.e.,
perception), how multiple concepts are structurally combined to form a valid
expression (i.e., syntax), and how concepts are realized to afford various
reasoning tasks (i.e., semantics), all in a weakly supervised manner. Focusing
on systematic generalization, we carefully design a five-fold test set to
evaluate both the interpolation and the extrapolation of learned concepts
w.r.t. the three levels. Further, we design a few-shot learning split to
determine whether or not models can rapidly learn new concepts and generalize
them to more complex scenarios. To comprehend existing models' limitations, we
undertake extensive experiments with various sequence-to-sequence models,
including RNNs, Transformers, and GPT-3 (with chain-of-thought prompting).
The results indicate that current models struggle to extrapolate to long-range
syntactic dependency and semantics. Models exhibit a considerable gap toward
human-level generalization when evaluated with new concepts in a few-shot
setting. Moreover, we discover that it is infeasible to solve HINT by merely
scaling up the dataset and the model size; this strategy contributes little to
the extrapolation of syntax and semantics. Finally, in zero-shot GPT-3
experiments, chain-of-thought prompting exhibits impressive results and
significantly boosts the test accuracy. We believe the HINT dataset and the
experimental findings are of great interest to the learning community on
systematic generalization.
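To make the task setup concrete, here is a minimal, hypothetical sketch of what a HINT-style sample and its weak supervision might look like; the class, field names, and evaluator below are illustrative assumptions, not the dataset's actual format or API.

    # Hypothetical HINT-style sample: only the expression's final value is labeled;
    # the symbolic expression and its parse remain latent (weak supervision).
    from dataclasses import dataclass
    from typing import Any, List

    @dataclass
    class HintStyleSample:
        symbol_images: List[Any]  # raw handwritten-symbol images (perception level)
        value: int                # the only label: the expression's result (semantics level)

    def evaluate_tokens(tokens: List[str]) -> int:
        # Syntax level: the predicted tokens must form a valid arithmetic expression;
        # semantics level: the expression is reduced to an integer value.
        # Python's own parser/evaluator is used purely for illustration.
        return eval("".join(tokens))

    # Training exposes only (images -> value) pairs; extrapolation splits then probe
    # longer expressions and larger values than were seen during training.
    assert evaluate_tokens(["3", "+", "4", "*", "2"]) == 11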
Related papers
- Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data; intelligence centers on model learning and prediction.
Recent trends demonstrate the potential homogeneity of these two fields.
We propose a novel problem of Coding for Intelligence from a category-theoretic view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z) - CAT: Interpretable Concept-based Taylor Additive Models [17.73885202930879]
Generalized Additive Models (GAMs) can explain deep neural networks (DNNs) at the feature level.
GAMs require large numbers of model parameters and are prone to overfitting, making them hard to train and scale.
We propose CAT, a novel interpretable Concept-bAsed Taylor additive model to simplify this process.
arXiv Detail & Related papers (2024-06-25T20:43:15Z) - Sketch-Plan-Generalize: Continual Few-Shot Learning of Inductively Generalizable Spatial Concepts [6.932008652560561]
We seek a learning architecture that infers a succinct program representation that explains the observed instance.
Our approach combines the code-generation ability of large language models with grounded neural representations.
arXiv Detail & Related papers (2024-04-11T14:09:41Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Unsupervised discovery of Interpretable Visual Concepts [0.0]
We propose two methods to explain a model's decision, enhancing global interpretability.
One method is inspired by Occlusion and Sensitivity analysis (incorporating causality).
The other method uses a novel metric, called Class-aware Order Correlation (CaOC), to globally evaluate the most important image regions.
arXiv Detail & Related papers (2023-08-31T07:53:02Z) - FACT: Learning Governing Abstractions Behind Integer Sequences [7.895232155155041]
We introduce a novel view on the learning of concepts admitting complete finitary descriptions.
We lay down a set of benchmarking tasks aimed at conceptual understanding by machine learning models.
To further aid research in knowledge representation and reasoning, we present FACT, the Finitary Abstraction Toolkit.
arXiv Detail & Related papers (2022-09-20T08:20:03Z) - RelViT: Concept-guided Vision Transformer for Visual Relational
Reasoning [139.0548263507796]
We use vision transformers (ViTs) as our base model for visual reasoning.
We make better use of concepts defined as object entities and their relations to improve the reasoning ability of ViTs.
We show the resulting model, Concept-guided Vision Transformer (or RelViT for short), significantly outperforms prior approaches on HICO and GQA benchmarks.
arXiv Detail & Related papers (2022-04-24T02:46:43Z) - Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.