Fast and flexible: Human program induction in abstract reasoning tasks
- URL: http://arxiv.org/abs/2103.05823v1
- Date: Wed, 10 Mar 2021 02:18:21 GMT
- Title: Fast and flexible: Human program induction in abstract reasoning tasks
- Authors: Aysja Johnson, Wai Keen Vong, Brenden M. Lake, Todd M. Gureckis
- Abstract summary: We report the first set of results collected from a behavioral study of humans solving a subset of tasks from ARC (40 out of 1000).
Our results showed that humans were able to infer the underlying program and generate the correct test output for a novel test input example.
Our findings suggest that people can quickly and reliably determine the relevant features and properties of a task to compose a correct solution.
- Score: 14.24200473508597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Abstraction and Reasoning Corpus (ARC) is a challenging program induction
dataset that was recently proposed by Chollet (2019). Here, we report the first
set of results collected from a behavioral study of humans solving a subset of
tasks from ARC (40 out of 1000). Although this subset of tasks contains
considerable variation, our results showed that humans were able to infer the
underlying program and generate the correct test output for a novel test input
example, with an average of 80% of tasks solved per participant, and with 65%
of tasks being solved by more than 80% of participants. Additionally, we find
interesting patterns of behavioral consistency and variability within the
action sequences during the generation process, the natural language
descriptions to describe the transformations for each task, and the errors
people made. Our findings suggest that people can quickly and reliably
determine the relevant features and properties of a task to compose a correct
solution. Future modeling work could incorporate these findings, potentially by
connecting the natural language descriptions we collected here to the
underlying semantics of ARC.
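The program-induction setup described above can be made concrete with a small sketch. ARC distributes each task as JSON with "train" and "test" lists of input/output integer grids; the grids and the "mirror" transformation below are a made-up toy example, not a real task from the corpus.

```python
# Minimal sketch of how an ARC task is structured and evaluated.
# The grid values and the flip_horizontal "program" are illustrative;
# only the JSON layout (train/test pairs of integer grids) follows
# the public ARC format.

def flip_horizontal(grid):
    """A toy candidate 'program': mirror each row of the grid."""
    return [list(reversed(row)) for row in grid]

# A toy task: each output is the horizontally mirrored input.
task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 4, 0]], "output": [[0, 4, 3]]},
    ],
    "test": [
        {"input": [[5, 0, 0]], "output": [[0, 0, 5]]},
    ],
}

def solves(program, task):
    """Check a candidate program against all demonstration and test pairs."""
    pairs = task["train"] + task["test"]
    return all(program(p["input"]) == p["output"] for p in pairs)

print(solves(flip_horizontal, task))  # True for this toy task
```

A human solver plays the role of `solves` implicitly: they induce a transformation from the train pairs and apply it to the held-out test input.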
Related papers
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
In the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
In the instance level, we measure the difficulty of all instances per task, then divide them into the easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models [58.57279229066477]
We study how language models (LMs) solve retrieval tasks in diverse situations.
We introduce ORION, a collection of structured retrieval tasks spanning six domains.
We find that LMs internally decompose retrieval tasks in a modular way.
arXiv Detail & Related papers (2023-12-13T18:36:43Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- Post Hoc Explanations of Language Models Can Improve Language Models [43.2109029463221]
We present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY)
We leverage post hoc explanation methods which output attribution scores (explanations) capturing the influence of each of the input features on model predictions.
Our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks.
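The general idea described in this snippet can be sketched in a toy form: compute attribution scores for the input features of an in-context example, then surface the most influential features as a rationale in the prompt. The attribution method below (leave-one-word-out against a stub scorer) and all names are illustrative assumptions, not the paper's actual AMPLIFY pipeline.

```python
# Toy sketch: attach feature-attribution "explanations" to a few-shot
# example. The stub scorer and keyword list are made up for illustration.

def stub_score(text):
    """Stand-in for a model's confidence that the label is 'positive'."""
    words = text.split()
    return sum(w in {"great", "good", "love"} for w in words) / max(len(words), 1)

def attributions(text):
    """Leave-one-word-out attribution: score drop when a word is removed."""
    words = text.split()
    base = stub_score(text)
    scores = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((words[i], base - stub_score(reduced)))
    return scores

def amplify_style_prompt(example, label, k=2):
    """Build an in-context example whose rationale names the top-k words."""
    top = sorted(attributions(example), key=lambda t: -t[1])[:k]
    keywords = ", ".join(w for w, _ in top)
    return f"Text: {example}\nKey words: {keywords}\nLabel: {label}"

print(amplify_style_prompt("I love this great movie", "positive"))
```

In practice the scorer would be a real model and the attribution method a standard explainer (e.g. gradient- or perturbation-based); the sketch only shows how explanations get folded back into the prompt.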
arXiv Detail & Related papers (2023-05-19T04:46:04Z)
- Divergence-Based Domain Transferability for Zero-Shot Classification [78.55044112903148]
Transferring learned patterns from pretrained neural language models has been shown to significantly improve effectiveness across a variety of language-based tasks.
Further tuning on intermediate tasks has been demonstrated to provide additional performance benefits, provided the intermediate task is sufficiently related to the target task.
However, how to identify related tasks is an open problem, and brute-force searching effective task combinations is prohibitively expensive.
arXiv Detail & Related papers (2023-02-11T16:04:38Z)
- Less is More: Summary of Long Instructions is Better for Program Synthesis [20.66688303609522]
We show that pre-trained language models (LMs) benefit from the summarized version of complicated questions.
Our findings show that superfluous information often present in problem descriptions does not help models understand a task.
Experimental results on Codex show that our proposed approach outperforms the baseline by 8.13% on average in terms of strict accuracy.
arXiv Detail & Related papers (2022-03-16T13:04:12Z)
- Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z)
- An Integrated Dynamic Method for Allocating Roles and Planning Tasks for Mixed Human-Robot Teams [0.0]
This paper proposes a novel integrated dynamic method for planning and allocating tasks in mixed human-robot teams.
The Behavior Tree formulation allows encoding a single job as a compound of different tasks with temporal and logic constraints.
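The Behavior Tree formulation mentioned here can be illustrated with a minimal sketch, assuming the standard Sequence/Action node semantics; the task names and state layout are illustrative, not from the paper.

```python
# Minimal behavior-tree sketch: a Sequence node encodes a single job
# as ordered subtasks with a precondition (a temporal/logic constraint).

SUCCESS, FAILURE = "success", "failure"

class Action:
    """A leaf node wrapping a condition or primitive task."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self, state):
        return SUCCESS if self.fn(state) else FAILURE

class Sequence:
    """Ticks children in order; fails as soon as one child fails."""
    def __init__(self, *children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == FAILURE:
                return FAILURE
        return SUCCESS

# A single "job" composed of ordered subtasks gated by a precondition.
job = Sequence(
    Action("part_available", lambda s: s["part_available"]),
    Action("pick", lambda s: s.setdefault("picked", True)),
    Action("place", lambda s: s.get("picked", False)),
)

print(job.tick({"part_available": True}))  # "success"
```

A full formulation would also include Fallback (selector) and Parallel nodes, which is how behavior trees typically express the logic constraints between alternative tasks.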
arXiv Detail & Related papers (2021-05-25T16:10:30Z)
- Sequential Transfer in Reinforcement Learning with a Generative Model [48.40219742217783]
We show how to reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.
We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge.
We empirically verify our theoretical findings in simple simulated domains.
arXiv Detail & Related papers (2020-07-01T19:53:35Z)
- Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance? [27.64235687067883]
We show that models can learn to encode linguistic properties even if they are not needed for the task on which the model was trained.
We demonstrate models can encode these properties considerably above chance-level even when distributed in the data as random noise.
arXiv Detail & Related papers (2020-05-02T06:19:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.