Fast and flexible: Human program induction in abstract reasoning tasks
- URL: http://arxiv.org/abs/2103.05823v1
- Date: Wed, 10 Mar 2021 02:18:21 GMT
- Title: Fast and flexible: Human program induction in abstract reasoning tasks
- Authors: Aysja Johnson, Wai Keen Vong, Brenden M. Lake, Todd M. Gureckis
- Abstract summary: We report the first set of results collected from a behavioral study of humans solving a subset of tasks from ARC (40 out of 1000).
Our results showed that humans were able to infer the underlying program and generate the correct test output for a novel test input example.
Our findings suggest that people can quickly and reliably determine the relevant features and properties of a task to compose a correct solution.
- Score: 14.24200473508597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Abstraction and Reasoning Corpus (ARC) is a challenging program induction
dataset that was recently proposed by Chollet (2019). Here, we report the first
set of results collected from a behavioral study of humans solving a subset of
tasks from ARC (40 out of 1000). Although this subset of tasks contains
considerable variation, our results showed that humans were able to infer the
underlying program and generate the correct test output for a novel test input
example, with an average of 80% of tasks solved per participant, and with 65%
of tasks being solved by more than 80% of participants. Additionally, we find
interesting patterns of behavioral consistency and variability within the
action sequences during the generation process, the natural language
descriptions to describe the transformations for each task, and the errors
people made. Our findings suggest that people can quickly and reliably
determine the relevant features and properties of a task to compose a correct
solution. Future modeling work could incorporate these findings, potentially by
connecting the natural language descriptions we collected here to the
underlying semantics of ARC.
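The program-induction setup described above can be made concrete with a small sketch. ARC distributes each task as JSON with "train" and "test" lists of input/output integer grids; the grids and the "mirror" transformation below are a made-up toy example, not a real task from the corpus.

```python
# Minimal sketch of how an ARC task is structured and evaluated.
# The grid values and the flip_horizontal "program" are illustrative;
# only the JSON layout (train/test pairs of integer grids) follows
# the public ARC format.

def flip_horizontal(grid):
    """A toy candidate 'program': mirror each row of the grid."""
    return [list(reversed(row)) for row in grid]

# A toy task: each output is the horizontally mirrored input.
task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 4, 0]], "output": [[0, 4, 3]]},
    ],
    "test": [
        {"input": [[5, 0, 0]], "output": [[0, 0, 5]]},
    ],
}

def solves(program, task):
    """Check a candidate program against all demonstration and test pairs."""
    pairs = task["train"] + task["test"]
    return all(program(p["input"]) == p["output"] for p in pairs)

print(solves(flip_horizontal, task))  # True for this toy task
```

A human solver plays the role of `solves` implicitly: they induce a transformation from the train pairs and apply it to the held-out test input.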
Related papers
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
In the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
In the instance level, we measure the difficulty of all instances per task, then divide them into the easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models [58.57279229066477]
We study how language models (LMs) solve retrieval tasks in diverse situations.
We introduce ORION, a collection of structured retrieval tasks spanning six domains.
We find that LMs internally decompose retrieval tasks in a modular way.
arXiv Detail & Related papers (2023-12-13T18:36:43Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- Post Hoc Explanations of Language Models Can Improve Language Models [43.2109029463221]
We present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY)
We leverage post hoc explanation methods which output attribution scores (explanations) capturing the influence of each of the input features on model predictions.
Our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks.
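The general idea described in this snippet can be sketched in a toy form: compute attribution scores for the input features of an in-context example, then surface the most influential features as a rationale in the prompt. The attribution method below (leave-one-word-out against a stub scorer) and all names are illustrative assumptions, not the paper's actual AMPLIFY pipeline.

```python
# Toy sketch: attach feature-attribution "explanations" to a few-shot
# example. The stub scorer and keyword list are made up for illustration.

def stub_score(text):
    """Stand-in for a model's confidence that the label is 'positive'."""
    words = text.split()
    return sum(w in {"great", "good", "love"} for w in words) / max(len(words), 1)

def attributions(text):
    """Leave-one-word-out attribution: score drop when a word is removed."""
    words = text.split()
    base = stub_score(text)
    scores = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((words[i], base - stub_score(reduced)))
    return scores

def amplify_style_prompt(example, label, k=2):
    """Build an in-context example whose rationale names the top-k words."""
    top = sorted(attributions(example), key=lambda t: -t[1])[:k]
    keywords = ", ".join(w for w, _ in top)
    return f"Text: {example}\nKey words: {keywords}\nLabel: {label}"

print(amplify_style_prompt("I love this great movie", "positive"))
```

In practice the scorer would be a real model and the attribution method a standard explainer (e.g. gradient- or perturbation-based); the sketch only shows how explanations get folded back into the prompt.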
arXiv Detail & Related papers (2023-05-19T04:46:04Z)
- Divergence-Based Domain Transferability for Zero-Shot Classification [78.55044112903148]
Transferring learned patterns from pretrained neural language models has been shown to significantly improve effectiveness across a variety of language-based tasks.
Further tuning on intermediate tasks has been demonstrated to provide additional performance benefits, provided the intermediate task is sufficiently related to the target task.
However, how to identify related tasks is an open problem, and brute-force searching effective task combinations is prohibitively expensive.
arXiv Detail & Related papers (2023-02-11T16:04:38Z)
- Less is More: Summary of Long Instructions is Better for Program Synthesis [20.66688303609522]
We show that pre-trained language models (LMs) benefit from the summarized version of complicated questions.
Our findings show that superfluous information often present in problem descriptions does not help models understand a task.
Experimental results on Codex show that our proposed approach outperforms the baseline by 8.13% on average in terms of strict accuracy.
arXiv Detail & Related papers (2022-03-16T13:04:12Z)
- Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z)
- An Integrated Dynamic Method for Allocating Roles and Planning Tasks for Mixed Human-Robot Teams [0.0]
This paper proposes a novel integrated dynamic method for planning and allocating tasks in mixed human-robot teams.
The Behavior Tree formulation allows encoding a single job as a compound of different tasks with temporal and logic constraints.
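The Behavior Tree formulation mentioned here can be illustrated with a minimal sketch, assuming the standard Sequence/Action node semantics; the task names and state layout are illustrative, not from the paper.

```python
# Minimal behavior-tree sketch: a Sequence node encodes a single job
# as ordered subtasks with a precondition (a temporal/logic constraint).

SUCCESS, FAILURE = "success", "failure"

class Action:
    """A leaf node wrapping a condition or primitive task."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self, state):
        return SUCCESS if self.fn(state) else FAILURE

class Sequence:
    """Ticks children in order; fails as soon as one child fails."""
    def __init__(self, *children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == FAILURE:
                return FAILURE
        return SUCCESS

# A single "job" composed of ordered subtasks gated by a precondition.
job = Sequence(
    Action("part_available", lambda s: s["part_available"]),
    Action("pick", lambda s: s.setdefault("picked", True)),
    Action("place", lambda s: s.get("picked", False)),
)

print(job.tick({"part_available": True}))  # "success"
```

A full formulation would also include Fallback (selector) and Parallel nodes, which is how behavior trees typically express the logic constraints between alternative tasks.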
arXiv Detail & Related papers (2021-05-25T16:10:30Z)
- Sequential Transfer in Reinforcement Learning with a Generative Model [48.40219742217783]
We show how to reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.
We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge.
We empirically verify our theoretical findings in simple simulated domains.
arXiv Detail & Related papers (2020-07-01T19:53:35Z)
- Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance? [27.64235687067883]
We show that models can learn to encode linguistic properties even if they are not needed for the task on which the model was trained.
We demonstrate models can encode these properties considerably above chance-level even when distributed in the data as random noise.
arXiv Detail & Related papers (2020-05-02T06:19:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.