Abstract Visual Reasoning Enabled by Language
- URL: http://arxiv.org/abs/2303.04091v3
- Date: Thu, 22 Jun 2023 10:41:41 GMT
- Title: Abstract Visual Reasoning Enabled by Language
- Authors: Giacomo Camposampiero, Loic Houmard, Benjamin Estermann, Joël Mathys, Roger Wattenhofer
- Abstract summary: We propose a general learning-based framework for solving ARC.
It is centered on transforming tasks from the vision to the language domain.
This composition of language and vision allows for pre-trained models to be leveraged at each stage.
- Score: 8.627180519837657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While artificial intelligence (AI) models have achieved human or even
superhuman performance in many well-defined applications, they still struggle
to show signs of broad and flexible intelligence. The Abstraction and Reasoning
Corpus (ARC), a visual intelligence benchmark introduced by François
Chollet, aims to assess how close AI systems are to human-like cognitive
abilities. Most current approaches rely on carefully handcrafted
domain-specific program searches to brute-force solutions for the tasks present
in ARC. In this work, we propose a general learning-based framework for solving
ARC. It is centered on transforming tasks from the vision to the language
domain. This composition of language and vision allows for pre-trained models
to be leveraged at each stage, enabling a shift from handcrafted priors towards
the learned priors of the models. While not yet beating state-of-the-art models
on ARC, we demonstrate the potential of our approach, for instance, by solving
some ARC tasks that have not been solved previously.
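As a rough illustration of the vision-to-language transformation the abstract describes, the following Python sketch serialises ARC grids to text, prompts a pre-trained language model, and parses the answer back into a grid. All function names and the `language_model` callable are illustrative assumptions, not the authors' actual implementation.

```python
def grid_to_text(grid: list[list[int]]) -> str:
    """Serialise an ARC colour grid (integers 0-9) as one text line per row."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

def text_to_grid(text: str) -> list[list[int]]:
    """Parse the model's textual answer back into a grid of integers."""
    return [[int(tok) for tok in line.split()] for line in text.strip().splitlines()]

def solve_task(train_pairs, test_input, language_model):
    """Prompt a pre-trained language model with serialised demonstration pairs."""
    prompt = ""
    for inp, out in train_pairs:
        prompt += f"Input:\n{grid_to_text(inp)}\nOutput:\n{grid_to_text(out)}\n\n"
    prompt += f"Input:\n{grid_to_text(test_input)}\nOutput:\n"
    # `language_model` is assumed to be any text-in, text-out callable
    return text_to_grid(language_model(prompt))
```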
Related papers
- VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning [86.59849798539312]
We present Neuro-Symbolic Predicates, a first-order abstraction language that combines the strengths of symbolic and neural knowledge representations.
We show that our approach offers better sample complexity, stronger out-of-distribution generalization, and improved interpretability.
arXiv Detail & Related papers (2024-10-30T16:11:05Z)
- MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting [97.52388851329667]
We introduce Marking Open-world Keypoint Affordances (MOKA) to solve robotic manipulation tasks specified by free-form language instructions.
Central to our approach is a compact point-based representation of affordance, which bridges the VLM's predictions on observed images and the robot's actions in the physical world.
We evaluate and analyze MOKA's performance on various table-top manipulation tasks including tool use, deformable body manipulation, and object rearrangement.
arXiv Detail & Related papers (2024-03-05T18:08:45Z)
- Neural networks for abstraction and reasoning: Towards broad generalization in machines [3.165509887826658]
We look at novel approaches for solving the Abstraction & Reasoning Corpus (ARC).
We adapt the DreamCoder neurosymbolic reasoning solver to ARC.
We present the Perceptual Abstraction and Reasoning Language (PeARL), which allows DreamCoder to solve ARC tasks.
We publish the arckit Python library to make future research on ARC easier.
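A brief usage sketch for the arckit library mentioned above, based on its public documentation; treat the exact API as an assumption, since details may differ between versions.

```python
import arckit

# Load the ARC training and evaluation task sets.
train_set, eval_set = arckit.load_data()

task = train_set[0]   # a Task holding demonstration and test input/output grids
task.show()           # render the task's grids in the terminal
```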
arXiv Detail & Related papers (2024-02-05T20:48:57Z)
- A Survey on Robotics with Foundation Models: toward Embodied AI [30.999414445286757]
Recent advances in computer vision, natural language processing, and multi-modality learning have shown that foundation models have superhuman capabilities for specific tasks.
This survey aims to provide a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation and encompassing high-level planning and low-level control.
arXiv Detail & Related papers (2024-02-04T07:55:01Z)
- Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z)
- Towards A Unified Agent with Foundation Models [18.558328028366816]
We investigate how to embed and leverage the abilities of language and vision-language models in Reinforcement Learning (RL) agents.
We design a framework that uses language as the core reasoning tool, exploring how this enables an agent to tackle a series of fundamental RL challenges.
We demonstrate substantial performance improvements over baselines in exploration efficiency and ability to reuse data from offline datasets.
arXiv Detail & Related papers (2023-07-18T22:37:30Z)
- Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data [101.43350024175157]
Self-supervised learning has the potential to decrease the amount of human annotation and engineering effort required to learn control strategies.
Our work builds on prior work showing that reinforcement learning (RL) itself can be cast as a self-supervised problem.
We demonstrate that a self-supervised RL algorithm based on contrastive learning can solve real-world, image-based robotic manipulation tasks.
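To make the contrastive idea concrete, here is a minimal, illustrative sketch (not the paper's implementation): embed (state, action) pairs and goals, then train with an InfoNCE-style loss so each pair is classified against the goal it actually reached in the data.

```python
import numpy as np

def info_nce_loss(sa_embed: np.ndarray, goal_embed: np.ndarray) -> float:
    """InfoNCE-style contrastive loss.

    sa_embed, goal_embed: (batch, dim) arrays; row i of each is assumed
    to come from the same trajectory, so diagonal entries are positives.
    """
    logits = sa_embed @ goal_embed.T                 # (batch, batch) similarity scores
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))       # maximise diagonal probability
```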
arXiv Detail & Related papers (2023-06-06T01:36:56Z)
- The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain [0.0]
We describe an in-depth evaluation benchmark for the Abstraction and Reasoning Corpus (ARC).
In particular, we introduce ConceptARC, a new, publicly available benchmark in the ARC domain.
We report results on testing humans on this benchmark as well as three machine solvers.
arXiv Detail & Related papers (2023-05-11T21:06:39Z)
- OpenAGI: When LLM Meets Domain Experts [51.86179657467822]
Human Intelligence (HI) excels at combining basic skills to solve complex tasks.
This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents.
We introduce OpenAGI, an open-source platform designed for solving multi-step, real-world tasks.
arXiv Detail & Related papers (2023-04-10T03:55:35Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained on large-scale multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.