GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning
- URL: http://arxiv.org/abs/2505.20672v1
- Date: Tue, 27 May 2025 03:42:51 GMT
- Title: GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning
- Authors: Woochang Sim, Hyunseok Ryu, Kyungmin Choi, Sungwon Han, Sundong Kim
- Abstract summary: State-of-the-art models still achieve accuracy rates of merely 40-55% on the 2024 ARC Competition. We introduce an analogy-inspired ARC dataset, GIFARC. GIFARC guides AI agents to evaluate the task analogically before engaging in brute-force pattern search.
- Score: 7.09254962218677
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Abstraction and Reasoning Corpus (ARC) poses a stringent test of general AI capabilities, requiring solvers to infer abstract patterns from only a handful of examples. Despite substantial progress in deep learning, state-of-the-art models still achieve accuracy rates of merely 40-55% on the 2024 ARC Competition, indicative of a significant gap between their performance and human-level reasoning. In this work, we seek to bridge that gap by introducing an analogy-inspired ARC dataset, GIFARC. Leveraging large language models (LLMs) and vision-language models (VLMs), we synthesize new ARC-style tasks from a variety of GIF images that include analogies. Each new task is paired with a ground-truth analogy, providing an explicit mapping between visual transformations and everyday concepts. By embedding robust, human-intuitive analogies into ARC-style tasks, GIFARC guides AI agents to evaluate a task analogically before engaging in brute-force pattern search, efficiently reducing problem complexity and yielding more concise, human-understandable solutions. We empirically validate that guiding LLMs with the analogical approach of GIFARC shifts their task-solving strategies to align with human analogical reasoning.
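As a concrete illustration of the setup described above, the sketch below shows what a GIFARC-style task record might look like: ARC grids are small 2D arrays of color indices, and each task is paired with a plain-text analogy mapping the visual transformation to an everyday concept. The field names and the analogy-application helper are hypothetical, not the dataset's actual schema.

```python
# Hypothetical sketch of a GIFARC-style task record (illustrative schema only).
# ARC grids are small 2D integer arrays; the "analogy" field carries the
# ground-truth everyday-concept description of the transformation.
task = {
    "train": [
        {
            "input":  [[0, 1, 0],
                       [0, 1, 0],
                       [0, 1, 0]],
            "output": [[1, 0, 1],
                       [1, 0, 1],
                       [1, 0, 1]],
        },
    ],
    "test": [
        {
            "input": [[0, 0, 1],
                      [0, 0, 1],
                      [0, 0, 1]],
        },
    ],
    # Ground-truth analogy: an explicit mapping from the grid transformation
    # to an everyday concept, as described in the abstract.
    "analogy": "a photographic negative: every filled cell becomes empty and vice versa",
}


def apply_analogy(grid):
    """Invert binary cells, mirroring the 'photographic negative' analogy."""
    return [[1 - cell for cell in row] for row in grid]


# A solver that reasons analogically first would verify the analogy on the
# training pair, then apply it to the test input instead of searching patterns.
predicted = apply_analogy(task["test"][0]["input"])
print(predicted)  # [[1, 1, 0], [1, 1, 0], [1, 1, 0]]
```

The point of the sketch is the pairing itself: because the analogy is stored alongside the grids, a model can be prompted to state and check the everyday concept before attempting any exhaustive transformation search.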
Related papers
- Ambiguity-Aware and High-Order Relation Learning for Multi-Grained Image-Text Matching [6.633576185707164]
This paper proposes the Ambiguity-Aware and High-order Relation learning framework (AAHR) to address these issues. The framework introduces global and local feature extraction mechanisms and an adaptive aggregation network, significantly enhancing full-grained semantic understanding capabilities. Experimental results demonstrate that AAHR outperforms existing state-of-the-art methods on the Flickr30K, MSCOCO, and ECCV Caption datasets.
arXiv Detail & Related papers (2025-07-12T11:30:32Z) - Underlying Semantic Diffusion for Effective and Efficient In-Context Learning [113.4003355229632]
Underlying Semantic Diffusion (US-Diffusion) is an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities. We present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details. We also propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high noise levels.
arXiv Detail & Related papers (2025-03-06T03:06:22Z) - VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning [63.0285363282581]
Multimodal Large Language Models (MLLMs) have become a powerful tool for integrating visual and textual information. We introduce VOILA, a benchmark designed to evaluate MLLMs' perceptual understanding and abstract relational reasoning. We reveal that current MLLMs struggle to comprehend inter-image relationships and exhibit limited capabilities in high-level relational reasoning.
arXiv Detail & Related papers (2025-02-25T23:36:19Z) - Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects [31.926206783846144]
We show that a Vision Transformer (ViT) fails dramatically on most ARC tasks even when trained on one million examples per task.
We propose ViTARC, a ViT-style architecture that unlocks some of the visual reasoning capabilities required by the ARC.
Our task-specific ViTARC models achieve a test solve rate close to 100% on more than half of the 400 public ARC tasks.
arXiv Detail & Related papers (2024-10-08T22:25:34Z) - DiffuSyn Bench: Evaluating Vision-Language Models on Real-World Complexities with Diffusion-Generated Synthetic Benchmarks [0.0]
This study assesses the ability of Large Vision-Language Models (LVLMs) to differentiate between AI-generated and human-generated images.
It introduces a new automated benchmark construction method for this evaluation.
arXiv Detail & Related papers (2024-06-06T19:50:33Z) - Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z) - Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR).
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z) - LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the Importance of Object-based Representations [50.431003245201644]
We show that GPT-4 is unable to "reason" perfectly within non-language domains such as the 1D-ARC or a simple ARC subset.
We propose an object-based representation that is obtained through an external tool, resulting in nearly doubling the performance on solved ARC tasks and near-perfect scores on the easier 1D-ARC.
arXiv Detail & Related papers (2023-05-26T16:32:17Z) - Solving morphological analogies: from retrieval to generation [4.834203844100681]
Analogical inference is a capability of human reasoning and has been used to solve hard reasoning tasks.
We propose a deep learning (DL) framework to address and tackle two key tasks in analogical reasoning (AR): analogy detection and solving.
The framework is thoroughly tested on the Siganalogies dataset of morphological analogical proportions (APs) between words, and shown to outperform symbolic approaches in many languages.
arXiv Detail & Related papers (2023-03-30T12:36:46Z) - Abstract Visual Reasoning Enabled by Language [8.627180519837657]
We propose a general learning-based framework for solving ARC.
It is centered on transforming tasks from the vision to the language domain.
This composition of language and vision allows for pre-trained models to be leveraged at each stage.
arXiv Detail & Related papers (2023-03-07T17:52:46Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.