Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning
- URL: http://arxiv.org/abs/2507.11761v1
- Date: Tue, 15 Jul 2025 21:54:51 GMT
- Title: Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning
- Authors: Fan Shi, Bin Li, Xiangyang Xue
- Abstract summary: Abstract visual reasoning (AVR) enables humans to quickly discover and generalize abstract rules to new scenarios. This paper proposes a novel Unified Conditional Generative Solver (UCGS) to address multiple tasks in a unified framework. UCGS exhibits the ability of zero-shot reasoning, enabling it to perform abstract reasoning on problems from unseen tasks in the testing phase.
- Score: 52.107043437362556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Abstract visual reasoning (AVR) enables humans to quickly discover and generalize abstract rules to new scenarios. Designing intelligent systems with human-like AVR abilities has been a long-standing topic in the artificial intelligence community. Deep AVR solvers have recently achieved remarkable success in various AVR tasks. However, they usually use task-specific designs or parameters in different tasks. In such a paradigm, solving new tasks often means retraining the model, and sometimes retuning the model architecture, which increases the cost of solving AVR problems. In contrast to task-specific approaches, this paper proposes a novel Unified Conditional Generative Solver (UCGS), aiming to address multiple AVR tasks in a unified framework. First, we prove that some well-known AVR tasks can be reformulated as the problem of estimating the predictability of target images in problem panels. Then, we illustrate that, under the proposed framework, training one conditional generative model can solve various AVR tasks. The experiments show that with a single round of multi-task training, UCGS demonstrates abstract reasoning ability across various AVR tasks. Notably, UCGS exhibits zero-shot reasoning ability, enabling it to perform abstract reasoning on problems from AVR tasks unseen during training.
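The reformulation described in the abstract can be sketched concretely: treat each candidate answer as a hypothetical target image, score how predictable it is from the context panels under a conditional generative model, and select the best-scoring candidate. The sketch below is a minimal illustration of that idea, not the authors' implementation; the `log_likelihood(target, context)` interface and the toy scoring function are assumptions introduced here for illustration.

```python
import numpy as np

def solve_avr_problem(context_panels, candidates, log_likelihood):
    """Pick the candidate whose target image is most predictable
    given the context panels (the predictability-estimation view).

    log_likelihood(target, context) -> float is a stand-in for any
    conditional generative model (hypothetical interface).
    """
    scores = [log_likelihood(c, context_panels) for c in candidates]
    return int(np.argmax(scores))

# Toy stand-in "model": scores a candidate by how closely it matches
# the mean of the context panels (a crude notion of predictability).
def toy_log_likelihood(target, context):
    prediction = np.mean(context, axis=0)
    return -float(np.sum((target - prediction) ** 2))

context = [np.array([1.0, 1.0]), np.array([1.0, 1.0])]
candidates = [np.array([9.0, 9.0]), np.array([1.0, 1.0])]
print(solve_avr_problem(context, candidates, toy_log_likelihood))  # prints 1
```

Because the task-specific logic lives entirely in the generative model, the same selection loop applies unchanged to different AVR task formats, which is what makes a single multi-task-trained model plausible under this framing.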
Related papers
- Absolute Zero: Reinforced Self-play Reasoning with Zero Data [61.46462130246158]
Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models. We introduce the Absolute Zero Reasoner (AZR), a system that self-evolves its training curriculum and reasoning ability. AZR achieves overall SOTA performance on coding and mathematical reasoning tasks, outperforming existing zero-setting models.
arXiv Detail & Related papers (2025-05-06T09:08:00Z) - On Data Synthesis and Post-training for Visual Abstract Reasoning [15.055924556135857]
We make a common LLaVA-NeXT 7B model capable of perceiving and reasoning about specific problems. This is a great breakthrough, since almost all previous VLMs fail or show nearly random performance on representative benchmarks.
arXiv Detail & Related papers (2025-04-02T03:18:24Z) - DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning [57.285435980459205]
Compositional visual reasoning approaches have shown promise as more effective strategies than end-to-end VR methods. We propose DWIM: Discrepancy-aware workflow generation, which assesses tool usage and extracts workflows more viable for training; and Instruct-Masking fine-tuning, which guides the model to clone only effective actions, enabling the generation of more practical solutions.
arXiv Detail & Related papers (2025-03-25T01:57:59Z) - RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception [64.80760846124858]
This paper proposes a novel unified representation, RepVF, which harmonizes the representation of various perception tasks.
RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model.
Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks.
arXiv Detail & Related papers (2024-07-15T16:25:07Z) - A Unified View of Abstract Visual Reasoning Problems [0.0]
We introduce a unified view of tasks, in which each instance is rendered as a single image with no a priori assumptions about the number of panels, their location, or their role.
The main advantage of the proposed unified view is the ability to develop universal learning models applicable to various tasks.
Experiments conducted on four datasets with Raven's Progressive Matrices and Visual Analogy Problems show that the proposed unified representation of tasks poses a challenge to state-of-the-art Deep Learning (DL) models and, more broadly, contemporary DL image recognition methods.
arXiv Detail & Related papers (2024-06-16T20:52:44Z) - One Self-Configurable Model to Solve Many Abstract Visual Reasoning Problems [0.0]
We propose a unified model for solving Single-Choice Abstract visual Reasoning (SCAR) tasks.
The proposed model relies on SCAR-Aware dynamic Layer (SAL), which adapts its weights to the structure of the problem.
Experiments show that SAL-based models effectively solve diverse tasks, with performance on par with state-of-the-art task-specific baselines.
arXiv Detail & Related papers (2023-12-15T18:15:20Z) - Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment [50.82681686110528]
Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs).
The quality assessment of ODIs is severely hampered by the fact that the existing BOIQA pipeline lacks the modeling of the observer's browsing process.
We propose a novel multi-sequence network for BOIQA called Assessor360, which is derived from the realistic multi-assessor ODI quality assessment procedure.
arXiv Detail & Related papers (2023-05-18T13:55:28Z) - Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z) - A Review of Emerging Research Directions in Abstract Visual Reasoning [0.0]
We propose a taxonomy to categorise the tasks along 5 dimensions: input shapes, hidden rules, target task, cognitive function, and main challenge.
The perspective taken in this survey allows us to characterise problems with respect to their shared and distinct properties, and provides a unified view of the existing approaches for solving AVR tasks.
One of the emerging research directions stems from the observation that in the machine learning literature different tasks are considered in isolation, in stark contrast with the way the tasks are used to measure human intelligence.
arXiv Detail & Related papers (2022-02-21T14:58:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.