A Unified View of Abstract Visual Reasoning Problems
- URL: http://arxiv.org/abs/2406.11068v1
- Date: Sun, 16 Jun 2024 20:52:44 GMT
- Title: A Unified View of Abstract Visual Reasoning Problems
- Authors: Mikołaj Małkiński, Jacek Mańdziuk
- Abstract summary: We introduce a unified view of AVR tasks, where each instance is rendered as a single image with no a priori assumptions about the number of panels, their location, or role.
The main advantage of the proposed unified view is the ability to develop universal learning models applicable to various tasks.
Experiments conducted on four datasets with Raven's Progressive Matrices and Visual Analogy Problems show that the proposed unified representation of tasks poses a challenge to state-of-the-art Deep Learning (DL) models and, more broadly, contemporary DL image recognition methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of Abstract Visual Reasoning (AVR) encompasses a wide range of problems, many of which are inspired by human IQ tests. The variety of AVR tasks has resulted in state-of-the-art AVR methods being task-specific approaches. Furthermore, contemporary methods consider each AVR problem instance not as a whole, but in the form of a set of individual panels with particular locations and roles (context vs. answer panels) pre-assigned according to the task-specific arrangements. While these highly specialized approaches have recently led to significant progress in solving particular AVR tasks, considering each task in isolation hinders the development of universal learning systems in this domain. In this paper, we introduce a unified view of AVR tasks, where each problem instance is rendered as a single image, with no a priori assumptions about the number of panels, their location, or role. The main advantage of the proposed unified view is the ability to develop universal learning models applicable to various AVR tasks. What is more, the proposed approach inherently facilitates transfer learning in the AVR domain, as various types of problems share a common representation. The experiments conducted on four AVR datasets with Raven's Progressive Matrices and Visual Analogy Problems, and one real-world visual analogy dataset show that the proposed unified representation of AVR tasks poses a challenge to state-of-the-art Deep Learning (DL) AVR models and, more broadly, contemporary DL image recognition methods. In order to address this challenge, we introduce the Unified Model for Abstract Visual Reasoning (UMAVR) capable of dealing with various types of AVR problems in a unified manner. UMAVR outperforms existing AVR methods in selected single-task learning experiments, and demonstrates effective knowledge reuse in transfer learning and curriculum learning setups.
Related papers
- Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs [73.74375912785689]
This paper proposes unified training strategies for speech recognition systems.
We demonstrate that training a single model for all three tasks enhances VSR and AVSR performance.
We also introduce a greedy pseudo-labelling approach to more effectively leverage unlabelled samples.
arXiv Detail & Related papers (2024-11-04T16:46:53Z)
- A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends [67.43992456058541]
Image restoration (IR) refers to the process of improving the visual quality of images while removing degradations such as noise, blur, and weather effects.
Traditional IR methods typically target specific types of degradation, which limits their effectiveness in real-world scenarios with complex distortions.
The all-in-one image restoration (AiOIR) paradigm has emerged, offering a unified framework that adeptly addresses multiple degradation types.
arXiv Detail & Related papers (2024-10-19T11:11:09Z)
- Deep Learning for Video Anomaly Detection: A Review [52.74513211976795]
Video anomaly detection (VAD) aims to discover behaviors or events deviating from normality in videos.
In the era of deep learning, a great variety of deep learning-based methods are constantly emerging for the VAD task.
This review covers the spectrum of five different categories, namely, semi-supervised, weakly supervised, fully supervised, unsupervised and open-set supervised VAD.
arXiv Detail & Related papers (2024-09-09T07:31:16Z)
- URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering [28.776476995363048]
We propose a novel Unified and Robust Representation Learning framework for Incomplete Multi-View Clustering (URRL-IMVC).
URRL-IMVC directly learns a unified embedding that is robust to view missing conditions by integrating information from multiple views and neighboring samples.
We extensively evaluate the proposed URRL-IMVC framework on various benchmark datasets, demonstrating its state-of-the-art performance.
arXiv Detail & Related papers (2024-07-12T09:35:25Z)
- One Self-Configurable Model to Solve Many Abstract Visual Reasoning Problems [0.0]
We propose a unified model for solving Single-Choice Abstract Visual Reasoning tasks.
The proposed model relies on the Structure-Aware dynamic Layer (SAL), which adapts its weights to the structure of the problem.
Experiments show that SAL-based models, in general, effectively solve diverse tasks, and their performance is on par with state-of-the-art task-specific baselines.
arXiv Detail & Related papers (2023-12-15T18:15:20Z)
- XVO: Generalized Visual Odometry via Cross-Modal Self-Training [11.70220331540621]
XVO is a semi-supervised learning method for training generalized monocular Visual Odometry (VO) models.
In contrast to standard monocular VO approaches which often study a known calibration within a single dataset, XVO efficiently learns to recover relative pose with real-world scale.
We optimize the motion estimation model via self-training from large amounts of unconstrained and heterogeneous dash camera videos available on YouTube.
arXiv Detail & Related papers (2023-09-28T18:09:40Z)
- Multi-View Class Incremental Learning [57.14644913531313]
Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance.
This paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views.
arXiv Detail & Related papers (2023-06-16T08:13:41Z)
- Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment [50.82681686110528]
Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs).
The quality assessment of ODIs is severely hampered by the fact that the existing BOIQA pipeline lacks the modeling of the observer's browsing process.
We propose a novel multi-sequence network for BOIQA called Assessor360, which is derived from the realistic multi-assessor ODI quality assessment procedure.
arXiv Detail & Related papers (2023-05-18T13:55:28Z)
- A Review of Emerging Research Directions in Abstract Visual Reasoning [0.0]
We propose a taxonomy to categorise the tasks along 5 dimensions: input shapes, hidden rules, target task, cognitive function, and main challenge.
The perspective taken in this survey allows us to characterise problems with respect to their shared and distinct properties, and provides a unified view of the existing approaches for solving AVR tasks.
One of these directions stems from the observation that in the machine learning literature different tasks are considered in isolation, which is in stark contrast with the way the tasks are used to measure human intelligence.
arXiv Detail & Related papers (2022-02-21T14:58:02Z)
- Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices [0.0]
We focus on the most common type of task -- the Raven's Progressive Matrices (RPMs) -- and provide a review of the learning methods and deep neural models applied to solve RPMs.
We conclude the paper by demonstrating how real-world problems can benefit from the discoveries of RPM studies.
arXiv Detail & Related papers (2022-01-28T19:24:30Z)
- Collaborative Attention Mechanism for Multi-View Action Recognition [75.33062629093054]
We propose a collaborative attention mechanism (CAM) for solving the multi-view action recognition problem.
The proposed CAM detects attention differences among views and adaptively integrates frame-level information so that the views benefit from each other.
Experiments on four action datasets show that the proposed CAM achieves better results for each view and also boosts multi-view performance.
arXiv Detail & Related papers (2020-09-14T17:33:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.