One Self-Configurable Model to Solve Many Abstract Visual Reasoning
Problems
- URL: http://arxiv.org/abs/2312.09997v1
- Date: Fri, 15 Dec 2023 18:15:20 GMT
- Title: One Self-Configurable Model to Solve Many Abstract Visual Reasoning
Problems
- Authors: Mikołaj Małkiński, Jacek Mańdziuk
- Abstract summary: We propose a unified model for solving Single-Choice Abstract visual Reasoning tasks.
The proposed model relies on a Structure-Aware dynamic Layer (SAL), which adapts its weights to the structure of the problem.
Experiments show that SAL-based models, in general, effectively solve diverse tasks, and their performance is on par with state-of-the-art task-specific baselines.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Abstract Visual Reasoning (AVR) comprises a wide selection of various
problems similar to those used in human IQ tests. Recent years have brought
dynamic progress in solving particular AVR tasks; however, in the contemporary
literature AVR problems are largely dealt with in isolation, leading to highly
specialized task-specific methods. With the aim of developing universal
learning systems in the AVR domain, we propose a unified model for solving
Single-Choice Abstract visual Reasoning tasks (SCAR), capable of solving
various single-choice AVR tasks, without making any a priori assumptions about
the task structure, in particular the number and location of panels. The
proposed model relies on a novel Structure-Aware dynamic Layer (SAL), which
adapts its weights to the structure of the considered AVR problem. Experiments
conducted on Raven's Progressive Matrices, Visual Analogy Problems, and Odd One
Out problems show that SCAR (SAL-based models, in general) effectively solves
diverse AVR tasks, and its performance is on par with the state-of-the-art
task-specific baselines. What is more, SCAR demonstrates effective knowledge
reuse in multi-task and transfer learning settings. To our knowledge, this work
is the first successful attempt to construct a general single-choice AVR solver
relying on a self-configurable architecture and a unified solving method. With this
work we aim to stimulate and foster progress on task-independent research paths
in the AVR domain, with the long-term goal of developing a general AVR
solver.
Related papers
- Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs [73.74375912785689]
This paper proposes unified training strategies for speech recognition systems.
We demonstrate that training a single model for all three tasks enhances VSR and AVSR performance.
We also introduce a greedy pseudo-labelling approach to more effectively leverage unlabelled samples.
arXiv Detail & Related papers (2024-11-04T16:46:53Z)
- RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering [23.699493284403967]
This paper proposes RS-MoE, a first Mixture-of-Experts-based VLM specifically customized for the remote sensing domain.
Unlike traditional MoE models, the core of RS-MoE is the MoE Block, which incorporates a novel Instruction Router and multiple lightweight Large Language Models (LLMs) as expert models.
We show that our model achieves state-of-the-art performance in generating precise and contextually relevant captions.
arXiv Detail & Related papers (2024-11-03T15:05:49Z)
- A Unified View of Abstract Visual Reasoning Problems [0.0]
We introduce a unified view of tasks, where each instance is rendered as a single image with no a priori assumptions about the number of panels, their location, or role.
The main advantage of the proposed unified view is the ability to develop universal learning models applicable to various tasks.
Experiments conducted on four datasets with Raven's Progressive Matrices and Visual Analogy Problems show that the proposed unified representation of tasks poses a challenge to state-of-the-art Deep Learning (DL) models and, more broadly, contemporary DL image recognition methods.
arXiv Detail & Related papers (2024-06-16T20:52:44Z)
- Single-Reset Divide & Conquer Imitation Learning [49.87201678501027]
Demonstrations are commonly used to speed up the learning process of Deep Reinforcement Learning algorithms.
Some algorithms have been developed to learn from a single demonstration.
arXiv Detail & Related papers (2024-02-14T17:59:47Z)
- LAMBO: Large AI Model Empowered Edge Intelligence [71.56135386994119]
Next-generation edge intelligence is anticipated to benefit various applications via offloading techniques.
Traditional offloading architectures face several issues, including heterogeneous constraints, partial perception, uncertain generalization, and lack of tractability.
We propose a Large AI Model-Based Offloading (LAMBO) framework with over one billion parameters for solving these problems.
arXiv Detail & Related papers (2023-08-29T07:25:42Z)
- The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios [61.74042680711718]
We introduce the CHiME-7 distant ASR (DASR) task, within the 7th CHiME challenge.
This task comprises joint ASR and diarization in far-field settings with multiple, and possibly heterogeneous, recording devices.
The goal is for participants to devise a single system that can generalize across different array geometries.
arXiv Detail & Related papers (2023-06-23T18:49:20Z)
- Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment [50.82681686110528]
Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs).
The quality assessment of ODIs is severely hampered by the fact that the existing BOIQA pipeline lacks the modeling of the observer's browsing process.
We propose a novel multi-sequence network for BOIQA called Assessor360, which is derived from the realistic multi-assessor ODI quality assessment procedure.
arXiv Detail & Related papers (2023-05-18T13:55:28Z)
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z)
- A Review of Emerging Research Directions in Abstract Visual Reasoning [0.0]
We propose a taxonomy to categorise the tasks along 5 dimensions: input shapes, hidden rules, target task, cognitive function, and main challenge.
The perspective taken in this survey allows problems to be characterised with respect to their shared and distinct properties and provides a unified view of the existing approaches for solving tasks.
One of them refers to the observation that in the machine learning literature different tasks are considered in isolation, which is in stark contrast with the way the tasks are used to measure human intelligence.
arXiv Detail & Related papers (2022-02-21T14:58:02Z)
- Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices [0.0]
We focus on the most common type of tasks -- the Raven's Progressive Matrices (RPMs) -- and provide a review of the learning methods and deep neural models applied to solve RPMs.
We conclude the paper by demonstrating how real-world problems can benefit from the discoveries of RPM studies.
arXiv Detail & Related papers (2022-01-28T19:24:30Z)