One-shot Visual Reasoning on RPMs with an Application to Video Frame
Prediction
- URL: http://arxiv.org/abs/2111.12301v1
- Date: Wed, 24 Nov 2021 06:51:38 GMT
- Title: One-shot Visual Reasoning on RPMs with an Application to Video Frame
Prediction
- Authors: Wentao He, Jianfeng Ren, Ruibin Bai
- Abstract summary: Raven's Progressive Matrices (RPMs) are frequently used in evaluating human visual reasoning ability.
We propose a One-shot Human-Understandable ReaSoner (Os-HURS) to tackle the challenges of real-world visual recognition and subsequent logical reasoning tasks.
- Score: 1.0932251830449902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Raven's Progressive Matrices (RPMs) are frequently used in evaluating human
visual reasoning ability. Researchers have made considerable effort in
developing systems that automatically solve RPM problems, often
through a black-box end-to-end Convolutional Neural Network (CNN) for both
visual recognition and logical reasoning tasks. Towards the objective of
developing a highly explainable solution, we propose a One-shot
Human-Understandable ReaSoner (Os-HURS), which is a two-step framework
including a perception module and a reasoning module, to tackle the challenges
of real-world visual recognition and subsequent logical reasoning tasks,
respectively. For the reasoning module, we propose a "2+1" formulation that can
be better understood by humans and significantly reduces the model complexity.
As a result, a precise reasoning rule can be deduced from one RPM sample only,
which is not feasible for existing solution methods. The proposed reasoning
module is also capable of yielding a set of reasoning rules, precisely modeling
the human knowledge in solving the RPM problem. To validate the proposed method
on real-world applications, an RPM-like One-shot Frame-prediction (ROF) dataset
is constructed, where visual reasoning is conducted on RPMs built from
real-world video frames instead of synthetic images. Experimental results on
various RPM-like datasets demonstrate that the proposed Os-HURS achieves a
significant and consistent performance gain compared with the state-of-the-art
models.
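A minimal sketch of the generic two-step (perception, then reasoning) pipeline described in the abstract, written in Python. The attribute vocabulary, the candidate rule set, and all function names below are illustrative assumptions; this is not the authors' Os-HURS and does not reproduce their "2+1" formulation.

```python
# Hypothetical sketch of a two-step RPM solver: a perception module maps panels
# to symbolic attributes, and a reasoning module deduces a row-wise rule from
# the complete context rows and applies it to pick the answer. All names and
# the rule set are assumptions for illustration only.
from typing import Callable, Dict, List

Attributes = Dict[str, int]  # e.g. {"count": 3, "size": 2, "shade": 1}

def perceive(panel_image) -> Attributes:
    """Perception module: map one panel image to symbolic attributes.
    In practice this would be a trained visual recognizer; here it is a stub."""
    raise NotImplementedError("plug in a visual recognizer")

# Candidate row-wise rules over a single attribute (assumed rule vocabulary).
RULES: Dict[str, Callable[[int, int], int]] = {
    "constant": lambda a, b: a,            # third value repeats the first
    "progress": lambda a, b: b + (b - a),  # arithmetic progression
    "sum":      lambda a, b: a + b,        # third value is the sum
}

def deduce_rule(rows: List[List[Attributes]], attr: str) -> str:
    """Reasoning module: find the rule that explains both complete context rows."""
    for name, rule in RULES.items():
        if all(rule(r[0][attr], r[1][attr]) == r[2][attr] for r in rows):
            return name
    raise ValueError(f"no candidate rule fits attribute '{attr}'")

def solve(context: List[List[Attributes]], candidates: List[Attributes], attr: str) -> int:
    """Apply the deduced rule to the incomplete third row and return the index
    of the candidate answer whose attribute value matches the prediction."""
    rule = RULES[deduce_rule(context[:2], attr)]
    target = rule(context[2][0][attr], context[2][1][attr])
    return next(i for i, c in enumerate(candidates) if c[attr] == target)
```

Deducing the rule from the two complete context rows and applying it to the incomplete third row mirrors the one-shot idea: when the rule space is small and symbolic, a single RPM sample can suffice to pin down the rule.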
Related papers
- Towards Generative Abstract Reasoning: Completing Raven's Progressive Matrix via Rule Abstraction and Selection [52.107043437362556]
Raven's Progressive Matrix (RPM) is widely used to probe abstract visual reasoning in machine intelligence.
Participants in RPM tests can show powerful reasoning ability by inferring and combining attribute-changing rules.
We propose a deep latent variable model for answer generation problems through Rule AbstractIon and SElection.
arXiv Detail & Related papers (2024-01-18T13:28:44Z) - Faster Video Moment Retrieval with Point-Level Supervision [70.51822333023145]
Video Moment Retrieval (VMR) aims at retrieving the most relevant events from an untrimmed video with natural language queries.
Existing VMR methods suffer from two defects: massive, expensive temporal annotations and complicated cross-modal interaction modules.
We propose a novel method termed Cheaper and Faster Moment Retrieval (CFMR).
arXiv Detail & Related papers (2023-05-23T12:53:50Z) - Can SAM Boost Video Super-Resolution? [78.29033914169025]
We propose a simple yet effective module -- the SAM-guidEd refinEment Module (SEEM).
This lightweight plug-in module is specifically designed to leverage the attention mechanism for the generation of semantic-aware features.
We apply our SEEM to two representative methods, EDVR and BasicVSR, resulting in consistently improved performance with minimal implementation effort.
arXiv Detail & Related papers (2023-05-11T02:02:53Z) - Learning to reason over visual objects [6.835410768769661]
We investigate the extent to which a general-purpose mechanism for processing visual scenes in terms of objects might help promote abstract visual reasoning.
We find that an inductive bias for object-centric processing may be a key component of abstract visual reasoning.
arXiv Detail & Related papers (2023-03-03T23:19:42Z) - DAReN: A Collaborative Approach Towards Reasoning And Disentangling [27.50150027974947]
We propose an end-to-end joint representation-reasoning learning framework, which leverages a weak form of inductive bias to improve both tasks together.
We accomplish this using a novel learning framework, the Disentangling-based Abstract Reasoning Network (DAReN), built on the principles of GM-RPM.
arXiv Detail & Related papers (2021-09-27T16:10:30Z) - Unsupervised Abstract Reasoning for Raven's Problem Matrices [9.278113063631643]
Raven's Progressive Matrices (RPM) is highly correlated with human intelligence.
We propose the first unsupervised learning method for solving RPM problems.
Our method even outperforms some of the supervised approaches.
arXiv Detail & Related papers (2021-09-21T07:44:58Z) - A Data Augmentation Method by Mixing Up Negative Candidate Answers for
Solving Raven's Progressive Matrices [0.829949723558878]
Raven's Progressive Matrices (RPMs) are frequently used in testing human visual reasoning ability.
Recently developed RPM-like datasets and solution models transfer this kind of problem from cognitive science to computer science.
We propose a data augmentation strategy based on image mix-up, which is generalizable to a variety of multiple-choice problems; a minimal sketch of such a mix-up step is given after this list.
arXiv Detail & Related papers (2021-03-09T04:50:32Z) - Multi-Label Contrastive Learning for Abstract Visual Reasoning [0.0]
State-of-the-art systems solving Raven's Progressive Matrices rely on massive pattern-based training and exploiting biases in the dataset.
Humans concentrate on identifying the rules/concepts underlying the RPM (or, more generally, a visual reasoning task) to be solved.
We propose a new sparse rule encoding scheme for RPMs which, besides the new training algorithm, is the key factor contributing to the state-of-the-art performance.
arXiv Detail & Related papers (2020-12-03T14:18:15Z) - DynaVSR: Dynamic Adaptive Blind Video Super-Resolution [60.154204107453914]
DynaVSR is a novel meta-learning-based framework for real-world video SR.
We train a multi-frame downscaling module with various types of synthetic blur kernels, which is seamlessly combined with a video SR network for input-aware adaptation.
Experimental results show that DynaVSR consistently improves the performance of the state-of-the-art video SR models by a large margin.
arXiv Detail & Related papers (2020-11-09T15:07:32Z) - MuCAN: Multi-Correspondence Aggregation Network for Video
Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frame correspondences are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z) - Video Face Super-Resolution with Motion-Adaptive Feedback Cell [90.73821618795512]
Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNNs).
In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way.
arXiv Detail & Related papers (2020-02-15T13:14:10Z)
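As referenced in the data-augmentation entry above, here is a minimal sketch of mixing up negative candidate answers to enlarge the pool of wrong options for an RPM-style multiple-choice problem. The function name, the Beta-distribution sampling, and the random pairing scheme are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical mix-up augmentation over negative candidate answer panels.
import numpy as np

def mixup_negatives(negatives, alpha=0.4, rng=None):
    """Blend random pairs of negative candidate panels into extra negatives.

    negatives: float array of shape (N, H, W) holding the wrong answer panels.
    Returns an (N, H, W) array of newly synthesized negative panels.
    """
    rng = rng or np.random.default_rng()
    n = len(negatives)
    lam = rng.beta(alpha, alpha, size=n)   # per-pair mixing coefficients
    partners = rng.permutation(n)          # random pairing of negatives
    mixed = (lam[:, None, None] * negatives
             + (1.0 - lam)[:, None, None] * negatives[partners])
    return mixed

# Usage: enlarge the candidate pool with synthetic negatives.
# augmented = np.concatenate([negatives, mixup_negatives(negatives)], axis=0)
```

Blending two wrong answers yields another panel that is very unlikely to satisfy the underlying rule, so it can safely serve as an additional negative candidate during training.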