One-shot Visual Reasoning on RPMs with an Application to Video Frame
Prediction
- URL: http://arxiv.org/abs/2111.12301v1
- Date: Wed, 24 Nov 2021 06:51:38 GMT
- Title: One-shot Visual Reasoning on RPMs with an Application to Video Frame
Prediction
- Authors: Wentao He, Jianfeng Ren, Ruibin Bai
- Abstract summary: Raven's Progressive Matrices (RPMs) are frequently used to evaluate human visual reasoning ability.
We propose a One-shot Human-Understandable ReaSoner (Os-HURS) to tackle the challenges of real-world visual recognition and subsequent logical reasoning tasks.
- Score: 1.0932251830449902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Raven's Progressive Matrices (RPMs) are frequently used to evaluate human
visual reasoning ability. Researchers have made considerable efforts to develop
systems that can automatically solve RPM problems, often through a black-box
end-to-end Convolutional Neural Network (CNN) for both visual recognition and
logical reasoning tasks. Towards the objective of
developing a highly explainable solution, we propose a One-shot
Human-Understandable ReaSoner (Os-HURS), which is a two-step framework
including a perception module and a reasoning module, to tackle the challenges
of real-world visual recognition and subsequent logical reasoning tasks,
respectively. For the reasoning module, we propose a "2+1" formulation that can
be better understood by humans and significantly reduces the model complexity.
As a result, a precise reasoning rule can be deduced from one RPM sample only,
which is not feasible for existing solution methods. The proposed reasoning
module is also capable of yielding a set of reasoning rules, precisely modeling
the human knowledge in solving the RPM problem. To validate the proposed method
on real-world applications, an RPM-like One-shot Frame-prediction (ROF) dataset
is constructed, where visual reasoning is conducted on RPMs constructed using
real-world video frames instead of synthetic images. Experimental results on
various RPM-like datasets demonstrate that the proposed Os-HURS achieves a
significant and consistent performance gain compared with the state-of-the-art
models.
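The abstract describes a two-step perceive-then-reason pipeline in which a precise rule is deduced from a single RPM sample. As a purely illustrative sketch (not the paper's actual Os-HURS implementation), the idea of one-shot rule deduction on a 3x3 matrix can be written as follows; the attribute extraction and the two candidate rules are hypothetical stand-ins:

```python
# Hypothetical sketch of a "perceive then reason" RPM pipeline, operating on a
# single numeric attribute (e.g., object count) per panel. The rule set and
# attribute extraction are illustrative only, not the paper's Os-HURS method.

def perceive(panel):
    """Stand-in perception module: map a panel to an attribute value.
    Here panels are already numbers; a real module would run vision."""
    return panel

def deduce_rule(row):
    """Deduce a rule from one complete row (a, b, c): the one-shot step."""
    a, b, c = (perceive(p) for p in row)
    if a + b == c:
        return lambda x, y: x + y      # arithmetic-sum rule
    if b - a == c - b:
        return lambda x, y: 2 * y - x  # constant-progression rule
    raise ValueError("no known rule fits this row")

def solve(matrix):
    """Apply the rule deduced from the first row to the incomplete last row."""
    rule = deduce_rule(matrix[0])
    a, b = (perceive(p) for p in matrix[2][:2])
    return rule(a, b)

# Example: each panel is represented by its object count; rows follow a sum rule.
rpm = [[1, 2, 3],
       [2, 3, 5],
       [3, 4, None]]
print(solve(rpm))  # 7
```

Because the rule is deduced symbolically from one complete row rather than learned from massive training data, the resulting decision is directly human-readable, which is the explainability property the abstract emphasizes.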
Related papers
- DMWM: Dual-Mind World Model with Long-Term Imagination [53.98633183204453]
We propose a novel dual-mind world model (DMWM) framework that integrates logical reasoning to enable imagination with logical consistency.
The proposed framework is evaluated on benchmark tasks that require long-term planning from the DMControl suite.
arXiv Detail & Related papers (2025-02-11T14:40:57Z)
- BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.
We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model.
We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z)
- Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning [74.90592233107712]
We propose a Direct-Indirect Reasoning (DIR) method, which considers Direct Reasoning (DR) and Indirect Reasoning (IR) as multiple parallel reasoning paths that are merged to derive the final answer.
Our DIR method is simple yet effective and can be straightforwardly integrated with existing variants of CoT methods.
arXiv Detail & Related papers (2024-02-06T03:41:12Z)
- Faster Video Moment Retrieval with Point-Level Supervision [70.51822333023145]
Video Moment Retrieval (VMR) aims at retrieving the most relevant events from an untrimmed video with natural language queries.
Existing VMR methods suffer from two defects: expensive temporal annotations and complicated cross-modal interaction modules.
We propose a novel method termed Cheaper and Faster Moment Retrieval (CFMR).
arXiv Detail & Related papers (2023-05-23T12:53:50Z)
- Learning to reason over visual objects [6.835410768769661]
We investigate the extent to which a general-purpose mechanism for processing visual scenes in terms of objects might help promote abstract visual reasoning.
We find that an inductive bias for object-centric processing may be a key component of abstract visual reasoning.
arXiv Detail & Related papers (2023-03-03T23:19:42Z)
- DAReN: A Collaborative Approach Towards Reasoning And Disentangling [27.50150027974947]
We propose an end-to-end joint representation-reasoning learning framework, which leverages a weak form of inductive bias to improve both tasks together.
We accomplish this using a novel learning framework Disentangling based Abstract Reasoning Network (DAReN) based on the principles of GM-RPM.
arXiv Detail & Related papers (2021-09-27T16:10:30Z)
- A Data Augmentation Method by Mixing Up Negative Candidate Answers for Solving Raven's Progressive Matrices [0.829949723558878]
Raven's Progressive Matrices (RPMs) are frequently used to test human visual reasoning ability.
Recently developed RPM-like datasets and solution models transfer this class of problems from cognitive science to computer science.
We propose a data augmentation strategy by image mix-up, which is generalizable to a variety of multiple-choice problems.
arXiv Detail & Related papers (2021-03-09T04:50:32Z)
- Multi-Label Contrastive Learning for Abstract Visual Reasoning [0.0]
State-of-the-art systems solving Raven's Progressive Matrices rely on massive pattern-based training and exploiting biases in the dataset.
Humans concentrate on identifying the rules/concepts underlying the RPM (or, more generally, a visual reasoning task) to be solved.
We propose a new sparse rule encoding scheme for RPMs which, besides the new training algorithm, is the key factor contributing to the state-of-the-art performance.
arXiv Detail & Related papers (2020-12-03T14:18:15Z)
- MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frames are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z)
- Video Face Super-Resolution with Motion-Adaptive Feedback Cell [90.73821618795512]
Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNNs).
In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way.
arXiv Detail & Related papers (2020-02-15T13:14:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.