One-shot Visual Reasoning on RPMs with an Application to Video Frame
Prediction
- URL: http://arxiv.org/abs/2111.12301v1
- Date: Wed, 24 Nov 2021 06:51:38 GMT
- Title: One-shot Visual Reasoning on RPMs with an Application to Video Frame
Prediction
- Authors: Wentao He, Jianfeng Ren, Ruibin Bai
- Abstract summary: Raven's Progressive Matrices (RPMs) are frequently used in evaluating human visual reasoning ability.
We propose a One-shot Human-Understandable ReaSoner (Os-HURS) to tackle the challenges of real-world visual recognition and subsequent logical reasoning tasks.
- Score: 1.0932251830449902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Raven's Progressive Matrices (RPMs) are frequently used in evaluating human
visual reasoning ability. Researchers have made considerable effort in
developing systems that automatically solve RPM problems, often
through a black-box end-to-end Convolutional Neural Network (CNN) for both
visual recognition and logical reasoning tasks. Towards the objective of
developing a highly explainable solution, we propose a One-shot
Human-Understandable ReaSoner (Os-HURS), which is a two-step framework
including a perception module and a reasoning module, to tackle the challenges
of real-world visual recognition and subsequent logical reasoning tasks,
respectively. For the reasoning module, we propose a "2+1" formulation that can
be better understood by humans and significantly reduces the model complexity.
As a result, a precise reasoning rule can be deduced from one RPM sample only,
which is not feasible for existing solution methods. The proposed reasoning
module is also capable of yielding a set of reasoning rules, precisely modeling
the human knowledge in solving the RPM problem. To validate the proposed method
on real-world applications, an RPM-like One-shot Frame-prediction (ROF) dataset
is constructed, where visual reasoning is conducted on RPMs built from
real-world video frames instead of synthetic images. Experimental results on
various RPM-like datasets demonstrate that the proposed Os-HURS achieves a
significant and consistent performance gain compared with the state-of-the-art
models.
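A minimal sketch of the generic two-step (perception, then reasoning) pipeline described in the abstract, written in Python. The attribute vocabulary, the candidate rule set, and all function names below are illustrative assumptions; this is not the authors' Os-HURS and does not reproduce their "2+1" formulation.

```python
# Hypothetical sketch of a two-step RPM solver: a perception module maps panels
# to symbolic attributes, and a reasoning module deduces a row-wise rule from
# the complete context rows and applies it to pick the answer. All names and
# the rule set are assumptions for illustration only.
from typing import Callable, Dict, List

Attributes = Dict[str, int]  # e.g. {"count": 3, "size": 2, "shade": 1}

def perceive(panel_image) -> Attributes:
    """Perception module: map one panel image to symbolic attributes.
    In practice this would be a trained visual recognizer; here it is a stub."""
    raise NotImplementedError("plug in a visual recognizer")

# Candidate row-wise rules over a single attribute (assumed rule vocabulary).
RULES: Dict[str, Callable[[int, int], int]] = {
    "constant": lambda a, b: a,            # third value repeats the first
    "progress": lambda a, b: b + (b - a),  # arithmetic progression
    "sum":      lambda a, b: a + b,        # third value is the sum
}

def deduce_rule(rows: List[List[Attributes]], attr: str) -> str:
    """Reasoning module: find the rule that explains both complete context rows."""
    for name, rule in RULES.items():
        if all(rule(r[0][attr], r[1][attr]) == r[2][attr] for r in rows):
            return name
    raise ValueError(f"no candidate rule fits attribute '{attr}'")

def solve(context: List[List[Attributes]], candidates: List[Attributes], attr: str) -> int:
    """Apply the deduced rule to the incomplete third row and return the index
    of the candidate answer whose attribute value matches the prediction."""
    rule = RULES[deduce_rule(context[:2], attr)]
    target = rule(context[2][0][attr], context[2][1][attr])
    return next(i for i, c in enumerate(candidates) if c[attr] == target)
```

Deducing the rule from the two complete context rows and applying it to the incomplete third row mirrors the one-shot idea: when the rule space is small and symbolic, a single RPM sample can suffice to pin down the rule.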
Related papers
- Towards Generative Abstract Reasoning: Completing Raven's Progressive Matrix via Rule Abstraction and Selection [52.107043437362556]
Raven's Progressive Matrix (RPM) is widely used to probe abstract visual reasoning in machine intelligence.
Participants in RPM tests can show powerful reasoning ability by inferring and combining attribute-changing rules.
We propose a deep latent variable model for answer generation problems through Rule AbstractIon and SElection.
arXiv Detail & Related papers (2024-01-18T13:28:44Z) - Faster Video Moment Retrieval with Point-Level Supervision [70.51822333023145]
Video Moment Retrieval (VMR) aims at retrieving the most relevant events from an untrimmed video with natural language queries.
Existing VMR methods suffer from two defects: massive, expensive temporal annotations and complicated cross-modal interaction modules.
We propose a novel method termed Cheaper and Faster Moment Retrieval (CFMR).
arXiv Detail & Related papers (2023-05-23T12:53:50Z) - Can SAM Boost Video Super-Resolution? [78.29033914169025]
We propose a simple yet effective module -- the SAM-guidEd refinEment Module (SEEM).
This lightweight plug-in module is specifically designed to leverage the attention mechanism for the generation of semantic-aware features.
We apply our SEEM to two representative methods, EDVR and BasicVSR, resulting in consistently improved performance with minimal implementation effort.
arXiv Detail & Related papers (2023-05-11T02:02:53Z) - Learning to reason over visual objects [6.835410768769661]
We investigate the extent to which a general-purpose mechanism for processing visual scenes in terms of objects might help promote abstract visual reasoning.
We find that an inductive bias for object-centric processing may be a key component of abstract visual reasoning.
arXiv Detail & Related papers (2023-03-03T23:19:42Z) - DAReN: A Collaborative Approach Towards Reasoning And Disentangling [27.50150027974947]
We propose an end-to-end joint representation-reasoning learning framework, which leverages a weak form of inductive bias to improve both tasks together.
We accomplish this using a novel learning framework, the Disentangling-based Abstract Reasoning Network (DAReN), built on the principles of GM-RPM.
arXiv Detail & Related papers (2021-09-27T16:10:30Z) - Unsupervised Abstract Reasoning for Raven's Problem Matrices [9.278113063631643]
Raven's Progressive Matrices (RPM) is highly correlated with human intelligence.
We propose the first unsupervised learning method for solving RPM problems.
Our method even outperforms some of the supervised approaches.
arXiv Detail & Related papers (2021-09-21T07:44:58Z) - A Data Augmentation Method by Mixing Up Negative Candidate Answers for
Solving Raven's Progressive Matrices [0.829949723558878]
Raven's Progressive Matrices (RPMs) are frequently used in testing human visual reasoning ability.
Recently developed RPM-like datasets and solution models transfer this kind of problem from cognitive science to computer science.
We propose a data augmentation strategy based on image mix-up, which is generalizable to a variety of multiple-choice problems; a minimal sketch of such a mix-up step is given after this list.
arXiv Detail & Related papers (2021-03-09T04:50:32Z) - Multi-Label Contrastive Learning for Abstract Visual Reasoning [0.0]
State-of-the-art systems solving Raven's Progressive Matrices rely on massive pattern-based training and exploiting biases in the dataset.
Humans concentrate on identifying the rules/concepts underlying the RPM (or, more generally, a visual reasoning task) to be solved.
We propose a new sparse rule encoding scheme for RPMs which, besides the new training algorithm, is the key factor contributing to the state-of-the-art performance.
arXiv Detail & Related papers (2020-12-03T14:18:15Z) - DynaVSR: Dynamic Adaptive Blind Video Super-Resolution [60.154204107453914]
DynaVSR is a novel meta-learning-based framework for real-world video SR.
We train a multi-frame downscaling module with various types of synthetic blur kernels, which is seamlessly combined with a video SR network for input-aware adaptation.
Experimental results show that DynaVSR consistently improves the performance of the state-of-the-art video SR models by a large margin.
arXiv Detail & Related papers (2020-11-09T15:07:32Z) - MuCAN: Multi-Correspondence Aggregation Network for Video
Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frame correspondences are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z) - Video Face Super-Resolution with Motion-Adaptive Feedback Cell [90.73821618795512]
Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNNs).
In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way.
arXiv Detail & Related papers (2020-02-15T13:14:10Z)
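As referenced in the data-augmentation entry above, here is a minimal sketch of mixing up negative candidate answers to enlarge the pool of wrong options for an RPM-style multiple-choice problem. The function name, the Beta-distribution sampling, and the random pairing scheme are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical mix-up augmentation over negative candidate answer panels.
import numpy as np

def mixup_negatives(negatives, alpha=0.4, rng=None):
    """Blend random pairs of negative candidate panels into extra negatives.

    negatives: float array of shape (N, H, W) holding the wrong answer panels.
    Returns an (N, H, W) array of newly synthesized negative panels.
    """
    rng = rng or np.random.default_rng()
    n = len(negatives)
    lam = rng.beta(alpha, alpha, size=n)   # per-pair mixing coefficients
    partners = rng.permutation(n)          # random pairing of negatives
    mixed = (lam[:, None, None] * negatives
             + (1.0 - lam)[:, None, None] * negatives[partners])
    return mixed

# Usage: enlarge the candidate pool with synthetic negatives.
# augmented = np.concatenate([negatives, mixup_negatives(negatives)], axis=0)
```

Blending two wrong answers yields another panel that is very unlikely to satisfy the underlying rule, so it can safely serve as an additional negative candidate during training.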