Scale-Localized Abstract Reasoning
- URL: http://arxiv.org/abs/2009.09405v2
- Date: Mon, 26 Jul 2021 20:11:10 GMT
- Title: Scale-Localized Abstract Reasoning
- Authors: Yaniv Benny, Niv Pekar, and Lior Wolf
- Abstract summary: We consider the abstract relational reasoning task, which is commonly used as an intelligence test.
Since some patterns have spatial rationales, while others are only semantic, we propose a multi-scale architecture that processes each query in multiple resolutions.
We show that indeed different rules are solved by different resolutions and a combined multi-scale approach outperforms the existing state of the art in this task on all benchmarks by 5-54%.
- Score: 79.00011351374869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the abstract relational reasoning task, which is commonly used as
an intelligence test. Since some patterns have spatial rationales, while others
are only semantic, we propose a multi-scale architecture that processes each
query in multiple resolutions. We show that indeed different rules are solved
by different resolutions and a combined multi-scale approach outperforms the
existing state of the art in this task on all benchmarks by 5-54%. The success
of our method is shown to arise from multiple novelties. First, it searches for
relational patterns in multiple resolutions, which allows it to readily detect
visual relations, such as location, in higher resolution, while allowing the
lower resolution module to focus on semantic relations, such as shape type.
Second, we optimize the reasoning network of each resolution proportionally to
its performance, thereby motivating each resolution to specialize in the rules
for which it performs better than the others and to ignore cases that are already
solved by the other resolutions. Third, we propose a new way to pool
information along the rows and the columns of the illustration-grid of the
query. Our work also analyses the existing benchmarks, demonstrating that the
RAVEN dataset selects the negative examples in a way that is easily exploited.
We, therefore, propose a modified version of the RAVEN dataset, named
RAVEN-FAIR. Our code and pretrained models are available at
https://github.com/yanivbenny/MRNet.
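As a rough illustration of the third point, pooling along the rows and columns of the 3x3 illustration grid could look like the sketch below. This is not the authors' implementation (see the MRNet repository for that): the function name `pool_rows_and_columns` is hypothetical, plain summation stands in for whatever learned pooling the paper uses, and each panel is assumed to be already encoded as a feature vector of length `d`.

```python
import numpy as np


def pool_rows_and_columns(grid_features):
    """Pool per-panel features along the rows and columns of a 3x3 grid.

    grid_features: array-like of shape (3, 3, d), where entry [i, j] is the
    feature vector of the panel in row i, column j (assumed layout).
    Returns (row_pooled, col_pooled), each of shape (3, d).
    """
    grid = np.asarray(grid_features, dtype=float)
    if grid.ndim != 3 or grid.shape[:2] != (3, 3):
        raise ValueError("expected an array of shape (3, 3, d)")
    # Summation is an assumption made for illustration only.
    row_pooled = grid.sum(axis=1)  # combine the three panels of each row
    col_pooled = grid.sum(axis=0)  # combine the three panels of each column
    return row_pooled, col_pooled


if __name__ == "__main__":
    # Toy features: 9 panels with 2-dimensional features.
    toy = np.arange(18).reshape(3, 3, 2)
    rows, cols = pool_rows_and_columns(toy)
    print(rows.shape, cols.shape)
```

In a multi-scale setting, a function like this would presumably be applied separately to the features of each resolution, so that row-wise and column-wise patterns can be compared at every scale.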
Related papers
- Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
- FlexER: Flexible Entity Resolution for Multiple Intents [0.0]
We introduce the problem of multiple intents entity resolution (MIER), an extension to the universal (single intent) entity resolution task.
We propose FlexER, utilizing contemporary solutions to universal entity resolution tasks to solve multiple intents entity resolution.
A large-scale empirical evaluation introduces a new benchmark and, together with two well-known benchmarks, shows that FlexER effectively solves the MIER problem and outperforms the state of the art for universal entity resolution.
arXiv Detail & Related papers (2022-08-23T15:52:52Z)
- Learning Resolution-Adaptive Representations for Cross-Resolution Person Re-Identification [49.57112924976762]
The cross-resolution person re-identification problem aims to match low-resolution (LR) query identity images against high-resolution (HR) gallery images.
It is a challenging and practical problem since the query images often suffer from resolution degradation due to the different capturing conditions from real-world cameras.
This paper explores an alternative SR-free paradigm to directly compare HR and LR images via a dynamic metric, which is adaptive to the resolution of a query image.
arXiv Detail & Related papers (2022-07-09T03:49:51Z)
- Resolution based Feature Distillation for Cross Resolution Person Re-Identification [17.86505685442293]
Person re-identification (re-id) aims to retrieve images of the same identities across different camera views.
Resolution mismatch occurs due to varying distances between the person of interest and the cameras.
We propose a Resolution based Feature Distillation (RFD) approach to overcome the problem of multiple resolutions.
arXiv Detail & Related papers (2021-09-16T11:07:59Z)
- Unsupervised and self-adaptative techniques for cross-domain person re-identification [82.54691433502335]
Person Re-Identification (ReID) across non-overlapping cameras is a challenging task.
Unsupervised Domain Adaptation (UDA) is a promising alternative, as it performs feature-learning adaptation from a model trained on a source to a target domain without identity-label annotation.
In this paper, we propose a novel UDA-based ReID method that takes advantage of triplets of samples created by a new offline strategy.
arXiv Detail & Related papers (2021-03-21T23:58:39Z)
- MDMMT: Multidomain Multimodal Transformer for Video Retrieval [63.872634680339644]
We present a new state of the art on the text-to-video retrieval task on the MSRVTT and LSMDC benchmarks.
We show that training on different datasets can improve each other's test results.
arXiv Detail & Related papers (2021-03-19T09:16:39Z)
- The Little W-Net That Could: State-of-the-Art Retinal Vessel Segmentation with Minimalistic Models [19.089445797922316]
We show that a minimalistic version of a standard U-Net with several orders of magnitude fewer parameters closely approximates the performance of current best techniques.
We also propose a simple extension, dubbed W-Net, which reaches outstanding performance on several popular datasets.
We also test our approach on the Artery/Vein segmentation problem, where we again achieve results well-aligned with the state-of-the-art.
arXiv Detail & Related papers (2020-09-03T19:59:51Z)
- MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frames are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.