From Patches to Objects: Exploiting Spatial Reasoning for Better Visual Representations
- URL: http://arxiv.org/abs/2305.12384v1
- Date: Sun, 21 May 2023 07:46:46 GMT
- Title: From Patches to Objects: Exploiting Spatial Reasoning for Better Visual Representations
- Authors: Toni Albert, Bjoern Eskofier, Dario Zanca
- Abstract summary: We propose a novel auxiliary pretraining method that is based on spatial reasoning.
Our proposed method takes advantage of a more flexible formulation of contrastive learning by introducing spatial reasoning as an auxiliary task for discriminative self-supervised methods.
- Score: 2.363388546004777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the field of deep learning steadily transitions from the realm of academic
research to practical application, the significance of self-supervised
pretraining methods has become increasingly prominent. These methods,
particularly in the image domain, offer a compelling strategy to effectively
utilize the abundance of unlabeled image data, thereby enhancing downstream
tasks' performance. In this paper, we propose a novel auxiliary pretraining
method that is based on spatial reasoning. Our proposed method takes advantage
of a more flexible formulation of contrastive learning by introducing spatial
reasoning as an auxiliary task for discriminative self-supervised methods.
Spatial Reasoning works by having the network predict the relative distances
between sampled non-overlapping patches. We argue that this forces the network
to learn more detailed and intricate internal representations of the objects
and the relationships between their constituent parts. Our experiments
demonstrate a substantial improvement in downstream linear-evaluation
performance compared to similar work and provide directions for further
research into spatial reasoning.
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can improve performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with the user's intervention signals themselves as rewards (a minimal sketch follows this entry).
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos [71.20376514273367]
We propose a unified point cloud video self-supervised learning framework for object-centric and scene-centric data.
Our method outperforms supervised counterparts on a wide range of downstream tasks.
arXiv Detail & Related papers (2023-08-18T02:17:47Z)
- Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks [33.98624423578388]
Auxiliary tasks improve representations learned by deep reinforcement learning agents.
We derive a new family of auxiliary tasks based on the successor measure (see the sketch after this entry).
We show that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms.
arXiv Detail & Related papers (2023-04-25T04:25:08Z)
- Unsupervised Interpretable Basis Extraction for Concept-Based Visual Explanations [53.973055975918655]
We show that intermediate-layer representations become more interpretable when transformed into the bases extracted with our method.
We compare the bases extracted with our method against bases derived with a supervised approach and find that, in one respect, the proposed unsupervised approach has a strength that constitutes a limitation of the supervised one; we also give potential directions for future research.
arXiv Detail & Related papers (2023-03-19T00:37:19Z)
- Towards Efficient and Effective Self-Supervised Learning of Visual Representations [41.92884427579068]
Self-supervision has emerged as a propitious method for visual representation learning.
We propose to strengthen these methods using well-posed auxiliary tasks that converge significantly faster.
The proposed method utilizes the task of rotation prediction to improve the efficiency of existing state-of-the-art methods (a sketch of the generic rotation-prediction task follows this entry).
arXiv Detail & Related papers (2022-10-18T13:55:25Z)
- Clustering augmented Self-Supervised Learning: An application to Land Cover Mapping [10.720852987343896]
We introduce a new method for land cover mapping that uses a clustering-based pretext task for self-supervised learning.
We demonstrate the effectiveness of the method on two societally relevant applications.
arXiv Detail & Related papers (2021-08-16T19:35:43Z)
- Co$^2$L: Contrastive Continual Learning [69.46643497220586]
Recent breakthroughs in self-supervised learning show that such algorithms learn visual representations that transfer better to unseen tasks.
We propose a rehearsal-based continual learning algorithm that focuses on continually learning and maintaining transferable representations.
arXiv Detail & Related papers (2021-06-28T06:14:38Z)
- More than just an auxiliary loss: Anti-spoofing Backbone Training via Adversarial Pseudo-depth Generation [4.542003078412816]
A new training pipeline is presented that achieves strong performance on the task of anti-spoofing with RGB images.
Our method approaches the baseline performance of current state-of-the-art anti-spoofing models while using 15.8x fewer parameters.
arXiv Detail & Related papers (2021-01-01T09:00:17Z)
- Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations [183.03278932562438]
This paper presents an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations.
We show that our approach achieves higher efficiency in visual representations and thus delivers a key message to inspire future research on self-supervised visual representation learning.
arXiv Detail & Related papers (2020-11-19T16:26:25Z)
- Learning Representations that Support Extrapolation [39.84463809100903]
We consider the challenge of learning representations that support extrapolation.
We introduce a novel visual analogy benchmark that allows the graded evaluation of extrapolation.
We also introduce a simple technique, temporal context normalization, that encourages representations that emphasize the relations between objects (see the sketch after this entry).
arXiv Detail & Related papers (2020-07-09T20:53:45Z)
- Learning Invariant Representations for Reinforcement Learning without Reconstruction [98.33235415273562]
We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction.
Bisimulation metrics quantify behavioral similarity between states in continuous MDPs (sketched after this entry).
We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks.
arXiv Detail & Related papers (2020-06-18T17:59:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.