Exploit Clues from Views: Self-Supervised and Regularized Learning for
Multiview Object Recognition
- URL: http://arxiv.org/abs/2003.12735v1
- Date: Sat, 28 Mar 2020 07:06:06 GMT
- Title: Exploit Clues from Views: Self-Supervised and Regularized Learning for
Multiview Object Recognition
- Authors: Chih-Hui Ho, Bo Liu, Tz-Ying Wu, Nuno Vasconcelos
- Abstract summary: This work investigates the problem of multiview self-supervised learning (MV-SSL).
A novel surrogate task for self-supervised learning is proposed by pursuing an "object invariant" representation.
Experiments show that the recognition and retrieval results using view invariant stochastic prototype embedding (VISPE) outperform other self-supervised learning methods.
- Score: 66.87417785210772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiview recognition has been well studied in the literature and achieves
decent performance in object recognition and retrieval tasks. However, most
previous works rely on supervised learning and on some impractical underlying
assumptions, such as the availability of all views at training and inference
time. In this work, the problem of multiview self-supervised learning (MV-SSL)
is investigated, where only the image-to-object association is given. Given this
setup, a novel surrogate task for self-supervised learning is proposed by
pursuing an "object invariant" representation. This is solved by randomly
selecting an image feature of an object as the object prototype, accompanied by
multiview consistency regularization, which results in view invariant
stochastic prototype embedding (VISPE). Experiments show that the recognition
and retrieval results using VISPE outperform those of other self-supervised
learning methods on both seen and unseen data. VISPE can also be applied to the
semi-supervised scenario and demonstrates robust performance when limited
labeled data is available. Code is available at https://github.com/chihhuiho/VISPE
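The two ingredients the abstract describes, a randomly chosen view acting as each object's prototype and a consistency term that pulls views of the same object together, can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function name, the softmax formulation, and the squared-distance consistency term are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def vispe_style_loss(view_embs, temperature=0.1):
    """view_embs: (n_objects, n_views, dim) embeddings of each object's views.
    Returns (prototype_loss, consistency_loss)."""
    n_obj, n_views, dim = view_embs.shape
    z = l2_normalize(view_embs)

    # 1) Stochastic prototypes: one randomly selected view per object.
    proto_idx = rng.integers(n_views, size=n_obj)
    prototypes = z[np.arange(n_obj), proto_idx]               # (n_objects, dim)

    # 2) Classify every view against all object prototypes (cross-entropy),
    #    encouraging an "object invariant" embedding.
    logits = z.reshape(-1, dim) @ prototypes.T / temperature  # (n_obj*n_views, n_obj)
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    labels = np.repeat(np.arange(n_obj), n_views)
    proto_loss = -log_probs[np.arange(len(labels)), labels].mean()

    # 3) Multiview consistency: penalize spread of an object's views
    #    around their mean embedding (one simple choice of regularizer).
    mean_emb = z.mean(axis=1, keepdims=True)
    consistency = ((z - mean_emb) ** 2).sum(axis=-1).mean()

    return proto_loss, consistency
```

In training, the two terms would be combined as a weighted sum and minimized over the encoder's parameters; re-sampling the prototype view each step is what makes the prototype "stochastic".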
Related papers
- Heuristic Vision Pre-Training with Self-Supervised and Supervised
Multi-Task Learning [0.0]
We propose a novel pre-training framework by adopting both self-supervised and supervised visual pre-text tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z)
- Multi-View Class Incremental Learning [57.14644913531313]
Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance.
This paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views.
arXiv Detail & Related papers (2023-06-16T08:13:41Z)
- Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification [11.562980171753162]
Generalizable person re-identification (Re-ID) is an active research topic in machine learning and computer vision.
Previous methods mainly focus on visual representation learning, while neglecting to explore the potential of semantic features during training.
We propose a Multi-Modal Equivalent Transformer called MMET for more robust visual-semantic embedding learning.
arXiv Detail & Related papers (2023-04-19T08:37:25Z)
- Matching Multiple Perspectives for Efficient Representation Learning [0.0]
We present an approach that combines self-supervised learning with a multi-perspective matching technique.
We show that the availability of multiple views of the same object combined with a variety of self-supervised pretraining algorithms can lead to improved object classification performance.
arXiv Detail & Related papers (2022-08-16T10:33:13Z)
- On the robustness of self-supervised representations for multi-view object classification [0.0]
We show that self-supervised representations based on the instance discrimination objective lead to better representations of objects that are more robust to changes in the viewpoint and perspective of the object.
We find that self-supervised representations are more robust to object viewpoint and appear to encode more pertinent information about objects that facilitate the recognition of objects from novel views.
arXiv Detail & Related papers (2022-07-27T17:24:55Z)
- UniVIP: A Unified Framework for Self-Supervised Visual Pre-training [50.87603616476038]
We propose a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic datasets.
Massive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance.
Our method can also exploit single-centric-object datasets such as ImageNet, and outperforms BYOL by 2.5% with the same pre-training epochs in linear probing.
arXiv Detail & Related papers (2022-03-14T10:04:04Z)
- Multi-View representation learning in Multi-Task Scene [4.509968166110557]
We propose a novel semi-supervised algorithm, termed Multi-Task Multi-View learning based on Common and Special Features (MTMVCSF).
An anti-noise multi-task multi-view algorithm called AN-MTMVCSF is also proposed, which has strong adaptability to noisy labels.
The effectiveness of these algorithms is demonstrated by a series of well-designed experiments on both real-world and synthetic data.
arXiv Detail & Related papers (2022-01-15T11:26:28Z)
- Multi-modal Transformers Excel at Class-agnostic Object Detection [105.10403103027306]
We argue that existing methods lack a top-down supervision signal governed by human-understandable semantics.
We develop an efficient and flexible MViT architecture using multi-scale feature processing and deformable self-attention.
We show the significance of MViT proposals in a diverse range of applications.
arXiv Detail & Related papers (2021-11-22T18:59:29Z)
- CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
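The standard building block behind such contrastive approaches is an InfoNCE-style loss that treats two views of the same sample as a positive pair and all other samples in the batch as negatives. A minimal NumPy sketch of that generic loss is shown below; the function name and temperature value are illustrative assumptions, and CoCon's cooperative variant builds on top of this basic form rather than being reproduced here.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Contrastive loss where z1[i] and z2[i] are two views of sample i.
    z1, z2: (n, dim) embedding batches from the two views."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature            # (n, n); diagonal = positives
    logits -= logits.max(axis=1, keepdims=True) # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()           # cross-entropy on matching pairs
```

Minimizing this loss pulls matching views together and pushes non-matching pairs apart; aligned view pairs therefore yield a much lower loss than unrelated embeddings.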
arXiv Detail & Related papers (2021-04-30T05:46:02Z)
- Unsupervised Pretraining for Object Detection by Patch Reidentification [72.75287435882798]
Unsupervised representation learning achieves promising performance in pre-training representations for object detectors.
This work proposes a simple yet effective representation learning method for object detection, named patch re-identification (Re-ID).
Our method significantly outperforms its counterparts on COCO in all settings, such as different training iterations and data percentages.
arXiv Detail & Related papers (2021-03-08T15:13:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.