Self-Supervised Viewpoint Learning From Image Collections
- URL: http://arxiv.org/abs/2004.01793v1
- Date: Fri, 3 Apr 2020 22:01:41 GMT
- Title: Self-Supervised Viewpoint Learning From Image Collections
- Authors: Siva Karthik Mustikovela, Varun Jampani, Shalini De Mello, Sifei Liu,
Umar Iqbal, Carsten Rother, Jan Kautz
- Abstract summary: We propose a novel learning framework which incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint-aware manner.
We show that our approach performs competitively with fully-supervised approaches for several object categories such as human faces, cars, buses, and trains.
- Score: 116.56304441362994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep neural networks to estimate the viewpoint of objects requires
large labeled training datasets. However, manually labeling viewpoints is
notoriously hard, error-prone, and time-consuming. On the other hand, it is
relatively easy to mine many unlabeled images of an object category from the
internet, e.g., of cars or faces. We seek to answer the research question of
whether such unlabeled collections of in-the-wild images can be successfully
utilized to train viewpoint estimation networks for general object categories
purely via self-supervision. Self-supervision here refers to the fact that the
only true supervisory signal that the network has is the input image itself. We
propose a novel learning framework which incorporates an analysis-by-synthesis
paradigm to reconstruct images in a viewpoint-aware manner with a generative
network, along with symmetry and adversarial constraints to successfully
supervise our viewpoint estimation network. We show that our approach performs
competitively with fully-supervised approaches for several object categories such as
human faces, cars, buses, and trains. Our work opens up further research in
self-supervised viewpoint learning and serves as a robust baseline for it. We
open-source our code at https://github.com/NVlabs/SSV.
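To make the combination of constraints concrete, below is a minimal, hypothetical PyTorch sketch of the symmetry constraint alone: horizontally flipping an image negates azimuth and tilt while preserving elevation, so predictions on flipped pairs supervise each other. The network shape and the name ViewpointNet are illustrative assumptions, not the released SSV architecture, and the analysis-by-synthesis reconstruction and adversarial terms are omitted.

```python
import torch
import torch.nn as nn

class ViewpointNet(nn.Module):
    """Toy CNN regressing (azimuth, elevation, tilt); illustrative only."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 3)  # (azimuth, elevation, tilt)

    def forward(self, x):
        return self.head(self.features(x))

def symmetry_loss(net, images):
    # A horizontal flip negates azimuth and tilt but keeps elevation,
    # so the network's predictions on flipped pairs supervise each other.
    v = net(images)                               # (B, 3)
    v_flip = net(torch.flip(images, dims=[3]))    # flip the width axis
    target = v * torch.tensor([-1.0, 1.0, -1.0])  # negate azimuth and tilt
    return nn.functional.mse_loss(v_flip, target)

net = ViewpointNet()
batch = torch.randn(4, 3, 64, 64)  # stand-in for unlabeled object crops
loss = symmetry_loss(net, batch)   # the full framework adds reconstruction
loss.backward()                    # and adversarial terms to this loss
```

On its own this term is degenerate (predicting zeros satisfies it); in the full framework it is combined with the viewpoint-aware reconstruction and adversarial losses described above.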
Related papers
- Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning [0.0]
We propose a novel pre-training framework by adopting both self-supervised and supervised visual pretext tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features that are visible to the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
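As a rough illustration of this pretext task, the following hypothetical sketch classifies where a query patch sits relative to a reference patch. The actual method operates on transformer patch features with masking; this toy version uses a plain MLP encoder and an 8-way neighbor classifier, and all names and shapes are assumptions.

```python
import torch
import torch.nn as nn

# Toy patch encoder; the actual method uses transformer patch features.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 128), nn.ReLU())
classifier = nn.Linear(2 * 128, 8)  # 8 neighbor slots around the reference

def relative_location_loss(reference, query, rel_pos):
    # Concatenate reference and query embeddings, then classify which of
    # the 8 relative positions the query occupies.
    z = torch.cat([encoder(reference), encoder(query)], dim=1)
    return nn.functional.cross_entropy(classifier(z), rel_pos)

ref = torch.randn(4, 3, 16, 16)  # reference patches
qry = torch.randn(4, 3, 16, 16)  # query patches from the same images
pos = torch.randint(0, 8, (4,))  # ground-truth relative positions
loss = relative_location_loss(ref, qry, pos)
loss.backward()
```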
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- ViewNet: Unsupervised Viewpoint Estimation from Conditional Generation [35.89557494372891]
We formulate viewpoint estimation as a self-supervised learning task, where image reconstruction provides the supervision needed to predict the camera viewpoint.
We demonstrate that using a perspective spatial transformer allows efficient viewpoint learning, outperforming existing unsupervised approaches on synthetic data.
arXiv Detail & Related papers (2022-12-01T11:16:04Z)
- Semantic-Aware Generation for Self-Supervised Visual Representation Learning [116.5814634936371]
We advocate for Semantic-aware Generation (SaGe), which encourages richer semantics, rather than low-level details, to be preserved in the generated image.
SaGe complements the target network with view-specific features and thus alleviates the semantic degradation caused by intensive data augmentation.
We run SaGe on ImageNet-1K and evaluate the pre-trained models on five downstream tasks, including nearest neighbor test, linear classification, and fine-grained image recognition.
arXiv Detail & Related papers (2021-11-25T16:46:13Z)
- Unsupervised Object-Level Representation Learning from Scene Images [97.07686358706397]
Object-level Representation Learning (ORL) is a new self-supervised learning framework for scene images.
Our key insight is to leverage image-level self-supervised pre-training as a prior for discovering object-level semantic correspondence.
ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks.
arXiv Detail & Related papers (2021-06-22T17:51:24Z)
- Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z)
- Unsupervised Image Classification for Deep Representation Learning [42.09716669386924]
We propose an unsupervised image classification framework without using embedding clustering.
Experiments on the ImageNet dataset demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2020-06-20T02:57:06Z)
- VirTex: Learning Visual Representations from Textual Annotations [25.104705278771895]
VirTex is a pretraining approach using semantically dense captions to learn visual representations.
We train convolutional networks from scratch on COCO Captions, and transfer them to downstream recognition tasks.
On all tasks, VirTex yields features that match or exceed those learned on ImageNet, whether supervised or unsupervised.
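A toy sketch of the idea, under strong simplifying assumptions: a small convolutional backbone is trained to predict caption tokens, and only the backbone is kept for transfer. The real VirTex pairs a ResNet-50 with transformer decoders over full COCO captions; here a single token per image stands in for a caption, and all names and shapes are assumptions.

```python
import torch
import torch.nn as nn

# Toy visual backbone to keep after pretraining (the real VirTex uses a
# ResNet-50 plus bidirectional transformer decoders over captions).
visual = nn.Sequential(
    nn.Conv2d(3, 64, 7, stride=4, padding=3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
vocab_size = 1000
text_head = nn.Linear(64, vocab_size)  # stand-in for the caption decoder

images = torch.randn(2, 3, 224, 224)
tokens = torch.randint(0, vocab_size, (2,))  # one caption token per image (toy)
loss = nn.functional.cross_entropy(text_head(visual(images)), tokens)
loss.backward()
# After pretraining, discard text_head and transfer `visual` downstream.
```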
arXiv Detail & Related papers (2020-06-11T17:58:48Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
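The pretext task can be sketched as predicting a histogram of discrete visual words from a perturbed view of an image. In the sketch below the word assignments are random stand-ins, whereas the actual method quantizes the feature maps of a first-generation self-supervised network; all names and shapes are assumptions.

```python
import torch
import torch.nn as nn

K = 128  # visual-word vocabulary size
# Random stand-in for precomputed word ids; the actual method quantizes
# feature maps of a first-generation self-supervised network.
teacher_words = torch.randint(0, K, (4, 49))  # word id per spatial location

student = nn.Sequential(
    nn.Conv2d(3, 64, 7, stride=8), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, K),
)

def bow_target(word_ids):
    # Normalized bag-of-words histogram over the K visual words.
    hist = torch.zeros(word_ids.size(0), K)
    hist.scatter_add_(1, word_ids, torch.ones(word_ids.shape))
    return hist / hist.sum(dim=1, keepdim=True)

perturbed = torch.randn(4, 3, 64, 64)  # augmented views of the same images
log_probs = torch.log_softmax(student(perturbed), dim=1)
loss = -(bow_target(teacher_words) * log_probs).sum(dim=1).mean()
loss.backward()
```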
arXiv Detail & Related papers (2020-02-27T16:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.