ViewNet: Unsupervised Viewpoint Estimation from Conditional Generation
- URL: http://arxiv.org/abs/2212.00435v1
- Date: Thu, 1 Dec 2022 11:16:04 GMT
- Title: ViewNet: Unsupervised Viewpoint Estimation from Conditional Generation
- Authors: Octave Mariotti, Oisin Mac Aodha and Hakan Bilen
- Abstract summary: We formulate unsupervised viewpoint estimation as a self-supervised learning task, where image reconstruction provides the supervision needed to predict the camera viewpoint.
We demonstrate that using a perspective spatial transformer allows efficient viewpoint learning, outperforming existing unsupervised approaches on synthetic data.
- Score: 35.89557494372891
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Understanding the 3D world without supervision is currently a major challenge
in computer vision as the annotations required to supervise deep networks for
tasks in this domain are expensive to obtain on a large scale. In this paper,
we address the problem of unsupervised viewpoint estimation. We formulate this
as a self-supervised learning task, where image reconstruction provides the
supervision needed to predict the camera viewpoint. Specifically, we make use
of pairs of images of the same object at training time, from unknown
viewpoints, to self-supervise training by combining the viewpoint information
from one image with the appearance information from the other. We demonstrate
that using a perspective spatial transformer allows efficient viewpoint
learning, outperforming existing unsupervised approaches on synthetic data and
obtaining competitive results on the challenging PASCAL3D+ dataset.
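The reconstruction-as-supervision idea in the abstract can be illustrated with a toy sketch. This is not the authors' network: a known 3D point set stands in for the appearance information taken from the second image, a brute-force grid search stands in for the learned viewpoint predictor, and a pinhole projection stands in for the perspective spatial transformer. All names (`rotate_y`, `render`, `reconstruction_loss`, `estimate_azimuth`, `SHAPE`) and the `focal` and `depth` values are hypothetical choices made for illustration.

```python
import math

def rotate_y(point, azimuth):
    """Rotate a 3D point about the vertical (y) axis by `azimuth` radians."""
    x, y, z = point
    c, s = math.cos(azimuth), math.sin(azimuth)
    return (c * x + s * z, y, -s * x + c * z)

def render(shape, azimuth, focal=2.0, depth=4.0):
    """Toy perspective 'renderer': rotate the shape, then project it to 2D."""
    image = []
    for p in shape:
        x, y, z = rotate_y(p, azimuth)
        w = z + depth  # distance from the pinhole camera along the optical axis
        image.append((focal * x / w, focal * y / w))
    return image

def reconstruction_loss(image_a, image_b):
    """Mean squared error between two projected point sets."""
    total = sum((ax - bx) ** 2 + (ay - by) ** 2
                for (ax, ay), (bx, by) in zip(image_a, image_b))
    return total / len(image_a)

def estimate_azimuth(shape, target_image, steps=360):
    """Grid search over azimuths (assumption: stands in for a learned predictor)."""
    candidates = [2 * math.pi * k / steps for k in range(steps)]
    return min(candidates,
               key=lambda a: reconstruction_loss(render(shape, a), target_image))

# An asymmetric toy object, so that different azimuths give different images.
SHAPE = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, -0.3, 0.2)]

true_azimuth = math.radians(40)
target = render(SHAPE, true_azimuth)         # image whose viewpoint is treated as unknown
estimated = estimate_azimuth(SHAPE, target)  # viewpoint that best reconstructs the target
```

In this sketch, no viewpoint label is ever compared against the estimate; the only signal is how well the re-rendered image matches the target, which mirrors the role reconstruction plays as supervision in the abstract.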
Related papers
- PEEKABOO: Hiding parts of an image for unsupervised object localization [7.161489957025654]
Localizing objects in an unsupervised manner poses significant challenges due to the absence of key visual information.
We propose a single-stage learning framework, dubbed PEEKABOO, for unsupervised object localization.
The key idea is to selectively hide parts of an image and leverage the remaining image information to infer the location of objects without explicit supervision.
(2024-07-24T20:35:20Z)
- Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation [26.630702699374194]
We propose a unified framework that leverages mask as supervision for unsupervised 3D pose estimation.
We organize the human skeleton in a fully unsupervised way which enables the processing of annotation-free data.
Experiments demonstrate our state-of-the-art pose estimation performance on Human3.6M and MPI-INF-3DHP datasets.
(2023-12-12T08:08:34Z)
- Stereo Matching by Self-supervision of Multiscopic Vision [65.38359887232025]
We propose a new self-supervised framework for stereo matching utilizing multiple images captured at aligned camera positions.
A cross photometric loss, an uncertainty-aware mutual-supervision loss, and a new smoothness loss are introduced to optimize the network.
Our model obtains better disparity maps than previous unsupervised methods on the KITTI dataset.
(2021-04-09T02:58:59Z)
- From Image Collections to Point Clouds with Self-supervised Shape and Pose Networks [53.71440550507745]
Reconstructing 3D models from 2D images is one of the fundamental problems in computer vision.
We propose a deep learning technique for 3D object reconstruction from a single image.
We learn both 3D point cloud reconstruction and pose estimation networks in a self-supervised manner.
(2020-05-05T04:25:16Z)
- Self-Supervised Viewpoint Learning From Image Collections [116.56304441362994]
We propose a novel learning framework which incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint aware manner.
We show that our approach performs competitively with fully-supervised approaches for several object categories like human faces, cars, buses, and trains.
(2020-04-03T22:01:41Z)
- DeepCap: Monocular Human Performance Capture Using Weak Supervision [106.50649929342576]
We propose a novel deep learning approach for monocular dense human performance capture.
Our method is trained in a weakly supervised manner based on multi-view supervision.
Our approach outperforms the state of the art in terms of quality and robustness.
(2020-03-18T16:39:56Z)
- Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
(2020-03-17T08:47:16Z)