Abstract: Novel view synthesis from a single image aims at generating unseen views of an object from a single input image. Several works have recently achieved remarkable results, though they require some form of multi-view supervision at training time, which limits their deployment in real scenarios. This work aims at relaxing this assumption by enabling the training of a conditional generative model for novel view synthesis in a completely unsupervised manner. We first pre-train a
purely generative decoder model using a GAN formulation while at the same time
training an encoder network to invert the mapping from latent codes to images.
Then we swap the encoder and decoder and train the network as a conditional GAN
with a mixture of an auto-encoder-like objective and self-distillation. At test
time, given a view of an object, our model first embeds the image content in a
latent code and regresses its pose w.r.t. a canonical reference system, then
generates novel views of it by keeping the code fixed and varying the pose. We show
that our framework achieves results comparable to the state of the art on
ShapeNet and that it can be employed on unconstrained collections of natural
images, where no competing method can be trained.
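
To make the pipeline concrete, the following is a minimal PyTorch sketch of the two training stages and the test-time procedure summarized above. All shapes, the pose parameterization, the toy MLP networks, and the identifiers (G, E, decode, encode) are illustrative assumptions rather than the paper's actual architectures, and the adversarial terms of both stages are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, POSE_DIM, IMG = 128, 2, 64   # assumed sizes, not from the paper

# Toy MLP stand-ins: G maps a (code, pose) pair to an image,
# E maps an image back to a (code, pose) pair.
G = nn.Sequential(
    nn.Linear(LATENT_DIM + POSE_DIM, 512), nn.ReLU(),
    nn.Linear(512, 3 * IMG * IMG), nn.Tanh(),
)
E = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * IMG * IMG, 512), nn.ReLU(),
    nn.Linear(512, LATENT_DIM + POSE_DIM),
)

def decode(z, pose):
    return G(torch.cat([z, pose], dim=1)).view(-1, 3, IMG, IMG)

def encode(img):
    h = E(img)
    return h[:, :LATENT_DIM], h[:, LATENT_DIM:]

# Stage 1: the decoder is pre-trained as a GAN (adversarial loss omitted
# here), while the encoder learns to invert it on generated samples.
z = torch.randn(8, LATENT_DIM)
pose = torch.rand(8, POSE_DIM)
fake = decode(z, pose)
z_hat, pose_hat = encode(fake.detach())
inversion_loss = F.mse_loss(z_hat, z) + F.mse_loss(pose_hat, pose)

# Stage 2: encoder and decoder are swapped into an auto-encoder; a real
# image is embedded and reconstructed, and a re-rendering under a new pose
# is tied back to the original code by a self-distillation term.
real = torch.randn(8, 3, IMG, IMG)       # placeholder for a batch of real images
z_r, pose_r = encode(real)
recon_loss = F.l1_loss(decode(z_r, pose_r), real)

new_pose = torch.rand(8, POSE_DIM)
z_n, pose_n = encode(decode(z_r, new_pose))
distill_loss = F.mse_loss(z_n, z_r.detach()) + F.mse_loss(pose_n, new_pose)

# Test time: embed one view, keep the code fixed, and sweep the pose.
with torch.no_grad():
    z_t, _ = encode(real[:1])
    novel_views = [decode(z_t, p.view(1, -1))
                   for p in torch.linspace(0, 1, 8).unsqueeze(1).repeat(1, POSE_DIM)]
```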