Unsupervised Learning of Depth and Depth-of-Field Effect from Natural
Images with Aperture Rendering Generative Adversarial Networks
- URL: http://arxiv.org/abs/2106.13041v1
- Date: Thu, 24 Jun 2021 14:15:50 GMT
- Title: Unsupervised Learning of Depth and Depth-of-Field Effect from Natural
Images with Aperture Rendering Generative Adversarial Networks
- Authors: Takuhiro Kaneko
- Abstract summary: We propose aperture rendering generative adversarial networks (AR-GANs), which equip GANs with aperture rendering and adopt focus cues to learn the depth and depth-of-field effect of unlabeled natural images.
In the experiments, we demonstrate the effectiveness of AR-GANs on various datasets, such as flower, bird, and face images, show their portability by incorporating them into other 3D representation learning GANs, and validate their applicability to shallow DoF rendering.
- Score: 15.546533383799309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the 3D world from 2D projected natural images is a fundamental
challenge in computer vision and graphics. Recently, an unsupervised learning
approach has garnered considerable attention owing to its advantages in data
collection. However, to mitigate training difficulties, typical methods impose
assumptions on the viewpoint distribution (e.g., a dataset containing images
captured from various viewpoints) or on the object shape (e.g., symmetric
objects). These assumptions often restrict applicability; for instance,
handling non-rigid objects or images captured from similar viewpoints (e.g.,
flower or bird images) remains a challenge. To complement these approaches, we
propose
aperture rendering generative adversarial networks (AR-GANs), which equip
GANs with aperture rendering and adopt focus cues to learn the depth and
depth-of-field (DoF) effect of unlabeled natural images. To address the
ambiguities triggered by the unsupervised setting (i.e., between smooth
textures and out-of-focus blur, and between foreground and background blur),
we develop DoF mixture learning, which enables the generator to learn the real
image distribution while generating diverse DoF images. In addition, we devise
a center focus prior to guide the learning direction. In the experiments, we
demonstrate the effectiveness of AR-GANs on various datasets, such as flower,
bird, and face images, show their portability by incorporating them into other
3D representation learning GANs, and validate their applicability to shallow
DoF rendering.
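To make the mechanism concrete, the following is a minimal PyTorch sketch of an aperture-rendering layer: an all-in-focus image and a depth map are combined by splitting the scene into depth planes, blurring each plane with a radius proportional to the aperture size and its distance from the focal plane, and compositing the results. The function names, the Gaussian stand-in for the disk-shaped aperture kernel, the plane discretization, and the center-crop focus heuristic are illustrative assumptions, not the paper's exact renderer.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(radius: float, size: int = 9) -> torch.Tensor:
    """Gaussian stand-in for the disk-shaped aperture blur kernel."""
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    yy, xx = torch.meshgrid(ax, ax, indexing="ij")
    k = torch.exp(-(xx ** 2 + yy ** 2) / (2.0 * max(radius, 1e-3) ** 2))
    return k / k.sum()

def aperture_render(img, depth, focus_depth, aperture, n_planes=8):
    """Approximate depth-of-field rendering: blur each depth plane by its
    circle-of-confusion radius and composite the planes."""
    _, c, _, _ = img.shape
    out = torch.zeros_like(img)
    norm = torch.zeros_like(depth)
    edges = torch.linspace(float(depth.min()), float(depth.max()) + 1e-6,
                           n_planes + 1)
    for i in range(n_planes):
        mask = ((depth >= edges[i]) & (depth < edges[i + 1])).float()
        plane = 0.5 * float(edges[i] + edges[i + 1])
        radius = aperture * abs(plane - focus_depth)   # CoC radius
        k = gaussian_kernel(radius).to(img)
        k4 = k[None, None].expand(c, 1, -1, -1)        # depthwise kernel
        pad = k.shape[-1] // 2
        out = out + F.conv2d(img * mask, k4, padding=pad, groups=c)
        norm = norm + F.conv2d(mask, k[None, None], padding=pad)
    return out / norm.clamp_min(1e-6)

# DoF mixture learning, conceptually: sample the aperture per batch so the
# generator yields both deep- and shallow-DoF images for the discriminator.
img = torch.rand(2, 3, 64, 64)
depth = torch.rand(2, 1, 64, 64)
aperture = 8.0 * float(torch.rand(()))
# Crude center focus prior: focus at the mean depth of the image center.
focus = float(depth[..., 24:40, 24:40].mean())
rendered = aperture_render(img, depth, focus, aperture)
```

In the paper's setting, it is the rendered output, rather than the all-in-focus image, that the discriminator judges, which is how focus cues can supervise depth.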
Related papers
- Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection.
CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z)
- DepGAN: Leveraging Depth Maps for Handling Occlusions and Transparency in Image Composition [7.693732944239458]
DepGAN is a Generative Adversarial Network that utilizes depth maps and alpha channels to rectify inaccurate occlusions.
Central to our network is a novel loss function called Depth Aware Loss, which quantifies the pixel-wise depth difference (a sketch follows this entry).
We enhance our network's learning process by utilizing opacity data, enabling it to effectively manage compositions involving transparent and semi-transparent objects.
arXiv Detail & Related papers (2024-07-16T16:18:40Z)
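The exact form of Depth Aware Loss is not given in this summary; as a rough, hypothetical sketch, a pixel-wise depth-difference penalty over the composited foreground region might look like this (the mask, the L1 form, and all names are assumptions):

```python
import torch

def depth_aware_loss(pred_depth: torch.Tensor,
                     ref_depth: torch.Tensor,
                     fg_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical pixel-wise depth penalty: mean absolute depth
    difference over the foreground region (illustrative only, not
    DepGAN's published formulation)."""
    diff = (pred_depth - ref_depth).abs() * fg_mask
    return diff.sum() / fg_mask.sum().clamp_min(1.0)
```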
- Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models [55.99654128127689]
Visual Foundation Models (VFMs) are used to enhance 3D representation learning.
VFMs generate semantic labels for weakly-supervised pixel-to-point contrastive distillation.
We adapt the sampling probabilities of points to address imbalances in spatial distribution and category frequency (a sketch follows this entry).
arXiv Detail & Related papers (2024-05-23T07:48:19Z)
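The adapted sampling probabilities are not specified here; a common realization, assumed for illustration, is inverse-frequency weighting over per-point category labels:

```python
import torch

def balanced_point_weights(labels: torch.Tensor) -> torch.Tensor:
    """Inverse-frequency sampling probabilities over per-point labels
    (an illustrative guess at adapting sampling probabilities)."""
    counts = torch.bincount(labels).float().clamp_min(1.0)
    w = 1.0 / counts[labels]
    return w / w.sum()

labels = torch.randint(0, 5, (1000,))                  # toy per-point labels
probs = balanced_point_weights(labels)
idx = torch.multinomial(probs, 256, replacement=True)  # balanced sample
```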
- Learning 3D-Aware GANs from Unposed Images with Template Feature Field [33.32761749864555]
This work targets learning 3D-aware GANs from unposed images.
We propose to perform on-the-fly pose estimation of training images with a learned template feature field (TeFF).
arXiv Detail & Related papers (2024-04-08T17:42:08Z)
- HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation [106.09886920774002]
We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network.
Our method achieves consistent improvements over the baseline trained from scratch and significantly outperforms existing schemes.
arXiv Detail & Related papers (2024-03-18T14:18:08Z)
- GAN2X: Non-Lambertian Inverse Rendering of Image GANs [85.76426471872855]
We present GAN2X, a new method for unsupervised inverse rendering that only uses unpaired images for training.
Unlike previous Shape-from-GAN approaches that mainly focus on 3D shapes, we make the first attempt to also recover non-Lambertian material properties by exploiting the pseudo-paired data generated by a GAN.
Experiments demonstrate that GAN2X can accurately decompose 2D images into 3D shape, albedo, and specular properties for different object categories, and achieves state-of-the-art performance for unsupervised single-view 3D face reconstruction.
arXiv Detail & Related papers (2022-06-18T16:58:49Z)
- AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance Fields [23.92262483956057]
Fully unsupervised 3D representation learning has gained attention owing to its advantages in data collection.
We propose an aperture rendering NeRF (AR-NeRF), which can utilize viewpoint and defocus cues in a unified manner.
We demonstrate the utility of AR-NeRF for unsupervised learning of the depth and defocus effects.
arXiv Detail & Related papers (2022-06-13T12:41:59Z)
- De-rendering 3D Objects in the Wild [21.16153549406485]
We present a weakly supervised method that is able to decompose a single image of an object into shape, material, and global lighting parameters.
For training, the method only relies on a rough initial shape estimate of the training objects to bootstrap the learning process.
In our experiments, we show that the method can successfully de-render 2D images into a 3D representation and generalizes to unseen object categories.
arXiv Detail & Related papers (2022-01-06T23:50:09Z)
- Combining Semantic Guidance and Deep Reinforcement Learning For Generating Human Level Paintings [22.889059874754242]
Generation of stroke-based non-photorealistic imagery is an important problem in the computer vision community.
Previous methods have been limited to datasets with little variation in position, scale and saliency of the foreground object.
We propose a Semantic Guidance pipeline with a bi-level painting procedure for learning the distinction between foreground and background brush strokes at training time.
arXiv Detail & Related papers (2020-11-25T09:00:04Z)
- Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering [101.56891506498755]
Differentiable rendering has paved the way to training neural networks to perform "inverse graphics" tasks.
We show that our approach significantly outperforms state-of-the-art inverse graphics networks trained on existing datasets.
arXiv Detail & Related papers (2020-10-18T22:29:07Z)
- Self-Supervised Linear Motion Deblurring [112.75317069916579]
Deep convolutional neural networks are state-of-the-art for image deblurring.
We present a differentiable reblur model for self-supervised motion deblurring.
Our experiments demonstrate that self-supervised single-image deblurring is feasible (a sketch of the reblur cycle follows this entry).
arXiv Detail & Related papers (2020-02-10T20:15:21Z)
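As a closing illustration, the self-supervision cycle behind such a reblur model can be sketched in a few lines: deblur the input, re-apply a motion blur, and penalize the difference from the observed blurry image. The fixed horizontal kernel below stands in for the paper's estimated per-image linear motion blur and is purely an assumption:

```python
import torch
import torch.nn.functional as F

def reblur_loss(blurry: torch.Tensor, deblur_net, kernel: torch.Tensor):
    """Self-supervised cycle: blurry -> sharp estimate -> reblurred,
    compared against the original blurry observation."""
    sharp = deblur_net(blurry)                   # (B, C, H, W)
    c = blurry.shape[1]
    k = kernel[None, None].expand(c, 1, -1, -1)  # depthwise reblur
    pad = kernel.shape[-1] // 2
    reblurred = F.conv2d(sharp, k, padding=pad, groups=c)
    return F.l1_loss(reblurred, blurry)

# Stand-in linear motion kernel (horizontal, length 9); the paper estimates
# such blur per image rather than fixing it.
kernel = torch.zeros(9, 9)
kernel[4, :] = 1.0 / 9.0
```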
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.