Seeing a Rose in Five Thousand Ways
- URL: http://arxiv.org/abs/2212.04965v2
- Date: Mon, 20 May 2024 20:50:51 GMT
- Title: Seeing a Rose in Five Thousand Ways
- Authors: Yunzhi Zhang, Shangzhe Wu, Noah Snavely, Jiajun Wu
- Abstract summary: A rose comprises its intrinsics, including the distribution of geometry, texture, and material specific to its object category.
We build a generative model that learns to capture such object intrinsics from a single image.
Our method achieves superior results on multiple downstream tasks, including intrinsic image decomposition, shape and image generation, view synthesis, and relighting.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: What is a rose, visually? A rose comprises its intrinsics, including the distribution of geometry, texture, and material specific to its object category. With knowledge of these intrinsic properties, we may render roses of different sizes and shapes, in different poses, and under different lighting conditions. In this work, we build a generative model that learns to capture such object intrinsics from a single image, such as a photo of a bouquet. Such an image includes multiple instances of an object type. These instances all share the same intrinsics, but appear different due to a combination of variance within these intrinsics and differences in extrinsic factors, such as pose and illumination. Experiments show that our model successfully learns object intrinsics (distribution of geometry, texture, and material) for a wide range of objects, each from a single Internet image. Our method achieves superior results on multiple downstream tasks, including intrinsic image decomposition, shape and image generation, view synthesis, and relighting.
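The intrinsic image decomposition task mentioned in the abstract is commonly formalized with the Lambertian model, in which an observed image factors pixel-wise into reflectance (an object intrinsic) and shading (an extrinsic lighting effect). The following is a minimal sketch of that model using synthetic arrays; the array shapes and values are illustrative assumptions, not the paper's method or data.

```python
import numpy as np

# Lambertian intrinsic-image model: an observed image I factors
# pixel-wise into reflectance (albedo) R and shading S, i.e. I = R * S.
# The arrays below are synthetic stand-ins for illustration only.

rng = np.random.default_rng(0)
H, W = 4, 4

reflectance = rng.uniform(0.2, 1.0, size=(H, W))  # material albedo (intrinsic)
shading = rng.uniform(0.1, 1.0, size=(H, W))      # illumination effect (extrinsic)

image = reflectance * shading  # compose the observed image

# In the log domain the product becomes a sum, which is why many
# decomposition methods operate on log-images:
log_ok = np.allclose(np.log(image), np.log(reflectance) + np.log(shading))
print(log_ok)  # True
```

Recovering R and S from I alone is ill-posed, which is why learned priors over object intrinsics, like the generative model described above, are useful for this task.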
Related papers
- Compositional Image Decomposition with Diffusion Models [70.07406583580591]
In this paper, we present a method to decompose an image into such compositional components.
Our approach, Decomp Diffusion, is an unsupervised method which infers a set of different components in the image.
We demonstrate how components can capture different factors of the scene, ranging from global scene descriptors like shadows or facial expression to local scene descriptors like constituent objects.
arXiv Detail & Related papers (2024-06-27T16:13:34Z)
- Are These the Same Apple? Comparing Images Based on Object Intrinsics [27.43687450076182]
The goal is to measure image similarity purely based on the intrinsic object properties that define object identity.
This problem has been studied in the computer vision literature as re-identification.
We propose to extend it to general object categories, exploring an image similarity metric based on object intrinsics.
arXiv Detail & Related papers (2023-11-01T18:00:03Z)
- Dual Pyramid Generative Adversarial Networks for Semantic Image Synthesis [94.76988562653845]
The goal of semantic image synthesis is to generate photo-realistic images from semantic label maps.
Current state-of-the-art approaches, however, still struggle to generate realistic objects in images at various scales.
We propose a Dual Pyramid Generative Adversarial Network (DP-GAN) that learns the conditioning of spatially-adaptive normalization blocks at all scales jointly.
arXiv Detail & Related papers (2022-10-08T18:45:44Z)
- IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes [99.76677232870192]
We show how a dense vision transformer, IRISformer, excels at both single-task and multi-task reasoning required for inverse rendering.
Specifically, we propose a transformer architecture to simultaneously estimate depths, normals, spatially-varying albedo, roughness and lighting from a single image of an indoor scene.
Our evaluations on benchmark datasets demonstrate state-of-the-art results on each of the above tasks, enabling applications like object insertion and material editing in a single unconstrained real image.
arXiv Detail & Related papers (2022-06-16T19:50:55Z)
- Polymorphic-GAN: Generating Aligned Samples across Multiple Domains with Learned Morph Maps [94.10535575563092]
We introduce a generative adversarial network that can simultaneously generate aligned image samples from multiple related domains.
We propose Polymorphic-GAN which learns shared features across all domains and a per-domain morph layer to morph shared features according to each domain.
arXiv Detail & Related papers (2022-06-06T21:03:02Z)
- Diversifying Semantic Image Synthesis and Editing via Class- and Layer-wise VAEs [8.528384027684192]
We propose a class- and layer-wise extension to the variational autoencoder framework that allows flexible control over each object class at the local to global levels.
We demonstrate that our method generates images that are both plausible and more diverse compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-06-25T04:12:05Z)
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [45.21191307444531]
Deep generative models allow for photorealistic image synthesis at high resolutions.
But for many applications, this is not enough: content creation also needs to be controllable.
Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis.
arXiv Detail & Related papers (2020-11-24T14:14:15Z)
- Self-supervised Single-view 3D Reconstruction via Semantic Consistency [142.71430568330172]
We learn a self-supervised, single-view 3D reconstruction model that predicts the shape, texture and camera pose of a target object.
The proposed method does not necessitate 3D supervision, manually annotated keypoints, multi-view images of an object or a prior 3D template.
arXiv Detail & Related papers (2020-03-13T20:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.