Seeing a Rose in Five Thousand Ways
- URL: http://arxiv.org/abs/2212.04965v2
- Date: Mon, 20 May 2024 20:50:51 GMT
- Title: Seeing a Rose in Five Thousand Ways
- Authors: Yunzhi Zhang, Shangzhe Wu, Noah Snavely, Jiajun Wu
- Abstract summary: A rose comprises its intrinsics, including the distribution of geometry, texture, and material specific to its object category.
We build a generative model that learns to capture such object intrinsics from a single image.
Our method achieves superior results on multiple downstream tasks, including intrinsic image decomposition, shape and image generation, view synthesis, and relighting.
- Score: 48.39141583352746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: What is a rose, visually? A rose comprises its intrinsics, including the distribution of geometry, texture, and material specific to its object category. With knowledge of these intrinsic properties, we may render roses of different sizes and shapes, in different poses, and under different lighting conditions. In this work, we build a generative model that learns to capture such object intrinsics from a single image, such as a photo of a bouquet. Such an image includes multiple instances of an object type. These instances all share the same intrinsics, but appear different due to a combination of variance within these intrinsics and differences in extrinsic factors, such as pose and illumination. Experiments show that our model successfully learns object intrinsics (distribution of geometry, texture, and material) for a wide range of objects, each from a single Internet image. Our method achieves superior results on multiple downstream tasks, including intrinsic image decomposition, shape and image generation, view synthesis, and relighting.
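As a rough illustration of the factorization the abstract describes, here is a minimal sketch (not the authors' implementation; the network sizes and the toy renderer are assumptions): one shared intrinsic generator explains every instance in the photo, while pose and lighting vary per instance.

```python
# A minimal sketch (not the authors' code) of the intrinsics/extrinsics
# factorization described in the abstract: all instances share one intrinsic
# generator; extrinsics (pose, illumination) differ per instance.
import torch
import torch.nn as nn

class IntrinsicGenerator(nn.Module):
    """Maps an instance latent z to intrinsic properties (geometry + texture)."""
    def __init__(self, z_dim=64, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),   # stands in for a full shape/texture field
        )
    def forward(self, z):
        return self.net(z)

def render(intrinsics, pose, lighting):
    """Toy stand-in for a differentiable renderer: combines shared intrinsics
    with per-instance extrinsics. A real system would use volume rendering."""
    return intrinsics * lighting.unsqueeze(-1) + pose.mean(dim=-1, keepdim=True)

G = IntrinsicGenerator()
n_instances = 5                       # e.g. roses segmented from one bouquet photo
z = torch.randn(n_instances, 64)      # per-instance variation *within* intrinsics
pose = torch.randn(n_instances, 6)    # per-instance extrinsics: pose ...
light = torch.rand(n_instances)       # ... and illumination
images = render(G(z), pose, light)    # one shared G explains all instances
print(images.shape)                   # torch.Size([5, 128])
```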
Related papers
- Investigating Image Manifolds of 3D Objects: Learning, Shape Analysis, and Comparisons [9.326260051834822]
Despite the high dimensionality of images, the sets of images of 3D objects have long been hypothesized to form low-dimensional manifolds.
This paper revisits a classical problem of manifold learning but from a novel geometrical perspective.
The geometry of image manifolds can be exploited to simplify vision and image processing tasks, to predict performance, and to provide insights into learning methods.
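As a concrete illustration of the manifold hypothesis mentioned above, here is a minimal sketch (my illustration, not from the paper) of one classical probe: estimating local intrinsic dimension with PCA on nearest neighbors.

```python
# A minimal sketch of local intrinsic-dimension estimation via PCA; images of
# one 3D object under a few continuous factors should yield a local dimension
# far below the ambient pixel dimension.
import numpy as np

def local_pca_dimension(images, k=20, var_threshold=0.95):
    """images: (N, ...) array. Returns the median local PCA dimension."""
    X = images.reshape(len(images), -1).astype(np.float64)
    dims = []
    for i in range(len(X)):
        d2 = ((X - X[i]) ** 2).sum(axis=1)
        nbrs = X[np.argsort(d2)[1:k + 1]]          # k nearest neighbors (skip self)
        nbrs = nbrs - nbrs.mean(axis=0)
        s = np.linalg.svd(nbrs, compute_uv=False)  # local principal spectrum
        ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
        dims.append(int(np.searchsorted(ratio, var_threshold) + 1))
    return int(np.median(dims))

imgs = np.random.rand(200, 16, 16)                 # placeholder data
print(local_pca_dimension(imgs))
```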
arXiv Detail & Related papers (2025-03-09T21:00:33Z)
- Synthesis and Perceptual Scaling of High Resolution Natural Images Using Stable Diffusion [0.0]
We develop a custom stimulus set of photorealistic images from six categories with 18 objects each.
For each object we generated 10 graded variants that are ordered along a perceptual continuum.
This image set is of interest for studies on visual perception, attention and short- and long-term memory.
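One common way to produce graded variants ordered along a continuum, sketched below, is spherical interpolation between two diffusion latents; whether the paper uses exactly this scheme is an assumption on my part.

```python
# A minimal sketch (an assumption, not the paper's pipeline): spherically
# interpolate between two Gaussian diffusion latents, then decode each step
# with the generative model to get 10 ordered variants of one object.
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation, the usual choice for Gaussian diffusion latents."""
    z0f, z1f = z0.ravel(), z1.ravel()
    omega = np.arccos(np.clip(
        np.dot(z0f, z1f) / (np.linalg.norm(z0f) * np.linalg.norm(z1f)), -1, 1))
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

z_a = np.random.randn(4, 64, 64)   # latent for variant A (placeholder shape)
z_b = np.random.randn(4, 64, 64)   # latent for variant B
graded = [slerp(z_a, z_b, t) for t in np.linspace(0.0, 1.0, 10)]
print(len(graded), graded[0].shape)
```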
arXiv Detail & Related papers (2024-10-16T20:49:19Z)
- Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering [56.68286440268329]
Correct insertion of virtual objects into images of real-world scenes requires a deep understanding of the scene's lighting, geometry, and materials.
We propose using a personalized large diffusion model as guidance to a physically based inverse rendering process.
Our method recovers scene lighting and tone-mapping parameters, allowing the photorealistic composition of arbitrary virtual objects in single frames or videos of indoor or outdoor scenes.
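The overall shape of such a pipeline can be sketched as an optimization loop over differentiable scene parameters; everything below (the toy renderer, the placeholder prior term, the weights) is my paraphrase, not the authors' method.

```python
# A minimal sketch of diffusion-guided inverse rendering: scene parameters are
# optimized so the re-rendered image both matches the photo and scores well
# under a diffusion prior. `diffusion_prior_loss` is a placeholder for e.g.
# score-distillation-style guidance from a personalized diffusion model.
import torch

lighting = torch.randn(9, requires_grad=True)      # e.g. spherical-harmonic light
tone_map = torch.tensor([1.0], requires_grad=True) # tone-mapping exposure
target = torch.rand(3, 64, 64)                     # the input photograph

def render(lighting, tone_map):
    """Toy differentiable renderer stand-in (a real one shades geometry)."""
    base = torch.sigmoid(lighting.sum()) * torch.ones(3, 64, 64)
    return base * tone_map

def diffusion_prior_loss(img):
    """Placeholder: a diffusion model would judge realism here."""
    return ((img - img.mean()) ** 2).mean()

opt = torch.optim.Adam([lighting, tone_map], lr=1e-2)
for step in range(100):
    img = render(lighting, tone_map)
    loss = (img - target).pow(2).mean() + 0.1 * diffusion_prior_loss(img)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```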
arXiv Detail & Related papers (2024-08-19T05:15:45Z)
- Compositional Image Decomposition with Diffusion Models [70.07406583580591]
In this paper, we present a method to decompose an image into such compositional components.
Our approach, Decomp Diffusion, is an unsupervised method which infers a set of different components in the image.
We demonstrate how components can capture different factors of the scene, ranging from global scene descriptors like shadows or facial expression to local scene descriptors like constituent objects.
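A minimal sketch of the compositional structure (sizes are arbitrary; this is not the released model): an encoder splits the image into K component latents, and a denoiser conditioned on each latent composes by aggregating its per-component predictions.

```python
# Sketch of component inference + composition in the spirit of Decomp Diffusion.
import torch
import torch.nn as nn

K, Z, IMG = 4, 32, 3 * 32 * 32

encoder = nn.Linear(IMG, K * Z)            # image -> K component latents
denoiser = nn.Sequential(                  # (noisy image, one latent) -> noise
    nn.Linear(IMG + Z, 256), nn.ReLU(), nn.Linear(256, IMG))

x = torch.rand(1, IMG)                     # input image (flattened)
zs = encoder(x).view(1, K, Z)              # K inferred components

x_noisy = torch.randn(1, IMG)              # a noisy sample being denoised
eps = torch.stack([
    denoiser(torch.cat([x_noisy, zs[:, k]], dim=-1)) for k in range(K)
]).mean(dim=0)                             # compose the components' predictions
print(eps.shape)                           # torch.Size([1, 3072])
```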
arXiv Detail & Related papers (2024-06-27T16:13:34Z)
- Matching Non-Identical Objects [4.520518890664213]
This study addresses a novel task of matching such non-identical objects at the pixel level.
We propose a weighting scheme of descriptors that incorporates semantic information from object detectors into existing sparse image matching methods.
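One simple instance of such a weighting scheme, sketched below as an illustration (not the paper's exact formulation): matches between keypoints that a detector assigns to the same object class are up-weighted before matching.

```python
# A minimal sketch of semantics-weighted descriptor matching.
import numpy as np

def semantic_weighted_scores(desc_a, desc_b, cls_a, cls_b, w_same=1.0, w_diff=0.2):
    """desc_*: (N, D) L2-normalized descriptors; cls_*: (N,) detector class ids."""
    sim = desc_a @ desc_b.T                        # plain descriptor similarity
    same = cls_a[:, None] == cls_b[None, :]        # same-class keypoint pairs
    return sim * np.where(same, w_same, w_diff)    # semantics reweight matching

desc_a = np.random.randn(100, 128); desc_a /= np.linalg.norm(desc_a, axis=1, keepdims=True)
desc_b = np.random.randn(120, 128); desc_b /= np.linalg.norm(desc_b, axis=1, keepdims=True)
cls_a = np.random.randint(0, 5, 100)              # e.g. classes from an object detector
cls_b = np.random.randint(0, 5, 120)
scores = semantic_weighted_scores(desc_a, desc_b, cls_a, cls_b)
matches = scores.argmax(axis=1)                   # mutual checks etc. would follow
print(scores.shape, matches.shape)
```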
arXiv Detail & Related papers (2024-03-13T04:11:38Z)
- Are These the Same Apple? Comparing Images Based on Object Intrinsics [27.43687450076182]
We measure image similarity purely based on the intrinsic object properties that define object identity.
This problem has been studied in the computer vision literature as re-identification.
We propose to extend it to general object categories, exploring an image similarity metric based on object intrinsics.
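The shape of such a metric can be sketched as embedding both images with an encoder that is (by training) invariant to extrinsics and comparing the embeddings; this is my reading of the goal, not the proposed metric itself.

```python
# A minimal sketch of an intrinsics-only similarity score: the same apple
# photographed under a new pose and lighting should still score high.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(               # placeholder for an extrinsics-invariant net
    nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(), nn.Linear(256, 128))

def intrinsic_similarity(img_a, img_b):
    ea = F.normalize(encoder(img_a), dim=-1)
    eb = F.normalize(encoder(img_b), dim=-1)
    return (ea * eb).sum(dim=-1)       # cosine similarity in "intrinsic" space

a = torch.rand(1, 3, 64, 64)
b = torch.rand(1, 3, 64, 64)
print(float(intrinsic_similarity(a, b)))
```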
arXiv Detail & Related papers (2023-11-01T18:00:03Z)
- Dual Pyramid Generative Adversarial Networks for Semantic Image Synthesis [94.76988562653845]
The goal of semantic image synthesis is to generate photo-realistic images from semantic label maps.
Current state-of-the-art approaches, however, still struggle to generate realistic objects in images at various scales.
We propose a Dual Pyramid Generative Adversarial Network (DP-GAN) that learns the conditioning of spatially-adaptive normalization blocks at all scales jointly.
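Below is a minimal sketch of one spatially-adaptive normalization block of the kind DP-GAN conditions at every scale; it follows the well-known SPADE design, and DP-GAN's specific pyramid conditioning is not reproduced here.

```python
# A SPADE-style block: normalize features, then modulate them with per-pixel
# parameters predicted from the semantic label map resized to this scale.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatiallyAdaptiveNorm(nn.Module):
    def __init__(self, feat_channels, label_channels, hidden=64):
        super().__init__()
        self.norm = nn.BatchNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)

    def forward(self, feat, segmap):
        segmap = F.interpolate(segmap, size=feat.shape[2:], mode='nearest')
        h = self.shared(segmap)
        return self.norm(feat) * (1 + self.gamma(h)) + self.beta(h)

block = SpatiallyAdaptiveNorm(feat_channels=128, label_channels=35)
feat = torch.randn(2, 128, 32, 32)     # generator features at one scale
seg = torch.randn(2, 35, 256, 256)     # semantic label map (one-hot in practice)
print(block(feat, seg).shape)          # torch.Size([2, 128, 32, 32])
```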
arXiv Detail & Related papers (2022-10-08T18:45:44Z)
- IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes [99.76677232870192]
We show how a dense vision transformer, IRISformer, excels at both single-task and multi-task reasoning required for inverse rendering.
Specifically, we propose a transformer architecture to simultaneously estimate depths, normals, spatially-varying albedo, roughness and lighting from a single image of an indoor scene.
Our evaluations on benchmark datasets demonstrate state-of-the-art results on each of the above tasks, enabling applications like object insertion and material editing in a single unconstrained real image.
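The multi-task structure the summary describes can be sketched as one dense transformer backbone with one decoding head per intrinsic quantity; all dimensions below are invented, and IRISformer's actual architecture is more elaborate.

```python
# A minimal sketch of shared dense reasoning with per-task heads.
import torch
import torch.nn as nn

class DenseInverseRenderer(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4)
        self.heads = nn.ModuleDict({
            'depth': nn.Linear(dim, 1),
            'normals': nn.Linear(dim, 3),
            'albedo': nn.Linear(dim, 3),
            'roughness': nn.Linear(dim, 1),
            'lighting': nn.Linear(dim, 12),   # e.g. a per-location lighting code
        })

    def forward(self, tokens):
        h = self.backbone(tokens)             # shared dense reasoning
        return {k: head(h) for k, head in self.heads.items()}

model = DenseInverseRenderer()
tokens = torch.randn(2, 196, 256)             # patch embeddings of an indoor image
out = model(tokens)
print({k: tuple(v.shape) for k, v in out.items()})
```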
arXiv Detail & Related papers (2022-06-16T19:50:55Z)
- Polymorphic-GAN: Generating Aligned Samples across Multiple Domains with Learned Morph Maps [94.10535575563092]
We introduce a generative adversarial network that can simultaneously generate aligned image samples from multiple related domains.
We propose Polymorphic-GAN which learns shared features across all domains and a per-domain morph layer to morph shared features according to each domain.
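A minimal sketch of the morph-layer idea (not the released model): shared features are generated once, and each domain warps them with its own learned morph, here implemented as a per-domain offset field applied via `grid_sample`.

```python
# Shared features + one learned morph layer per domain.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MorphLayer(nn.Module):
    """One per domain: predicts a sampling-grid offset for shared features."""
    def __init__(self, channels):
        super().__init__()
        self.offset = nn.Conv2d(channels, 2, 3, padding=1)

    def forward(self, shared):
        n, _, h, w = shared.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing='ij')
        base = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
        flow = self.offset(shared).permute(0, 2, 3, 1) * 0.1   # small morphs
        return F.grid_sample(shared, base + flow, align_corners=True)

shared = torch.randn(2, 64, 32, 32)     # features shared across all domains
morphs = nn.ModuleList([MorphLayer(64) for _ in range(3)])  # e.g. 3 domains
aligned = [m(shared) for m in morphs]   # aligned samples, one per domain
print(aligned[0].shape)                 # torch.Size([2, 64, 32, 32])
```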
arXiv Detail & Related papers (2022-06-06T21:03:02Z)
- Diversifying Semantic Image Synthesis and Editing via Class- and Layer-wise VAEs [8.528384027684192]
We propose a class- and layer-wise extension to the variational autoencoder framework that allows flexible control over each object class at the local to global levels.
We demonstrate that our method generates images that are both plausible and more diverse compared to state-of-the-art methods.
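A minimal sketch of class- and layer-wise latent control (shapes are placeholders, not the paper's architecture): each object class gets its own latent code at each decoder layer, so one class can be resampled locally without touching the others.

```python
# Per-(class, layer) latents broadcast over a semantic layout.
import torch
import torch.nn as nn

n_classes, n_layers, z_dim = 5, 3, 16

# One latent per (class, layer); resampling z[c, l] edits class c at level l.
z = torch.randn(n_classes, n_layers, z_dim)

layers = nn.ModuleList(
    [nn.Conv2d(8 + z_dim, 8, 3, padding=1) for _ in range(n_layers)])

feat = torch.randn(1, 8, 32, 32)
mask = torch.randint(0, n_classes, (1, 32, 32))      # semantic layout
for l, conv in enumerate(layers):
    z_map = z[mask, l].permute(0, 3, 1, 2)           # per-pixel class code
    feat = torch.relu(conv(torch.cat([feat, z_map], dim=1)))
print(feat.shape)                                    # torch.Size([1, 8, 32, 32])
```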
arXiv Detail & Related papers (2021-06-25T04:12:05Z)
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [45.21191307444531]
Deep generative models allow for photorealistic image synthesis at high resolutions.
But for many applications, this is not enough: content creation also needs to be controllable.
Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis.
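A heavily simplified sketch of this compositional idea: each object is a small neural feature field, and the scene is the density-weighted composition of all fields evaluated at shared 3D points (the full method then renders the composed features to an image).

```python
# Density-weighted composition of per-object neural feature fields.
import torch
import torch.nn as nn

class FeatureField(nn.Module):
    """Maps 3D points to (feature, density) for one object."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, feat_dim + 1))
    def forward(self, pts):
        out = self.net(pts)
        return out[..., :-1], torch.relu(out[..., -1:])  # feature, density >= 0

objects = nn.ModuleList([FeatureField() for _ in range(2)])  # e.g. 2 objects
pts = torch.rand(1024, 3)                    # sample points along camera rays

feats, dens = zip(*[obj(pts) for obj in objects])
dens = torch.stack(dens)                     # (n_obj, 1024, 1)
weights = dens / (dens.sum(dim=0) + 1e-8)    # density-weighted composition
scene_feat = (weights * torch.stack(feats)).sum(dim=0)
print(scene_feat.shape)                      # torch.Size([1024, 32]) -> 2D renderer
```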
arXiv Detail & Related papers (2020-11-24T14:14:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.