A Controllable Appearance Representation for Flexible Transfer and Editing
- URL: http://arxiv.org/abs/2504.15028v1
- Date: Mon, 21 Apr 2025 11:29:06 GMT
- Title: A Controllable Appearance Representation for Flexible Transfer and Editing
- Authors: Santiago Jimenez-Navarro, Julia Guerrero-Viu, Belen Masia,
- Abstract summary: We present a method that computes an interpretable representation of material appearance within a compact latent space. This representation is learned in a self-supervised fashion using an adapted FactorVAE. Our model demonstrates strong disentanglement and interpretability by effectively encoding material appearance and illumination.
- Score: 0.44241702149260353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a method that computes an interpretable representation of material appearance within a highly compact, disentangled latent space. This representation is learned in a self-supervised fashion using an adapted FactorVAE. We train our model with a carefully designed unlabeled dataset, avoiding possible biases induced by human-generated labels. Our model demonstrates strong disentanglement and interpretability by effectively encoding material appearance and illumination, despite the absence of explicit supervision. Then, we use our representation as guidance for training a lightweight IP-Adapter to condition a diffusion pipeline that transfers the appearance of one or more images onto a target geometry, and allows the user to further edit the resulting appearance. Our approach offers fine-grained control over the generated results: thanks to the well-structured compact latent space, users can intuitively manipulate attributes such as hue or glossiness in image space to achieve the desired final appearance.
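To make the training objective concrete, below is a minimal PyTorch sketch of a FactorVAE-style step of the kind the abstract describes: the standard VAE terms plus a gamma-weighted total-correlation penalty estimated by a latent-space discriminator that separates joint posterior samples from dimension-wise permuted ones. The module names (`vae`, `disc`), the MSE reconstruction term, and the `gamma` value are illustrative assumptions, not the authors' implementation, which further adapts FactorVAE and later conditions an IP-Adapter on the learned codes.

```python
# Sketch of a FactorVAE-style training step (assumed PyTorch modules, not the
# authors' code). `vae` must expose encode()/decode(); `disc` maps a latent
# code to two logits: [sample from q(z), sample from permuted q_bar(z)].
import torch
import torch.nn.functional as F


def permute_dims(z):
    """Shuffle each latent dimension independently across the batch,
    approximating the product of marginals q_bar(z)."""
    B, D = z.shape
    return torch.stack([z[torch.randperm(B), d] for d in range(D)], dim=1)


def factorvae_step(vae, disc, x, opt_vae, opt_disc, gamma=6.4):
    # --- VAE update: reconstruction + KL + gamma * total-correlation term ---
    mu, logvar = vae.encode(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
    recon_loss = F.mse_loss(vae.decode(z), x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    logits = disc(z)                                        # shape (B, 2)
    tc = (logits[:, 0] - logits[:, 1]).mean()               # density-ratio TC estimate
    vae_loss = recon_loss + kl + gamma * tc
    opt_vae.zero_grad()
    vae_loss.backward()
    opt_vae.step()

    # --- Discriminator update: joint q(z) vs. dimension-wise permuted q_bar(z) ---
    z_true, z_perm = z.detach(), permute_dims(z.detach())
    labels_true = torch.zeros(len(x), dtype=torch.long, device=x.device)
    labels_perm = torch.ones(len(x), dtype=torch.long, device=x.device)
    disc_loss = 0.5 * (F.cross_entropy(disc(z_true), labels_true) +
                       F.cross_entropy(disc(z_perm), labels_perm))
    opt_disc.zero_grad()
    disc_loss.backward()
    opt_disc.step()
    return vae_loss.item(), disc_loss.item()
```

Under this kind of objective, individual axes of the compact latent space tend to become interpretable, which is what would allow attributes such as hue or glossiness to be edited directly before decoding or before conditioning the diffusion pipeline.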
Related papers
- "Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space.
Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z)
- Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis in-the-Wild [29.23745176017559]
Exemplar-based semantic image synthesis generates images aligned with semantic content while preserving the appearance of an exemplar.
Recent tuning-free approaches address this by transferring local appearance via implicit cross-image matching.
We propose AM-Adapter to address exemplar-based semantic image synthesis in-the-wild.
arXiv Detail & Related papers (2024-12-04T09:17:47Z)
- Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control [73.6361029556484]
Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs.
We consider pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts.
We show that Stable Control Representations enable learning policies that exhibit state-of-the-art performance on OVMM, a difficult open-vocabulary navigation benchmark.
arXiv Detail & Related papers (2024-05-09T15:39:54Z)
- FilterPrompt: A Simple yet Efficient Approach to Guide Image Appearance Transfer in Diffusion Models [20.28288267660839]
FilterPrompt is an approach to enhance the effect of controllable generation.
It can be applied to any diffusion model, allowing users to adjust the representation of specific image features.
arXiv Detail & Related papers (2024-04-20T04:17:34Z)
- Intrinsic Image Diffusion for Indoor Single-view Material Estimation [55.276815106443976]
We present Intrinsic Image Diffusion, a generative model for appearance decomposition of indoor scenes.
Given a single input view, we sample multiple possible material explanations represented as albedo, roughness, and metallic maps.
Our method produces significantly sharper, more consistent, and more detailed materials, outperforming state-of-the-art methods by 1.5 dB in PSNR and by 45% in FID on albedo prediction.
arXiv Detail & Related papers (2023-12-19T15:56:19Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
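A hedged sketch of the mechanism this abstract describes, assuming addition-based fusion and a 1x1 convolution to lift the heatmap to the feature channel count (both illustrative choices; the paper's exact encoding and layer placement may differ):

```python
# Sample a Gaussian heatmap with a random center and add an encoded version of
# it to an intermediate GAN feature map as a spatial inductive bias.
import torch
import torch.nn as nn


def random_gaussian_heatmap(h, w, sigma=0.15):
    """One (h, w) heatmap with a uniformly sampled center; sigma is relative to size."""
    cy, cx = torch.rand(2)
    ys = torch.linspace(0, 1, h).view(-1, 1)
    xs = torch.linspace(0, 1, w).view(1, -1)
    return torch.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))


class HeatmapInjection(nn.Module):
    """Encode a per-sample heatmap (B, H, W) and add it to features of shape (B, C, H, W)."""

    def __init__(self, channels):
        super().__init__()
        self.project = nn.Conv2d(1, channels, kernel_size=1)

    def forward(self, feats, heatmap=None):
        B, C, H, W = feats.shape
        if heatmap is None:  # training: randomly sampled; inference: user-edited layout
            heatmap = torch.stack([random_gaussian_heatmap(H, W) for _ in range(B)])
        heatmap = heatmap.to(feats).unsqueeze(1)  # (B, 1, H, W)
        return feats + self.project(heatmap)
```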
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
- High-fidelity GAN Inversion with Padding Space [38.9258619444968]
Inverting a Generative Adversarial Network (GAN) facilitates a wide range of image editing tasks using pre-trained generators.
Existing methods typically employ the latent space of GANs as the inversion space, yet observe insufficient recovery of spatial details.
We propose to involve the padding space of the generator to complement the latent space with spatial information.
arXiv Detail & Related papers (2022-03-21T16:32:12Z)
- Weakly But Deeply Supervised Occlusion-Reasoned Parametric Layouts [87.370534321618]
We propose an end-to-end network that takes a single perspective RGB image of a complex road scene as input, to produce occlusion-reasoned layouts in perspective space.
The only human annotations required by our method are for parametric attributes that are cheaper and less ambiguous to obtain.
We validate our approach on two public datasets, KITTI and NuScenes, achieving state-of-the-art results with considerably less human supervision.
arXiv Detail & Related papers (2021-04-14T09:32:29Z)
- Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
arXiv Detail & Related papers (2020-11-24T02:18:08Z)
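For illustration, a minimal sketch of how such a discovered direction is typically applied for editing, assuming a frozen pre-trained `generator` and an already-learned `direction` vector (the discovery procedure itself is not shown):

```python
# Move a latent code along a unit-norm edit direction and decode each step
# with the frozen generator; `generator`, `z`, and `direction` are assumed inputs.
import torch


@torch.no_grad()
def traverse(generator, z, direction, steps=(-3.0, -1.0, 0.0, 1.0, 3.0)):
    direction = direction / direction.norm()                          # unit edit direction
    codes = torch.stack([z + alpha * direction for alpha in steps])   # (len(steps), D)
    return generator(codes)                                           # one image per step
```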