IntrinsiX: High-Quality PBR Generation using Image Priors
- URL: http://arxiv.org/abs/2504.01008v1
- Date: Tue, 01 Apr 2025 17:47:48 GMT
- Title: IntrinsiX: High-Quality PBR Generation using Image Priors
- Authors: Peter Kocsis, Lukas Höllein, Matthias Nießner
- Abstract summary: We introduce IntrinsiX, a novel method that generates high-quality intrinsic images from a text description. In contrast to existing text-to-image models, whose outputs contain baked-in scene lighting, our approach predicts physically-based rendering (PBR) maps.
- Score: 49.90007540430264
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce IntrinsiX, a novel method that generates high-quality intrinsic images from a text description. In contrast to existing text-to-image models, whose outputs contain baked-in scene lighting, our approach predicts physically-based rendering (PBR) maps. This enables the generated outputs to be used for content-creation scenarios in core graphics applications that facilitate re-lighting, editing, and texture generation tasks. To train our generator, we exploit strong image priors and pre-train separate models for each PBR material component (albedo, roughness, metallic, normals). We then align these models with a new cross-intrinsic attention formulation that concatenates key and value features in a consistent fashion. This allows us to exchange information between the output modalities and to obtain semantically coherent PBR predictions. To ground each intrinsic component, we propose a rendering loss that provides image-space signals to constrain the model, facilitating sharp detail in the output BRDF properties as well. Our results demonstrate detailed intrinsic generation with strong generalization, outperforming existing intrinsic image decomposition methods applied to generated images by a significant margin. Finally, we show a series of applications, including re-lighting, editing, and text-conditioned room-scale PBR texture generation.
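The two mechanisms named in the abstract, cross-intrinsic attention over concatenated key/value features and an image-space rendering loss, can be pictured with a minimal PyTorch sketch. Everything below is an illustrative assumption rather than the authors' implementation: the module names, tensor shapes, and the simple Lambertian shading stand-in (the paper grounds the full set of BRDF properties, not just albedo and normals).

```python
# Minimal sketch (not the authors' code) of the two ideas named in the
# abstract: (1) cross-intrinsic attention, where each intrinsic branch
# attends over keys/values concatenated from all branches, and (2) an
# image-space rendering loss. Shapes, names, and the Lambertian shading
# stand-in are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossIntrinsicAttention(nn.Module):
    """One attention layer shared across N intrinsic branches
    (e.g., albedo, roughness, metallic, normals)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feats: list[torch.Tensor]) -> list[torch.Tensor]:
        # feats: list of (B, T, C) token sequences, one per intrinsic branch.
        # Keys and values are concatenated over branches so every branch
        # can exchange information with all the others.
        kv = torch.cat(feats, dim=1)                     # (B, N*T, C)
        return [self.attn(q, kv, kv)[0] for q in feats]  # queries stay per-branch

def rendering_loss(albedo, normals, light_dir, target_rgb):
    """Grounds the intrinsic maps with an image-space signal by re-rendering
    them under a known light. Diffuse Lambertian shading only, as a
    stand-in for a full PBR shading model."""
    n = F.normalize(normals, dim=1)                      # (B, 3, H, W)
    l = F.normalize(light_dir, dim=1)                    # (B, 3, 1, 1)
    shading = (n * l).sum(dim=1, keepdim=True).clamp(min=0.0)
    rendered = albedo * shading                          # (B, 3, H, W)
    return F.mse_loss(rendered, target_rgb)
```

Concatenating keys and values, rather than queries, lets each intrinsic branch keep its own query stream while reading features from all other branches, which is one plausible reading of the abstract's "consistent fashion".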
Related papers
- ePBR: Extended PBR Materials in Image Synthesis [2.3241174970798126]
Intrinsic image representation offers a well-balanced trade-off, decomposing images into fundamental components.
We extend intrinsic image representations to incorporate both reflection and transmission properties.
With Extended PBR (ePBR) materials, we can edit materials effectively with precise control.
arXiv Detail & Related papers (2025-04-23T19:15:42Z)
- PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling [43.00079951897522]
PRISM is a unified framework that enables multiple image generation and editing tasks in a single model.
It supports diverse tasks, including text-to-RGBX generation, RGB-to-X decomposition, and X-to-RGBX conditional generation.
arXiv Detail & Related papers (2025-04-19T08:03:06Z)
- Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models [33.76031793753807]
We adapt the autoregressive multimodal model Lumina-mGPT into a robust Real-ISR model, namely PURE. PURE Perceives and Understands the input low-quality image, then REstores its high-quality counterpart. Experimental results demonstrate that PURE preserves image content while generating realistic details.
arXiv Detail & Related papers (2025-03-14T04:33:59Z)
- MaterialMVP: Illumination-Invariant Material Generation via Multi-view PBR Diffusion [37.596740171045845]
Physically-based rendering (PBR) has become a cornerstone in modern computer graphics, enabling realistic material representation and lighting interactions in 3D scenes. We present a novel end-to-end model for generating PBR textures from 3D meshes and image prompts, addressing key challenges in multi-view material synthesis.
arXiv Detail & Related papers (2025-03-13T11:57:30Z)
- Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation [56.862552362223425]
This report presents a comprehensive framework for generating high-quality 3D shapes and textures from diverse input prompts. The framework consists of 3D shape generation and texture generation. The report also details the system architecture, experimental results, and potential future directions for improving and expanding the framework.
arXiv Detail & Related papers (2025-02-20T04:22:30Z)
- MatCLIP: Light- and Shape-Insensitive Assignment of PBR Material Models [42.42328559042189]
MatCLIP is a novel method that extracts shape- and lighting-insensitive descriptors of PBR materials to assign plausible textures to 3D objects based on images. By extending an Alpha-CLIP-based model on material renderings across diverse shapes and lighting, our approach generates descriptors that bridge PBR representations with photographs or renderings. MatCLIP achieves a top-1 classification accuracy of 76.6%, outperforming state-of-the-art methods such as PhotoShape and MatAtlas.
arXiv Detail & Related papers (2025-01-27T12:08:52Z)
- TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting [48.97819552366636]
This paper presents TexGaussian, a novel method that uses octant-aligned 3D Gaussian Splatting for rapid PBR material generation.
Our method synthesizes more visually pleasing PBR materials and runs faster than previous methods in both unconditional and text-conditional scenarios.
arXiv Detail & Related papers (2024-11-29T12:19:39Z)
- Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models [20.19571676239579]
We introduce a novel diffusion-based framework to enhance the alignment of generated images with their corresponding descriptions.
Our framework is built upon a comprehensive analysis of inconsistency phenomena, categorizing them based on their manifestation in the image.
We then integrate a state-of-the-art controllable image generation model with a visual text generation module to generate an image that is consistent with the original prompt.
arXiv Detail & Related papers (2024-06-24T06:12:16Z)
- MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation [54.64194935409982]
We introduce MuLAn: a novel dataset comprising over 44K MUlti-Layer-wise RGBA decompositions.
MuLAn is the first photorealistic resource providing instance decomposition and spatial information for high quality images.
We aim to encourage the development of novel generation and editing technology, in particular layer-wise solutions.
arXiv Detail & Related papers (2024-04-03T14:58:00Z)
- Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering [47.78392889256976]
Paint-it is a text-driven high-fidelity texture map synthesis method for 3D rendering.
Paint-it synthesizes texture maps from a text description by synthesis-through-optimization, exploiting Score-Distillation Sampling (SDS).
We show that the deep convolutional PBR parameterization (DC-PBR) inherently schedules the optimization curriculum according to texture frequency and naturally filters out the noisy signals from SDS.
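For reference, here is a minimal, hedged sketch of one Score-Distillation Sampling step, the mechanism the summary above says Paint-it optimizes through. `unet`, `vae_encode`, and `alphas_cumprod` are assumed stand-ins for a frozen, pretrained text-conditioned latent diffusion prior, not Paint-it's actual API; the timestep range and weighting follow common SDS practice rather than the paper.

```python
# Hedged sketch of one SDS update; `unet` and `vae_encode` are assumed
# stand-ins for a frozen text-conditioned latent diffusion prior.
import torch

def sds_step(unet, vae_encode, rendered, text_emb, alphas_cumprod, optimizer):
    # rendered: (B, 3, H, W) differentiable render of the current texture maps
    latents = vae_encode(rendered)                  # keeps the graph to the texture
    t = torch.randint(20, 980, (latents.shape[0],), device=latents.device)
    noise = torch.randn_like(latents)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a.sqrt() * latents + (1.0 - a).sqrt() * noise  # forward diffusion
    with torch.no_grad():                           # the prior stays frozen
        eps_pred = unet(noisy, t, text_emb)         # predicted noise
    grad = (1.0 - a) * (eps_pred - noise)           # SDS gradient, common w(t)
    optimizer.zero_grad()
    latents.backward(gradient=grad)                 # inject grad; no UNet backprop
    optimizer.step()                                # updates the texture parameters
```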
arXiv Detail & Related papers (2023-12-18T17:17:08Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models [62.603753097900466]
We present a novel energy-based model (EBM) framework for adaptive context control by modeling the posterior of context vectors.
Specifically, we first formulate EBMs of latent image representations and text embeddings in each cross-attention layer of the denoising autoencoder.
Our latent EBMs further allow zero-shot compositional generation as a linear combination of cross-attention outputs from different contexts.
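The linear-combination idea in the last sentence can be pictured with a small sketch. Plain scaled-dot-product attention and fixed per-context `weights` stand in for the paper's energy-based posterior update; all names and shapes below are illustrative assumptions.

```python
# Hedged sketch: compose cross-attention outputs from several text contexts
# linearly. Plain attention stands in for the paper's EBM formulation.
import torch
import torch.nn.functional as F

def composed_cross_attention(q_img, contexts, weights, w_q, w_k, w_v):
    # q_img: (B, T, C) image tokens; contexts: list of (B, S, C) text embeddings;
    # weights: per-context scalars; w_q/w_k/w_v: (C, C) projection matrices.
    q = q_img @ w_q
    outs = []
    for ctx in contexts:
        k, v = ctx @ w_k, ctx @ w_v
        attn = F.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        outs.append(attn @ v)                      # one output per context
    # A weighted sum over contexts yields zero-shot composition.
    return sum(w * o for w, o in zip(weights, outs))
```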
arXiv Detail & Related papers (2023-06-16T14:30:41Z)
- Light Sampling Field and BRDF Representation for Physically-based Neural Rendering [4.440848173589799]
Physically-based rendering (PBR) is key to the immersive rendering effects widely used in industry to showcase detailed, realistic scenes from computer graphics assets.
This paper proposes a novel lighting representation that models direct and indirect light locally through a light sampling strategy in a learned light sampling field.
We then implement our proposed representations with an end-to-end physically-based neural face skin shader, which takes a standard face asset and an HDRI for illumination as inputs and generates a photo-realistic rendering as output.
arXiv Detail & Related papers (2023-04-11T19:54:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.