Collaborative Control for Geometry-Conditioned PBR Image Generation
- URL: http://arxiv.org/abs/2402.05919v3
- Date: Fri, 23 Aug 2024 12:19:54 GMT
- Title: Collaborative Control for Geometry-Conditioned PBR Image Generation
- Authors: Shimon Vainer, Mark Boss, Mathias Parger, Konstantin Kutsy, Dante De Nigris, Ciara Rowles, Nicolas Perony, Simon Donné
- Abstract summary: We propose to model the PBR image distribution directly, avoiding photometric inaccuracies in RGB generation.
We train a new PBR model that is tightly linked to a frozen RGB model using a novel cross-network communication paradigm.
- Score: 4.41000596260979
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Graphics pipelines require physically-based rendering (PBR) materials, yet current 3D content generation approaches are built on RGB models. We propose to model the PBR image distribution directly, avoiding photometric inaccuracies in RGB generation and the inherent ambiguity in extracting PBR from RGB. As existing paradigms for cross-modal fine-tuning are not suited for PBR generation due to both a lack of data and the high dimensionality of the output modalities, we propose to train a new PBR model that is tightly linked to a frozen RGB model using a novel cross-network communication paradigm. As the base RGB model is fully frozen, the proposed method retains its general performance and remains compatible with e.g. IPAdapters for that base model.
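The abstract describes a trainable PBR model that reads features from a fully frozen RGB base model through a cross-network link. As a rough, hypothetical sketch (not the authors' implementation), the idea of a trainable branch consuming a frozen network's intermediate features while only the branch's weights would ever be updated can be illustrated with plain NumPy; all layer sizes and the concatenation-based link are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Frozen "RGB" base network: two linear layers whose weights are never updated.
W_rgb = [rng.standard_normal((8, 8)) * 0.1 for _ in range(2)]

# Trainable "PBR" branch: each of its layers reads the base network's
# intermediate features through a cross-network link (here, concatenation).
W_pbr = [rng.standard_normal((16, 8)) * 0.1 for _ in range(2)]

def forward(x):
    # Forward pass through the frozen base, recording intermediate features.
    feats_rgb = []
    h = x
    for W in W_rgb:
        h = relu(h @ W)
        feats_rgb.append(h)
    # The PBR branch combines its own state with the base's features at
    # every layer; only W_pbr would receive gradient updates in training.
    g = x
    for W, f in zip(W_pbr, feats_rgb):
        g = relu(np.concatenate([g, f], axis=-1) @ W)
    return h, g

x = rng.standard_normal((1, 8))
rgb_out, pbr_out = forward(x)
print(rgb_out.shape, pbr_out.shape)  # (1, 8) (1, 8)
```

Because the base weights are never touched, the base model's general behavior (and its compatibility with add-ons such as IPAdapters) is preserved, which is the property the abstract highlights.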
Related papers
- RAW-Flow: Advancing RGB-to-RAW Image Reconstruction with Deterministic Latent Flow Matching [55.03149221192589]
We introduce a novel framework named RAW-Flow to bridge the gap between RGB and RAW representations. We also introduce a cross-scale context guidance module that injects hierarchical RGB features into the flow estimation process. RAW-Flow outperforms state-of-the-art approaches both quantitatively and visually.
arXiv Detail & Related papers (2026-01-28T08:27:38Z) - MatPedia: A Universal Generative Foundation for High-Fidelity Material Synthesis [29.919740823136163]
MatPedia is a foundation model built upon a novel joint RGB-PBR representation. It compactly encodes materials into two latents: one for RGB appearance and one for the four PBR maps. Trained on MatHybrid-410K, a mixed corpus combining PBR datasets with large-scale RGB images, MatPedia achieves native 1024×1024 synthesis.
arXiv Detail & Related papers (2025-11-21T05:16:26Z) - ThermalGen: Style-Disentangled Flow-Based Generative Models for RGB-to-Thermal Image Translation [14.108149959967095]
Paired RGB-thermal data is crucial for visual-thermal sensor fusion and cross-modality tasks. To overcome this challenge, RGB-to-Thermal (RGB-T) image translation has emerged as a promising solution. We propose ThermalGen, an adaptive flow-based generative model for RGB-T image translation.
arXiv Detail & Related papers (2025-09-29T14:55:51Z) - End-to-End RGB-IR Joint Image Compression With Channel-wise Cross-modality Entropy Model [39.52468600966148]
As the number of modalities increases, the required data storage and transmission costs also double. This work proposes a joint compression framework for RGB-IR image pairs.
arXiv Detail & Related papers (2025-06-27T02:04:21Z) - PBR-SR: Mesh PBR Texture Super Resolution from 2D Image Priors [52.28858915766172]
PBR-SR is a novel method for physically based rendering (PBR) texture super resolution (SR). It outputs high-resolution, high-quality PBR textures from low-resolution (LR) PBR input in a zero-shot manner.
arXiv Detail & Related papers (2025-06-03T13:15:34Z) - IntrinsiX: High-Quality PBR Generation using Image Priors [49.90007540430264]
We introduce IntrinsiX, a novel method that generates high-quality intrinsic images from text descriptions.
In contrast to existing text-to-image models whose outputs contain baked-in scene lighting, our approach predicts physically-based rendering (PBR) maps.
arXiv Detail & Related papers (2025-04-01T17:47:48Z) - VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition [54.27379947727035]
This paper proposes a novel PEFT strategy to adapt pre-trained foundation vision models for RGB-Event-based classification.
The frame difference of the dual modalities is also considered to capture the motion cues via the frame difference backbone network.
The source code and pre-trained models will be released at https://github.com/Event-AHU/VELoRA.
arXiv Detail & Related papers (2024-12-28T07:38:23Z) - UniRGB-IR: A Unified Framework for RGB-Infrared Semantic Tasks via Adapter Tuning [17.36726475620881]
We propose a general and efficient framework called UniRGB-IR to unify RGB-IR semantic tasks.
A novel adapter is developed to efficiently introduce richer RGB-IR features into the pre-trained foundation model.
Experimental results on various RGB-IR downstream tasks demonstrate that our method can achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-04-26T12:21:57Z) - EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion [55.367269556557645]
EvPlug learns a plug-and-play event and image fusion module from the supervision of the existing RGB-based model.
We demonstrate the superiority of EvPlug in several vision tasks such as object detection, semantic segmentation, and 3D hand pose estimation.
arXiv Detail & Related papers (2023-12-28T10:05:13Z) - Channel and Spatial Relation-Propagation Network for RGB-Thermal Semantic Segmentation [10.344060599932185]
RGB-Thermal (RGB-T) semantic segmentation has shown great potential in handling low-light conditions.
The key to RGB-T semantic segmentation is to effectively leverage the complementary nature of RGB and thermal images.
arXiv Detail & Related papers (2023-08-24T03:43:47Z) - Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
However, they are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z) - Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects of an aligned visible and thermal infrared image pair.
In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD.
Experiments on benchmark and VT723 datasets show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-07T20:26:09Z) - High-Resolution Image Harmonization via Collaborative Dual Transformations [13.9962809174055]
We propose a high-resolution image harmonization network with Collaborative Dual Transformation (CDTNet).
Our CDTNet consists of a low-resolution generator for pixel-to-pixel transformation, a color mapping module for RGB-to-RGB transformation, and a refinement module to take advantage of both.
arXiv Detail & Related papers (2021-09-14T13:18:58Z) - Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision [76.41657124981549]
This paper presents a joint learning model for image alignment and RAW-to-sRGB mapping.
Experiments show that our method performs favorably against state-of-the-art methods on the ZRR and SR-RAW datasets.
arXiv Detail & Related papers (2021-08-18T12:41:36Z) - BOP Challenge 2020 on 6D Object Localization [56.591561228575635]
The BOP Challenge 2020 is the third in a series of public competitions organized with the goal to capture the status quo in the field of 6D object pose estimation from an RGB-D image.
The participants were provided 350K training images generated by BlenderProc4BOP, a new open-source, lightweight, physically-based rendering (PBR) and procedural data generator.
The top-performing methods rely on RGB-D image channels, but strong results were achieved when only RGB channels were used at both training and test time.
arXiv Detail & Related papers (2020-09-15T22:35:14Z) - Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly lightweight designed triple-stream network is applied over these novel formulated data to achieve an optimal channel-wise complementary fusion status between the RGB and D.
arXiv Detail & Related papers (2020-08-07T10:13:05Z) - Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient cross-modality guided encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.