Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion
- URL: http://arxiv.org/abs/2412.15050v3
- Date: Tue, 28 Jan 2025 14:33:42 GMT
- Title: Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion
- Authors: Zhifei Chen, Tianshuo Xu, Wenhang Ge, Leyi Wu, Dongyu Yan, Jing He, Luozhou Wang, Lu Zeng, Shunsi Zhang, Yingcong Chen, Hui Xiong,
- Abstract summary: Rendering and inverse rendering are pivotal tasks in computer vision and graphics.
We propose a data-driven method that jointly models rendering and inverse rendering as two conditional generation tasks.
We will open-source our training and inference code to the public, fostering further research and development in this area.
- Score: 25.5139351758218
- License:
- Abstract: Rendering and inverse rendering are pivotal tasks in both computer vision and graphics. The rendering equation is the core of the two tasks, serving as an ideal conditional distribution transfer function from intrinsic properties to RGB images. Although existing rendering methods achieve promising results, they merely approximate the ideal estimation for a specific scene and come with a high computational cost. Additionally, the inverse conditional distribution transfer is intractable due to the inherent ambiguity. To address these challenges, we propose a data-driven method that jointly models rendering and inverse rendering as two conditional generation tasks within a single diffusion framework. Inspired by UniDiffuser, we utilize two distinct time schedules to model both tasks, and with a tailored dual streaming module, we achieve cross-conditioning of two pre-trained diffusion models. This unified approach, named Uni-Renderer, allows the two processes to facilitate each other through a cycle-consistent constraint, mitigating ambiguity by enforcing consistency between intrinsic properties and rendered images. Combined with a meticulously prepared dataset, our method effectively decomposes intrinsic properties and demonstrates a strong capability to recognize changes during rendering. We will open-source our training and inference code to the public, fostering further research and development in this area.
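The abstract's core idea, giving each stream (RGB images vs. intrinsic properties) its own diffusion time schedule so that rendering and inverse rendering emerge as corner cases of one joint model, can be illustrated with a toy training step. The sketch below is a hypothetical reconstruction under simple assumptions (tiny stand-in denoisers instead of the pre-trained backbones, concatenation-based cross-conditioning, a 5-channel packing of intrinsic maps, and no cycle-consistency term); it is not the paper's released code.

```python
# Minimal sketch of a dual time-schedule diffusion step (UniDiffuser-style),
# as described in the abstract. All module names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDenoiser(nn.Module):
    """Toy stand-in for a pre-trained diffusion U-Net over one stream."""
    def __init__(self, in_ch, cond_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch + cond_ch + 1, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, out_ch, 3, padding=1),
        )

    def forward(self, x, cond, t):
        # Broadcast the normalized timestep as an extra channel.
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.net(torch.cat([x, cond, t_map], dim=1))

def add_noise(x0, t, noise):
    """Variance-preserving forward process: x_t = cos(t*pi/2)*x0 + sin(t*pi/2)*eps."""
    a = torch.cos(t * torch.pi / 2).view(-1, 1, 1, 1)
    s = torch.sin(t * torch.pi / 2).view(-1, 1, 1, 1)
    return a * x0 + s * noise

# One stream denoises RGB images, the other denoises intrinsic maps
# (e.g. albedo/roughness/metallic packed into 5 channels -- an assumption).
img_net  = TinyDenoiser(in_ch=3, cond_ch=5, out_ch=3)
attr_net = TinyDenoiser(in_ch=5, cond_ch=3, out_ch=5)
opt = torch.optim.Adam(
    list(img_net.parameters()) + list(attr_net.parameters()), lr=1e-4)

def train_step(rgb, intrinsics):
    B = rgb.shape[0]
    # Independent timesteps per stream: t = 0 means that stream is clean and
    # acts purely as a condition, so rendering (clean intrinsics -> image) and
    # inverse rendering (clean image -> intrinsics) are special cases.
    t_img, t_attr = torch.rand(B), torch.rand(B)
    eps_img, eps_attr = torch.randn_like(rgb), torch.randn_like(intrinsics)
    noisy_img  = add_noise(rgb, t_img, eps_img)
    noisy_attr = add_noise(intrinsics, t_attr, eps_attr)

    # Cross-conditioning: each stream sees the other stream's (noisy) state.
    pred_eps_img  = img_net(noisy_img, noisy_attr, t_img)
    pred_eps_attr = attr_net(noisy_attr, noisy_img, t_attr)
    loss = F.mse_loss(pred_eps_img, eps_img) + F.mse_loss(pred_eps_attr, eps_attr)

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random tensors standing in for a rendered-object dataset.
print(f"dual-stream loss: {train_step(torch.randn(2, 3, 32, 32), torch.randn(2, 5, 32, 32)):.4f}")
```

Sampling independent timesteps per modality is what lets a single network family cover both conditional directions; the paper's cycle-consistent constraint (re-rendering predicted intrinsics and comparing against the input image) would be an additional loss term on top of this step.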
Related papers
- BD-Diff: Generative Diffusion Model for Image Deblurring on Unknown Domains with Blur-Decoupled Learning [55.21345354747609]
BD-Diff is a generative-diffusion-based model designed to enhance deblurring performance on unknown domains.
We employ two Q-Formers to extract structural representations and blur patterns separately.
We introduce a reconstruction task to make the structural features and blur patterns complementary.
arXiv Detail & Related papers (2025-02-03T17:00:40Z) - Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models [39.127620891450526]
We introduce a unified, versatile, diffusion-based framework, Diff-2-in-1, to handle both multi-modal data generation and dense visual perception.
We further enhance discriminative visual perception via multi-modal generation, by utilizing the denoising network to create multi-modal data that mirror the distribution of the original training set.
arXiv Detail & Related papers (2024-11-07T18:59:53Z) - Discrete Modeling via Boundary Conditional Diffusion Processes [29.95155303262501]
Previous approaches have suffered from the discrepancy between discrete data and continuous modeling.
We propose a two-step forward process that first estimates the boundary as a prior distribution.
We then rescale the forward trajectory to construct a boundary conditional diffusion model.
arXiv Detail & Related papers (2024-10-29T09:42:42Z) - Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition [57.74076383449153]
We propose a novel dual-stream framework for event stream-based pattern recognition via differentiated fusion, termed EFV++.
It models two common event representations simultaneously, i.e., event images and event voxels.
We achieve new state-of-the-art performance on the Bullying10k dataset, i.e., 90.51%, which exceeds the second place by +2.21%.
arXiv Detail & Related papers (2024-06-27T02:32:46Z) - Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z) - DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z) - DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z) - How Image Generation Helps Visible-to-Infrared Person Re-Identification? [15.951145523749735]
Flow2Flow is a framework that can jointly achieve training sample expansion and cross-modality image generation for V2I person ReID.
To achieve identity alignment and modality alignment of the generated images, we develop adversarial training strategies to train Flow2Flow.
Experimental results on SYSU-MM01 and RegDB demonstrate that both training sample expansion and cross-modality image generation can significantly improve V2I ReID accuracy.
arXiv Detail & Related papers (2022-10-04T13:09:29Z) - RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation [110.4255414234771]
Existing solutions require massive training data or lack generalizability to unknown rendering configurations.
We propose a novel approach that marries domain randomization and differentiable rendering gradients to address this problem.
Our approach achieves significantly lower reconstruction errors and has better generalizability among unknown rendering configurations.
arXiv Detail & Related papers (2022-05-11T17:59:51Z)