LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering
- URL: http://arxiv.org/abs/2508.07647v1
- Date: Mon, 11 Aug 2025 05:57:59 GMT
- Title: LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering
- Authors: Xiaohang Zhan, Dingming Liu,
- Abstract summary: We propose a novel training-free image generation algorithm that precisely controls the occlusion relationships between objects in an image. We demonstrate that our method can achieve a variety of effects, such as altering the transparency of objects, the density of mass, and the intensity of light.
- Score: 10.476519949850118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel training-free image generation algorithm that precisely controls the occlusion relationships between objects in an image. Existing image generation methods typically rely on prompts to influence occlusion, which often lack precision. While layout-to-image methods provide control over object locations, they fail to address occlusion relationships explicitly. Given a pre-trained image diffusion model, our method leverages volume rendering principles to "render" the scene in latent space, guided by occlusion relationships and the estimated transmittance of objects. This approach does not require retraining or fine-tuning the image diffusion model, yet it enables accurate occlusion control due to its physics-grounded foundation. In extensive experiments, our method significantly outperforms existing approaches in terms of occlusion accuracy. Furthermore, we demonstrate that by adjusting the opacities of objects or concepts during rendering, our method can achieve a variety of effects, such as altering the transparency of objects, the density of mass (e.g., forests), the concentration of particles (e.g., rain, fog), the intensity of light, and the strength of lens effects, etc.
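The mechanism the abstract describes is classic front-to-back volume rendering applied to latent features rather than colors: each object contributes in occlusion order, weighted by its own opacity and by the transmittance of everything in front of it. Below is a minimal sketch of that weighting (the function name `latent_render`, the per-object latent/opacity inputs, and where the composite enters the diffusion model are assumptions, not the paper's exact interface; the weights follow the standard transmittance formula):

```python
import torch

def latent_render(latents, alphas):
    """Composite per-object latents front-to-back with volume-rendering
    weights w_i = T_i * alpha_i, where T_i = prod_{j<i} (1 - alpha_j).

    latents: list of [C, H, W] per-object latent features, nearest object first.
    alphas:  matching list of opacities (scalars or [1, H, W] maps in [0, 1]).
    """
    rendered = torch.zeros_like(latents[0])
    transmittance = 1.0  # fraction of "light" surviving all closer objects
    for z, a in zip(latents, alphas):
        rendered = rendered + transmittance * a * z
        transmittance = transmittance * (1.0 - a)
    return rendered
```

In this scheme, raising or lowering an object's opacity is exactly the knob the abstract mentions: a smaller alpha lets more of the objects behind it show through, which reads as transparency, lower density, or weaker light.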
Related papers
- Fine-grained Defocus Blur Control for Generative Image Models [66.30016220484394]
Current text-to-image diffusion models excel at generating diverse, high-quality images. We introduce a novel text-to-image diffusion framework that leverages camera metadata. Our model enables superior fine-grained control without altering the depicted scene.
arXiv Detail & Related papers (2025-10-07T17:59:15Z)
- BokehDiff: Neural Lens Blur with One-Step Diffusion [53.11429878683807]
We introduce BokehDiff, a lens blur rendering method that achieves physically accurate and visually appealing outcomes. Our method employs a physics-inspired self-attention module that aligns with the image formation process. We adapt the diffusion model to the one-step inference scheme without introducing additional noise, and achieve results of high quality and fidelity.
arXiv Detail & Related papers (2025-07-24T03:23:19Z)
- D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition [13.854486943187565]
This paper proposes a pipeline that utilizes features from a frozen diffusion model. We hypothesize that such features can help hallucinate the visual features of objects behind occluders. We demonstrate that our proposed use of diffusion-based features results in models that are more robust to partial object occlusions.
arXiv Detail & Related papers (2025-04-08T21:05:29Z)
- Materialist: Physically Based Editing Using Single-Image Inverse Rendering [47.85234717907478]
Materialist is a method combining a learning-based approach with physically based progressive differentiable rendering. Our approach enables a range of applications, including material editing, object insertion, and relighting. Experiments demonstrate strong performance across synthetic and real-world datasets.
arXiv Detail & Related papers (2025-01-07T11:52:01Z)
- Generative Image Layer Decomposition with Visual Effects [49.75021036203426]
LayerDecomp is a generative framework for image layer decomposition. It produces clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects. Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks.
arXiv Detail & Related papers (2024-11-26T20:26:49Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
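The summary names two concrete ingredients: per-object attention masking at every denoising step, and interpolating attention features between the source and target generations during the early steps. A hedged sketch of such an interpolation schedule (the linear ramp, the `early_frac` cutoff, and the function name are illustrative assumptions, not the paper's exact recipe):

```python
import torch

def blend_attention(src_feats: torch.Tensor, tgt_feats: torch.Tensor,
                    step: int, total_steps: int, early_frac: float = 0.3):
    # During the early denoising steps, fade from the source image's
    # attention features to the target layout's; afterwards, use the
    # target features alone so the new layout fully takes over.
    cutoff = early_frac * total_steps
    if step < cutoff:
        t = step / cutoff  # 0 -> pure source, 1 -> pure target
        return (1.0 - t) * src_feats + t * tgt_feats
    return tgt_feats
```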
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- CbwLoss: Constrained Bidirectional Weighted Loss for Self-supervised Learning of Depth and Pose [13.581694284209885]
Photometric differences are used to train neural networks for estimating depth and camera pose from unlabeled monocular videos.
In this paper, we handle moving objects and occlusions by exploiting the differences between the flow fields and depth structures generated by affine transformation and view synthesis.
We mitigate the effect of textureless regions on model optimization by measuring differences between features that carry more semantic and contextual information, without adding extra networks.
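For context, the photometric difference used in this line of work is typically an SSIM/L1 mix between the target frame and a source frame warped into the target view with the predicted depth and pose. A generic sketch of that loss (the 0.85 weight and the pooling-based SSIM follow common self-supervised depth practice, not necessarily CbwLoss's exact formulation):

```python
import torch
import torch.nn.functional as F

def ssim_dissimilarity(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified SSIM over 3x3 neighborhoods, returned as a dissimilarity map.
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sig_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sig_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sig_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sig_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (sig_x + sig_y + c2))
    return ((1 - ssim) / 2).clamp(0, 1)

def photometric_loss(target, warped, alpha=0.85):
    # target: [B, 3, H, W] frame; warped: a source frame warped into the
    # target view using the predicted depth and relative camera pose.
    return (alpha * ssim_dissimilarity(target, warped)
            + (1 - alpha) * (target - warped).abs()).mean()
```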
arXiv Detail & Related papers (2022-12-12T12:18:24Z)
- RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation [110.4255414234771]
Existing solutions require massive training data or lack generalizability to unknown rendering configurations.
We propose a novel approach that marries domain randomization and differentiable rendering gradients to address this problem.
Our approach achieves significantly lower reconstruction errors and has better generalizability among unknown rendering configurations.
arXiv Detail & Related papers (2022-05-11T17:59:51Z)
- DIB-R++: Learning to Predict Lighting and Material with a Hybrid Differentiable Renderer [78.91753256634453]
We consider the challenging problem of predicting intrinsic object properties from a single image by exploiting differentiable renderers.
In this work, we propose DIB-R++, a hybrid differentiable renderer which supports photorealistic effects by combining rasterization and ray-tracing.
Compared to more advanced physics-based differentiable renderers, DIB-R++ is highly performant due to its compact and expressive model.
arXiv Detail & Related papers (2021-10-30T01:59:39Z)
- Unsupervised Learning of Depth and Depth-of-Field Effect from Natural Images with Aperture Rendering Generative Adversarial Networks [15.546533383799309]
We propose aperture rendering generative adversarial networks (AR-GANs), which equip aperture rendering on top of GANs, and adopt focus cues to learn the depth and depth-of-field effect of unlabeled natural images.
In the experiments, we demonstrate the effectiveness of AR-GANs in various datasets, such as flower, bird, and face images, demonstrate their portability by incorporating them into other 3D representation learning GANs, and validate their applicability in shallow DoF rendering.
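As rough intuition for aperture rendering, depth of field can be approximated by blending a sharp image with a blurred one according to a per-pixel circle of confusion that grows away from the focal plane. A deliberately simplified sketch of that idea (a single crude blur and a linear circle-of-confusion model; the AR-GAN renderer is a learned, fully differentiable variant, not this exact formula):

```python
import torch
import torch.nn.functional as F

def simple_aperture_render(image, depth, focus_depth, aperture):
    # image: [B, 3, H, W]; depth: [B, 1, H, W].
    # Larger aperture -> shallower depth of field; the circle of
    # confusion grows with distance from the focal plane.
    coc = (aperture * (depth - focus_depth).abs()).clamp(0.0, 1.0)
    blurred = F.avg_pool2d(image, kernel_size=5, stride=1, padding=2)
    return (1.0 - coc) * image + coc * blurred  # sharp in focus, soft elsewhere
```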
arXiv Detail & Related papers (2021-06-24T14:15:50Z)
- FakeMix Augmentation Improves Transparent Object Detection [24.540569928274984]
We propose a novel content-dependent data augmentation method termed FakeMix to overcome the boundary-related imbalance problem.
We also present AdaptiveASPP, an enhanced version of ASPP, that can capture multi-scale and cross-modality features dynamically.
arXiv Detail & Related papers (2021-03-24T15:51:37Z)