Controllable 3D Placement of Objects with Scene-Aware Diffusion Models
- URL: http://arxiv.org/abs/2506.21446v1
- Date: Thu, 26 Jun 2025 16:31:39 GMT
- Title: Controllable 3D Placement of Objects with Scene-Aware Diffusion Models
- Authors: Mohamed Omran, Dimitris Kalatzis, Jens Petersen, Amirhossein Habibian, Auke Wiggers,
- Abstract summary: We show that a carefully designed visual map, combined with coarse object masks, is sufficient for high quality object placement. We show that fine location control can be combined with appearance control to place existing objects in precise locations in a scene.
- Score: 6.020146107338903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image editing approaches have become more powerful and flexible with the advent of text-conditioned generative models. However, placing objects in an environment at a precise location and orientation remains a challenge, as it typically requires carefully crafted inpainting masks or prompts. In this work, we show that a carefully designed visual map, combined with coarse object masks, is sufficient for high-quality object placement. We design a conditioning signal that resolves ambiguities while being flexible enough to allow for changes of shape or object orientation. By building on an inpainting model, we leave the background intact by design, in contrast to methods that model objects and background jointly. We demonstrate the effectiveness of our method in the automotive setting, where we compare different conditioning signals on novel object placement tasks. These tasks are designed to measure edit quality not only in terms of appearance, but also in terms of pose and location accuracy, including cases that require non-trivial shape changes. Lastly, we show that fine location control can be combined with appearance control to place existing objects at precise locations in a scene.
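The abstract describes conditioning an inpainting model on a visual map plus a coarse object mask while leaving the background untouched. Below is a minimal numpy sketch of how such a conditioning tensor could be assembled channel-wise (masked background image, coarse mask, and a 2-channel location map). All names and the channel layout are hypothetical illustrations, not the paper's actual architecture.

```python
import numpy as np

def build_conditioning(image, coarse_mask, visual_map):
    """Stack conditioning channels for an inpainting-style model.

    image:       (H, W, 3) float array in [0, 1]
    coarse_mask: (H, W) binary mask, 1 inside the placement region
    visual_map:  (H, W, 2) map encoding location/orientation cues

    Returns a (H, W, 6) array: masked image (3) + mask (1) + map (2).
    """
    # Blank out the placement region; the background stays intact by design.
    masked = image * (1.0 - coarse_mask[..., None])
    return np.concatenate([masked, coarse_mask[..., None], visual_map], axis=-1)

# Toy example: a 64x64 scene with a coarse box where the object should go,
# and a normalized coordinate grid standing in for the visual map.
img = np.random.rand(64, 64, 3)
mask = np.zeros((64, 64))
mask[20:40, 20:40] = 1.0
vmap = np.stack(np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64)), axis=-1)
cond = build_conditioning(img, mask, vmap)
```

In a real model, `cond` would be downsampled to the latent resolution and concatenated to the denoiser's input channels; this sketch only shows the data layout.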
Related papers
- ObjectMover: Generative Object Movement with Video Prior [69.75281888309017]
We present ObjectMover, a generative model that can perform object movement in challenging scenes. We show that with this approach, our model is able to adjust to complex real-world scenarios. We propose a multi-task learning strategy that enables training on real-world video data to improve the model's generalization.
arXiv Detail & Related papers (2025-03-11T04:42:59Z)
- Generative Location Modeling for Spatially Aware Object Insertion [35.62317512925592]
Generative models have become a powerful tool for image editing tasks, including object insertion.
In this paper, we focus on creating a location model dedicated to identifying realistic object locations.
Specifically, we train an autoregressive model that generates bounding box coordinates, conditioned on the background image and the desired object class.
This formulation allows us to handle sparse placement annotations effectively and to incorporate implausible locations into a preference dataset by performing direct preference optimization.
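The location model above generates bounding-box coordinates autoregressively, which typically means quantizing continuous coordinates into discrete tokens. The sketch below shows one plausible tokenization scheme (normalize, then bin); the bin count and round-trip decoding are illustrative assumptions, not the paper's actual vocabulary.

```python
NUM_BINS = 256  # hypothetical quantization resolution for the token vocabulary

def bbox_to_tokens(box, img_w, img_h, num_bins=NUM_BINS):
    """Quantize a pixel-space box (x1, y1, x2, y2) into discrete tokens."""
    x1, y1, x2, y2 = box
    norm = [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]
    return [min(int(v * num_bins), num_bins - 1) for v in norm]

def tokens_to_bbox(tokens, img_w, img_h, num_bins=NUM_BINS):
    """Map bin centers back to pixel coordinates."""
    vals = [(t + 0.5) / num_bins for t in tokens]
    return (vals[0] * img_w, vals[1] * img_h, vals[2] * img_w, vals[3] * img_h)

# Round-trip a box through the token space; error is bounded by the bin width.
toks = bbox_to_tokens((100, 50, 300, 200), img_w=640, img_h=480)
box = tokens_to_bbox(toks, img_w=640, img_h=480)
```

An autoregressive model would then predict these four tokens one at a time, conditioned on the background image and object class.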
arXiv Detail & Related papers (2024-10-17T14:00:41Z)
- Thinking Outside the BBox: Unconstrained Generative Object Compositing [36.86960274923344]
We present a novel problem of unconstrained generative object compositing.
Our first-of-its-kind model is able to generate object effects such as shadows and reflections that go beyond the mask.
Our model outperforms existing object placement and compositing models in various quality metrics and user studies.
arXiv Detail & Related papers (2024-09-06T18:42:30Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
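The interpolation of attention features between source and target images during early denoising steps can be sketched as a simple schedule: blend in the early steps, then hand over fully to the target layout. The function below is a hedged illustration of that idea in numpy; the linear schedule and the `early_frac` cutoff are assumptions, not DiffUHaul's exact formulation.

```python
import numpy as np

def blend_attention(src_feats, tgt_feats, step, num_steps, early_frac=0.4):
    """Interpolate attention features during early denoising steps.

    During the first `early_frac` fraction of steps, linearly ramp from the
    source features (alpha=0) toward the target features (alpha=1); after
    that, use the target features unchanged.
    """
    cutoff = int(early_frac * num_steps)
    if step >= cutoff:
        return tgt_feats
    alpha = step / max(1, cutoff)  # 0 -> source appearance, 1 -> target layout
    return (1 - alpha) * src_feats + alpha * tgt_feats

# Toy features: zeros for the source, ones for the target, 50 denoising steps.
src = np.zeros(4)
tgt = np.ones(4)
early = blend_attention(src, tgt, step=0, num_steps=50)   # pure source
mid = blend_attention(src, tgt, step=10, num_steps=50)    # halfway blend
late = blend_attention(src, tgt, step=30, num_steps=50)   # pure target
```

In practice the blend would be applied inside the denoiser's self-attention layers at each step, rather than to standalone arrays as here.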
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- Customizing Text-to-Image Diffusion with Object Viewpoint Control [53.621518249820745]
We introduce a new task -- enabling explicit control of the object viewpoint in the customization of text-to-image diffusion models. This allows us to modify the custom object's properties and generate it in various background scenes via text prompts. We propose to condition the diffusion process on the 3D object features rendered from the target viewpoint.
arXiv Detail & Related papers (2024-04-18T16:59:51Z)
- RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting [63.567363455092234]
RefFusion is a novel 3D inpainting method based on a multi-scale personalization of an image inpainting diffusion model to the given reference view.
Our framework achieves state-of-the-art results for object removal while maintaining high controllability.
arXiv Detail & Related papers (2024-04-16T17:50:02Z)
- VASE: Object-Centric Appearance and Shape Manipulation of Real Videos [108.60416277357712]
In this work, we introduce an object-centric framework designed both to control the object's appearance and, notably, to execute precise and explicit structural modifications on the object.
We build our framework on a pre-trained image-conditioned diffusion model, integrate layers to handle the temporal dimension, and propose training strategies and architectural modifications to enable shape control.
We evaluate our method on the image-driven video editing task showing similar performance to the state-of-the-art, and showcasing novel shape-editing capabilities.
arXiv Detail & Related papers (2024-01-04T18:59:24Z)
- Scene-Conditional 3D Object Stylization and Composition [27.57166804668999]
3D generative models have made impressive progress, enabling the generation of almost arbitrary 3D assets from text or image inputs. We propose a framework that allows for the stylization of an existing 3D asset to fit into a given 2D scene, and additionally produces a photorealistic composition as if the asset were placed within the environment.
arXiv Detail & Related papers (2023-12-19T18:50:33Z)
- Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z)
- Semantic-Guided Inpainting Network for Complex Urban Scenes Manipulation [19.657440527538547]
In this work, we propose a novel deep learning model to alter a complex urban scene by removing a user-specified portion of the image.
Inspired by recent works on image inpainting, our proposed method leverages the semantic segmentation to model the content and structure of the image.
To generate reliable results, we design a new decoder block that combines the semantic segmentation and generation task.
arXiv Detail & Related papers (2020-10-19T09:17:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.