FreeInsert: Personalized Object Insertion with Geometric and Style Control
- URL: http://arxiv.org/abs/2509.20756v1
- Date: Thu, 25 Sep 2025 05:26:10 GMT
- Title: FreeInsert: Personalized Object Insertion with Geometric and Style Control
- Authors: Yuhong Zhang, Han Wang, Yiwen Wang, Rong Xie, Li Song
- Abstract summary: We propose a training-free framework that customizes object insertion into arbitrary scenes by leveraging 3D geometric information. The rendered image, serving as geometric control, is combined with style and content control achieved through diffusion adapters.
- Score: 26.088650452374726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image diffusion models have made significant progress in image generation, enabling effortless customized generation. However, existing image editing methods still face limitations in personalized image composition tasks. First, they lack geometric control over the inserted objects: current methods operate purely in 2D and typically rely on textual instructions, making precise geometric control difficult. Second, they struggle with style consistency: the style of the inserted object is often not matched to the background, which hurts realism. Finally, inserting objects into images without extensive training remains challenging. To address these issues, we propose *FreeInsert*, a novel training-free framework that customizes object insertion into arbitrary scenes by leveraging 3D geometric information. Building on existing 3D generation models, we first convert the 2D object into 3D, perform interactive editing at the 3D level, and then re-render it into a 2D image from a specified view; this introduces geometric controls such as shape and viewpoint. The rendered image, serving as geometric control, is combined with style and content control provided by diffusion adapters, so the diffusion model ultimately produces geometrically controlled, style-consistent edited images.
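The abstract describes a two-branch pipeline but no implementation. Below is a minimal Python sketch of that structure, under stated assumptions: `lift_to_3d` and `render_view` are hypothetical placeholders for an off-the-shelf image-to-3D generator and a mesh renderer (the paper does not name them), and a depth ControlNet plus IP-Adapter from the `diffusers` library stand in as one plausible choice of geometric and style/content adapters. File names, checkpoints, and the adapter scale are illustrative, not the authors' settings.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

def lift_to_3d(object_image):
    """Hypothetical placeholder: reconstruct a textured 3D mesh from a 2D crop."""
    raise NotImplementedError("plug in an image-to-3D generation model")

def render_view(mesh, azimuth_deg, elevation_deg, mode="depth"):
    """Hypothetical placeholder: render the edited mesh from a chosen view.
    mode='depth' assumes the renderer can emit a depth map directly,
    which is what the depth ControlNet below expects."""
    raise NotImplementedError("plug in a mesh renderer")

# 1. 2D -> 3D -> 2D: lift the object, edit/pose it in 3D, and re-render
#    it from the user-specified viewpoint (the geometric control signal).
object_image = load_image("object.png")
mesh = lift_to_3d(object_image)
depth_view = render_view(mesh, azimuth_deg=30, elevation_deg=10)

# 2. Geometric control branch: a depth ControlNet is one plausible way
#    to condition generation on the rendered view.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet, torch_dtype=torch.float16).to("cuda")

# 3. Style/content control branch: an IP-Adapter conditioned on the
#    background scene keeps the inserted object style-consistent with it.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)

result = pipe(
    prompt="the object seamlessly inserted into the scene",
    image=depth_view,                          # geometric control
    ip_adapter_image=load_image("scene.png"),  # style/content reference
    num_inference_steps=30,
).images[0]
result.save("inserted.png")
```

The specific conditioning models are interchangeable; the key structure the abstract describes is the two control branches (geometry from the re-rendered 3D view, style/content from an adapter) feeding a single diffusion pass.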
Related papers
- Ctrl&Shift: High-Quality Geometry-Aware Object Manipulation in Visual Generation [34.92056161129864]
We present Ctrl&Shift, an end-to-end diffusion framework to achieve geometry-consistent object manipulation without explicit 3D representations.
Our key insight is to decompose manipulation into two stages, object removal and reference-guided inpainting under explicit camera pose control, and encode both within a unified diffusion process.
To our knowledge, this is the first framework to unify fine-grained geometric control and real-world generalization for object manipulation, without relying on any explicit 3D modeling.
arXiv Detail & Related papers (2026-02-11T23:36:30Z)
- POCI-Diff: Position Objects Consistently and Interactively with 3D-Layout Guided Diffusion [46.97254555348757]
We propose a diffusion-based approach for Text-to-Image (T2I) generation with consistent and interactive 3D layout control and editing.
We introduce a framework for Positioning Objects Consistently and Interactively (POCI-Diff).
Our method enables explicit per-object semantic control by binding individual text descriptions to specific 3D bounding boxes.
arXiv Detail & Related papers (2026-01-20T15:13:43Z)
- 3D-LATTE: Latent Space 3D Editing from Textual Instructions [64.77718887666312]
We propose a training-free editing method that operates within the latent space of a native 3D diffusion model.
We guide the edit synthesis by blending 3D attention maps from the generation with the source object.
arXiv Detail & Related papers (2025-08-29T22:51:59Z)
- Training-free Geometric Image Editing on Diffusion Models [53.38549950608886]
We tackle the task of geometric image editing, where an object within an image is repositioned, reoriented, or reshaped.
We propose a decoupled pipeline that separates object transformation, source region inpainting, and target region refinement (see the sketch after this list).
Both inpainting and refinement are implemented using a training-free diffusion approach, FreeFine.
arXiv Detail & Related papers (2025-07-31T07:36:00Z)
- 2D Instance Editing in 3D Space [39.53225056350435]
We introduce a novel "2D-3D-2D" framework for 2D image editing.
Our approach begins by lifting 2D objects into a 3D representation, enabling edits within a physically plausible, rigidity-constrained 3D environment.
In contrast to existing 2D editing methods, such as DragGAN and DragDiffusion, our method directly manipulates objects in a 3D environment.
arXiv Detail & Related papers (2025-07-08T09:38:39Z)
- LACONIC: A 3D Layout Adapter for Controllable Image Creation [22.96293773013579]
Existing generative approaches for guided image synthesis rely on 2D controls in the image or text space.
We propose a novel conditioning approach, training method, and adapter network that can be plugged into pretrained text-to-image diffusion models.
Our method supports camera control, conditions generation on explicit 3D geometries and, for the first time, accounts for the entire context of a scene.
arXiv Detail & Related papers (2025-07-04T02:25:36Z)
- 3DSwapping: Texture Swapping For 3D Object From Single Reference Image [21.454340647455236]
3D texture swapping allows for the customization of 3D object textures.
No dedicated method exists, but adapted 2D editing and text-driven 3D editing approaches can serve this purpose.
We introduce 3DSwapping, a 3D texture swapping method that integrates progressive generation, view-consistency gradient guidance, and prompt-tuned gradient guidance.
arXiv Detail & Related papers (2025-03-24T16:31:52Z)
- Image Sculpting: Precise Object Editing with 3D Geometry Control [33.9777412846583]
Image Sculpting is a new framework for editing 2D images by incorporating tools from 3D geometry and graphics.
It supports precise, quantifiable, and physically-plausible editing options such as pose editing, rotation, translation, 3D composition, carving, and serial addition.
arXiv Detail & Related papers (2024-01-02T18:59:35Z)
- 3DAvatarGAN: Bridging Domains for Personalized Editable Avatars [75.31960120109106]
3D-GANs synthesize geometry and texture by training on large-scale datasets with a consistent structure.
We propose an adaptation framework, where the source domain is a pre-trained 3D-GAN, while the target domain is a 2D-GAN trained on artistic datasets.
We show a deformation-based technique for modeling the exaggerated geometry of artistic domains, which, as a byproduct, enables personalized geometric editing.
arXiv Detail & Related papers (2023-01-06T19:58:47Z)
- Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion [115.82306502822412]
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing.
A corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing.
We study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures.
arXiv Detail & Related papers (2022-12-14T18:49:50Z)
- Cross-Modal 3D Shape Generation and Manipulation [62.50628361920725]
We propose a generic multi-modal generative model that couples the 2D modalities and implicit 3D representations through shared latent spaces.
We evaluate our framework on two representative 2D modalities of grayscale line sketches and rendered color images.
arXiv Detail & Related papers (2022-07-24T19:22:57Z)
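Several of the papers above (Ctrl&Shift, FreeFine) share a decoupled manipulation pattern: transform the object, inpaint the vacated source region, then refine the target region. The following is a minimal, hypothetical Python sketch of that pattern using the `diffusers` inpainting pipeline. The file names, paste coordinates, rotation angle, prompts, and strength are illustrative assumptions, not any paper's released settings.

```python
from PIL import Image
import torch
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

scene = Image.open("scene.png").convert("RGB").resize((512, 512))
obj = Image.open("object_crop.png").convert("RGBA")   # cut-out object
src_mask = Image.open("source_mask.png").convert("L").resize((512, 512))

# 1. Geometric transform in 2D (a 3D-aware method would re-render the
#    object from a new viewpoint here instead of a planar rotation).
moved = obj.rotate(20, expand=True)
scene_edit = scene.copy()
scene_edit.paste(moved, (300, 200), moved)  # alpha-composited paste

# 2. Inpaint the now-empty source region the object vacated.
scene_edit = pipe(prompt="empty background, seamless",
                  image=scene_edit, mask_image=src_mask).images[0]

# 3. Refine the target region so the pasted object blends in; a partial
#    denoise (strength < 1) keeps the object's structure while fixing
#    lighting and boundary artifacts.
tgt_mask = Image.new("L", scene_edit.size, 0)
tgt_mask.paste(255, (300, 200, 300 + moved.width, 200 + moved.height))
result = pipe(prompt="the object naturally placed in the scene",
              image=scene_edit, mask_image=tgt_mask,
              strength=0.5).images[0]
result.save("edited.png")
```

A 3D-aware method such as FreeInsert would replace step 1 with the lift-edit-render loop sketched earlier; the inpaint-then-refine scaffolding is the part these decoupled pipelines have in common.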