Related papers: 3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing

3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing

URL: http://arxiv.org/abs/2311.12050v5
Date: Tue, 23 Jul 2024 04:01:08 GMT
Title: 3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing
Authors: Haoran Li, Long Ma, Haolin Shi, Yanbin Hao, Yong Liao, Lechao Cheng, Pengyuan Zhou,
Abstract summary: We propose a 3D editing framework, 3D-GOI, to enable multifaceted editing of affine information on multiple objects. 3D-GOI realizes the complex editing function by inverting the abundance of attribute codes controlled by GIRAFFE, a renowned 3D GAN.
Score: 25.442924637806676
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The current GAN inversion methods typically can only edit the appearance and shape of a single object and background while overlooking spatial information. In this work, we propose a 3D editing framework, 3D-GOI, to enable multifaceted editing of affine information (scale, translation, and rotation) on multiple objects. 3D-GOI realizes the complex editing function by inverting the abundance of attribute codes (object shape/appearance/scale/rotation/translation, background shape/appearance, and camera pose) controlled by GIRAFFE, a renowned 3D GAN. Accurately inverting all the codes is challenging, 3D-GOI solves this challenge following three main steps. First, we segment the objects and the background in a multi-object image. Second, we use a custom Neural Inversion Encoder to obtain coarse codes of each object. Finally, we use a round-robin optimization algorithm to get precise codes to reconstruct the image. To the best of our knowledge, 3D-GOI is the first framework to enable multifaceted editing on multiple objects. Both qualitative and quantitative experiments demonstrate that 3D-GOI holds immense potential for flexible, multifaceted editing in complex multi-object scenes.Our project and code are released at https://3d-goi.github.io .

Related papers

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration [29.657854912416038]
We introduce a generic AI system, namely MUSES, for 3D-controllable image generation from user queries. By mimicking the collaboration of human professionals, this multi-modal agent pipeline facilitates the effective and automatic creation of images with 3D-controllable objects. We construct a new benchmark of T2I-3DisBench (3D image scene), which describes diverse 3D image scenes with 50 detailed prompts.
arXiv Detail & Related papers (2024-08-20T07:37:23Z)
Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images [72.70883914827687]
Tailor3D is a novel pipeline that creates customized 3D assets from editable dual-side images. It provides a user-friendly, efficient solution for editing 3D assets, with each editing step taking only seconds to complete.
arXiv Detail & Related papers (2024-07-08T17:59:55Z)
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting [100.94916668527544]
Existing methods solely focus on either 2D individual object or 3D global scene editing. We propose 3DitScene, a novel and unified scene editing framework. It enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects.
arXiv Detail & Related papers (2024-05-28T17:59:01Z)
DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation [57.406031264184584]
DragGaussian is a 3D object drag-editing framework based on 3D Gaussian Splatting. Our contributions include the introduction of a new task, the development of DragGaussian for interactive point-based 3D editing, and comprehensive validation of its effectiveness through qualitative and quantitative experiments.
arXiv Detail & Related papers (2024-05-09T14:34:05Z)
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing [72.54566271694654]
We consider the problem of editing 3D objects and scenes based on open-ended language instructions. A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process. This process is often inefficient due to the need for iterative updates of costly 3D representations.
arXiv Detail & Related papers (2024-04-29T17:59:30Z)
Designing a 3D-Aware StyleNeRF Encoder for Face Editing [15.303426697795143]
We propose a 3D-aware encoder for GAN inversion and face editing based on the powerful StyleNeRF model. Our proposed 3Da encoder combines a parametric 3D face model with a learnable detail representation model to generate geometry, texture and view direction codes.
arXiv Detail & Related papers (2023-02-19T03:32:28Z)
Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion [115.82306502822412]
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing. A corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing. We study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures.
arXiv Detail & Related papers (2022-12-14T18:49:50Z)
ONeRF: Unsupervised 3D Object Segmentation from Multiple Views [59.445957699136564]
ONeRF is a method that automatically segments and reconstructs object instances in 3D from multi-view RGB images without any additional manual annotations. The segmented 3D objects are represented using separate Neural Radiance Fields (NeRFs) which allow for various 3D scene editing and novel view rendering.
arXiv Detail & Related papers (2022-11-22T06:19:37Z)
CoReNet: Coherent 3D scene reconstruction from a single RGB image [43.74240268086773]
We build on advances in deep learning to reconstruct the shape of a single object given only one RBG image as input. We propose three extensions: (1) ray-traced skip connections that propagate local 2D information to the output 3D volume in a physically correct manner; (2) a hybrid 3D volume representation that enables building translation equivariant models; and (3) a reconstruction loss tailored to capture overall object geometry. We reconstruct all objects jointly in one pass, producing a coherent reconstruction, where all objects live in a single consistent 3D coordinate frame relative to the camera and they do not intersect in 3D space.
arXiv Detail & Related papers (2020-04-27T17:53:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.