SAT3D: Image-driven Semantic Attribute Transfer in 3D
- URL: http://arxiv.org/abs/2408.01664v1
- Date: Sat, 3 Aug 2024 04:41:46 GMT
- Title: SAT3D: Image-driven Semantic Attribute Transfer in 3D
- Authors: Zhijun Zhai, Zengmao Wang, Xiaoxiao Long, Kaixuan Zhou, Bo Du,
- Abstract summary: We propose an image-driven Semantic Attribute Transfer method in 3D (SAT3D) by editing semantic attributes from a reference image.
For guidance, we associate each attribute with a set of phrase-based descriptor groups, and develop a Quantitative Measurement Module (QMM)
We present our 3D-aware attribute transfer results across multiple domains and also conduct comparisons with classical 2D image editing methods.
- Score: 31.087615253643975
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: GAN-based image editing task aims at manipulating image attributes in the latent space of generative models. Most of the previous 2D and 3D-aware approaches mainly focus on editing attributes in images with ambiguous semantics or regions from a reference image, which fail to achieve photographic semantic attribute transfer, such as the beard from a photo of a man. In this paper, we propose an image-driven Semantic Attribute Transfer method in 3D (SAT3D) by editing semantic attributes from a reference image. For the proposed method, the exploration is conducted in the style space of a pre-trained 3D-aware StyleGAN-based generator by learning the correlations between semantic attributes and style code channels. For guidance, we associate each attribute with a set of phrase-based descriptor groups, and develop a Quantitative Measurement Module (QMM) to quantitatively describe the attribute characteristics in images based on descriptor groups, which leverages the image-text comprehension capability of CLIP. During the training process, the QMM is incorporated into attribute losses to calculate attribute similarity between images, guiding target semantic transferring and irrelevant semantics preserving. We present our 3D-aware attribute transfer results across multiple domains and also conduct comparisons with classical 2D image editing methods, demonstrating the effectiveness and customizability of our SAT3D.
Related papers
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z) - Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning [40.6806832534633]
We propose an efficient, plug-and-play, 3D-aware face editing framework based on attribute-specific prompt learning.
Our proposed framework generates high-quality images with 3D awareness and view consistency while maintaining attribute-specific features.
arXiv Detail & Related papers (2024-06-06T18:01:30Z) - Improving Generalization of Image Captioning with Unsupervised Prompt
Learning [63.26197177542422]
Generalization of Image Captioning (GeneIC) learns a domain-specific prompt vector for the target domain without requiring annotated data.
GeneIC aligns visual and language modalities with a pre-trained Contrastive Language-Image Pre-Training (CLIP) model.
arXiv Detail & Related papers (2023-08-05T12:27:01Z) - Semantic Validation in Structure from Motion [0.0]
Structure from Motion (SfM) is the process of recovering the 3D structure of a scene from a series of projective measurements.
SfM consists of three main steps; feature detection and matching, camera motion estimation, and recovery of 3D structure.
This project offers a novel method for improved validation of 3D SfM models.
arXiv Detail & Related papers (2023-04-05T12:58:59Z) - Self-Supervised Image Representation Learning with Geometric Set
Consistency [50.12720780102395]
We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency.
Specifically, we introduce 3D geometric consistency into a contrastive learning framework to enforce the feature consistency within image views.
arXiv Detail & Related papers (2022-03-29T08:57:33Z) - Hybrid Routing Transformer for Zero-Shot Learning [83.64532548391]
This paper presents a novel transformer encoder-decoder model, called hybrid routing transformer (HRT)
We embed an active attention, which is constructed by both the bottom-up and the top-down dynamic routing pathways to generate the attribute-aligned visual feature.
While in HRT decoder, we use static routing to calculate the correlation among the attribute-aligned visual features, the corresponding attribute semantics, and the class attribute vectors to generate the final class label predictions.
arXiv Detail & Related papers (2022-03-29T07:55:08Z) - 3D Shape Reconstruction from 2D Images with Disentangled Attribute Flow [61.62796058294777]
Reconstructing 3D shape from a single 2D image is a challenging task.
Most of the previous methods still struggle to extract semantic attributes for 3D reconstruction task.
We propose 3DAttriFlow to disentangle and extract semantic attributes through different semantic levels in the input images.
arXiv Detail & Related papers (2022-03-29T02:03:31Z) - SMILE: Semantically-guided Multi-attribute Image and Layout Editing [154.69452301122175]
Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs)
We present a multimodal representation that handles all attributes, be it guided by random noise or images, while only using the underlying domain information of the target domain.
Our method is capable of adding, removing or changing either fine-grained or coarse attributes by using an image as a reference or by exploring the style distribution space.
arXiv Detail & Related papers (2020-10-05T20:15:21Z) - MulGAN: Facial Attribute Editing by Exemplar [2.272764591035106]
Methods encode attribute-related information in images into the predefined region of the latent feature space by employing a pair of images with opposite attributes as input to train model.
They suffer from three limitations: (1) the model must be trained using a pair of images with opposite attributes as input; (2) weak capability of editing multiple attributes by exemplars; and (3) poor quality of generating image.
arXiv Detail & Related papers (2019-12-28T04:02:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.