Related papers: TexSliders: Diffusion-Based Texture Editing in CLIP Space

URL: http://arxiv.org/abs/2405.00672v1
Date: Wed, 1 May 2024 17:57:21 GMT
Title: TexSliders: Diffusion-Based Texture Editing in CLIP Space
Authors: Julia Guerrero-Viu, Milos Hasan, Arthur Roullier, Midhun Harikumar, Yiwei Hu, Paul Guerrero, Diego Gutierrez, Belen Masia, Valentin Deschaintre
Abstract summary: We analyze existing editing methods and show that they are not directly applicable to textures. We propose a novel approach that instead manipulates CLIP image embeddings to condition the diffusion generation.
Score: 17.449209402077276
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative models have enabled intuitive image creation and manipulation using natural language. In particular, diffusion models have recently shown remarkable results for natural image editing. In this work, we propose to apply diffusion techniques to edit textures, a specific class of images that are an essential part of 3D content creation pipelines. We analyze existing editing methods and show that they are not directly applicable to textures, since their common underlying approach, manipulating attention maps, is unsuitable for the texture domain. To address this, we propose a novel approach that instead manipulates CLIP image embeddings to condition the diffusion generation. We define editing directions using simple text prompts (e.g., "aged wood" to "new wood") and map these to CLIP image embedding space using a texture prior, with a sampling-based approach that gives us identity-preserving directions in CLIP space. To further improve identity preservation, we project these directions to a CLIP subspace that minimizes identity variations resulting from entangled texture attributes. Our editing pipeline facilitates the creation of arbitrary sliders using natural language prompts only, with no ground-truth annotated data necessary.
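The direction-based editing described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: random vectors stand in for CLIP image embeddings sampled from a texture prior, and the helper names (`editing_direction`, `project_to_top_k`, `apply_slider`), the mean-difference direction, and the top-k coordinate selection used for the subspace are all assumptions made for illustration.

```python
import numpy as np

def editing_direction(src_embeds, dst_embeds):
    """Difference of the mean embeddings of two sample sets (one per prompt)."""
    return dst_embeds.mean(axis=0) - src_embeds.mean(axis=0)

def project_to_top_k(direction, k):
    """Keep only the k largest-magnitude coordinates of the direction.

    Hypothetical subspace choice; the paper's actual projection may differ.
    """
    keep = np.argsort(np.abs(direction))[-k:]
    projected = np.zeros_like(direction)
    projected[keep] = direction[keep]
    return projected

def apply_slider(embedding, direction, strength):
    """Shift an embedding along the direction by a user-chosen strength."""
    return embedding + strength * direction

# Stand-in data: 32 "source prompt" samples, and "target prompt" samples that
# differ from them only in the first three coordinates.
rng = np.random.default_rng(0)
dim = 16
offset = np.zeros(dim)
offset[:3] = 2.0
src = rng.normal(size=(32, dim))
dst = src + offset

direction = project_to_top_k(editing_direction(src, dst), k=3)
texture = rng.normal(size=dim)  # embedding of the texture to edit
edited = apply_slider(texture, direction, strength=0.5)
```

In a real pipeline the edited embedding would condition the diffusion model; here `strength` plays the role of the slider value, with 0 leaving the embedding unchanged.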

Related papers

ScribbleSense: Generative Scribble-Based Texture Editing with Intent Prediction [5.109590115201006]
ScribbleSense is an editing method that combines multimodal large language models (MLLMs) and image generation models. We leverage the visual capabilities of MLLMs to predict the editing intent behind the scribbles. Globally generated images are employed to extract local texture details.
arXiv Detail & Related papers (2026-01-30T01:55:44Z)
Example-Based Feature Painting on Textures [7.130784822780051]
We introduce a novel approach for creating textures with appearance-altering features. Our pipeline as a whole goes from a small image collection to a versatile generative model. Notably, the algorithms we introduce for diffusion-based editing and infinite stationary texture generation are generic and should prove useful in other contexts as well.
arXiv Detail & Related papers (2025-11-03T12:26:50Z)
TextureDiffusion: Target Prompt Disentangled Editing for Various Texture Transfer [32.53299128227546]
We propose TextureDiffusion, a tuning-free image editing method applied to various texture transfer. Query features in self-attention and features in residual blocks are utilized to preserve the structure of the input image. To maintain the background, we introduce an edit localization technique which blends the self-attention results and the intermediate latents.
arXiv Detail & Related papers (2024-09-15T04:34:38Z)
ZONE: Zero-Shot Instruction-Guided Local Editing [56.56213730578504]
We propose a Zero-shot instructiON-guided local image Editing approach, termed ZONE. We first convert the editing intent from the user-provided instruction into specific image editing regions through InstructPix2Pix. We then propose a Region-IoU scheme for precise image layer extraction from an off-the-shelf segment model.
arXiv Detail & Related papers (2023-12-28T02:54:34Z)
Text-Driven Image Editing via Learnable Regions [74.45313434129005]
We introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches. We show that this simple approach enables flexible editing that is compatible with current image generation models. Experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that correspond to the provided language descriptions.
arXiv Detail & Related papers (2023-11-28T02:27:31Z)
iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing. It generates images conditioned on a source image and a textual edit prompt. It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
SKED: Sketch-guided Text-based 3D Editing [49.019881133348775]
We present SKED, a technique for editing 3D shapes represented by NeRFs. Our technique utilizes as few as two guiding sketches from different views to alter an existing neural field. We propose novel loss functions to generate the desired edits while preserving the density and radiance of the base instance.
arXiv Detail & Related papers (2023-03-19T18:40:44Z)
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing [104.27329655124299]
We propose FateZero, a zero-shot text-based editing method on real-world videos without per-prompt training or use-specific mask. Our method is the first one to show the ability of zero-shot text-driven video style and local attribute editing from the trained text-to-image model.
arXiv Detail & Related papers (2023-03-16T17:51:13Z)
DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing. Our main contribution is the ability to automatically generate a mask highlighting regions of the input image that need to be edited.
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
Text2LIVE: Text-Driven Layered Image and Video Editing [13.134513605107808]
We present a method for zero-shot, text-driven appearance manipulation in natural images and videos. Given an input image or video and a target text prompt, our goal is to edit the appearance of existing objects. We demonstrate localized, semantic edits on high-resolution natural images and videos across a variety of objects and scenes.
arXiv Detail & Related papers (2022-04-05T21:17:34Z)
Blended Diffusion for Text-driven Editing of Natural Images [18.664733153082146]
We introduce the first solution for performing local (region-based) edits in generic natural images. We achieve our goal by leveraging and combining a pretrained language-image model (CLIP) with a denoising diffusion probabilistic model (DDPM). To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent.
arXiv Detail & Related papers (2021-11-29T18:58:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.