Text-guided mask-free local image retouching
- URL: http://arxiv.org/abs/2212.07603v1
- Date: Thu, 15 Dec 2022 03:26:53 GMT
- Title: Text-guided mask-free local image retouching
- Authors: Zerun Liu, Fan Zhang, Jingxuan He, Jin Wang, Zhangye Wang, Lechao Cheng
- Abstract summary: In this paper, we offer a text-guided mask-free image retouching approach.
Our technique can construct plausible and edge-sharp masks based on the text for each object in the image.
Experiments have shown that our method can produce high-quality, accurate images based on spoken language.
- Score: 12.472600455430769
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of multi-modality, text-guided image retouching techniques
emerged with the advent of deep learning. Most currently available text-guided
methods, however, rely on object-level supervision to constrain the region that
may be modified. This not only makes it more challenging to develop these
algorithms, but it also limits how widely deep learning can be used for image
retouching. In this paper, we offer a text-guided mask-free image retouching
approach that yields consistent results, addressing this concern. To
perform image retouching without mask supervision, our technique can construct
plausible and edge-sharp masks based on the text for each object in the image.
Extensive experiments have shown that our method can produce high-quality,
accurate images based on spoken language. The source code will be released
soon.
Related papers
- Shallow- and Deep-fake Image Manipulation Localization Using Vision Mamba and Guided Graph Neural Network [8.518945405991362]
This paper explores the feasibility of using a deep learning network to localize manipulations in both shallow- and deep-fake images.
We propose a novel Guided Graph Neural Network (G-GNN) module that amplifies the distinction between manipulated and authentic pixels.
arXiv Detail & Related papers (2026-01-05T21:38:50Z) - DiffSTR: Controlled Diffusion Models for Scene Text Removal [5.790630195329777]
Scene Text Removal (STR) aims to prevent unauthorized use of text in images.
STR faces several challenges, including boundary artifacts, inconsistent texture and color, and preserving correct shadows.
We introduce a ControlNet diffusion model, treating STR as an inpainting task.
We develop a mask pretraining pipeline to condition our diffusion model.
arXiv Detail & Related papers (2024-10-29T04:20:21Z) - A Novel Framework For Text Detection From Natural Scene Images With Complex Background [0.0]
We propose a novel and efficient method to detect text regions in images with complex backgrounds using Wavelet Transforms.
The framework uses Wavelet Transformation of the original image in its grayscale form followed by Sub-band filtering.
arXiv Detail & Related papers (2024-09-15T07:12:33Z) - Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis [63.757624792753205]
We present Zero-Painter, a framework for layout-conditional text-to-image synthesis.
Our method utilizes object masks and individual descriptions, coupled with a global text prompt, to generate images with high fidelity.
arXiv Detail & Related papers (2024-06-06T13:02:00Z) - Text-Driven Image Editing via Learnable Regions [74.45313434129005]
We introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches.
We show that this simple approach enables flexible editing that is compatible with current image generation models.
Experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that correspond to the provided language descriptions.
arXiv Detail & Related papers (2023-11-28T02:27:31Z) - StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training [64.37272287179661]
StrucTexTv2 is an effective document image pre-training framework.
It consists of two self-supervised pre-training tasks: masked image modeling and masked language modeling.
It achieves competitive or even new state-of-the-art performance in various downstream tasks such as image classification, layout analysis, table structure recognition, document OCR, and information extraction.
arXiv Detail & Related papers (2023-03-01T07:32:51Z) - Semantic-guided Multi-Mask Image Harmonization [10.27974860479791]
We propose a new semantic-guided multi-mask image harmonization task.
In this work, we propose a novel way to edit the inharmonious images by predicting a series of operator masks.
arXiv Detail & Related papers (2022-07-24T11:48:49Z) - FlexIT: Towards Flexible Semantic Image Translation [59.09398209706869]
We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.
First, FlexIT combines the input image and text into a single target point in the CLIP multimodal embedding space.
We iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms.
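As an illustration of the iterative scheme FlexIT describes, the toy sketch below moves an image vector toward a joint image-text target point while an L2 regularizer keeps it close to the original (a stand-in for the coherence terms). The averaging used to build the target and all constants here are hypothetical simplifications; the actual method operates in a learned latent space against CLIP embeddings with richer regularization.

```python
def edit_toward_target(image_vec, target_vec, steps=100, lr=0.1, reg=0.01):
    """Gradient descent on ||x - target||^2 + reg * ||x - x0||^2,
    i.e. pull the image toward the target point while staying near
    the original image (coherence regularizer)."""
    x = list(image_vec)
    for _ in range(steps):
        for i in range(len(x)):
            # gradient of the objective with respect to x[i]
            grad = 2.0 * (x[i] - target_vec[i]) + 2.0 * reg * (x[i] - image_vec[i])
            x[i] -= lr * grad
    return x

# Toy "multimodal" target: a combination of image and text embeddings,
# mirroring FlexIT's single target point in CLIP space (the 50/50 mix
# is an arbitrary illustrative choice).
img = [1.0, 0.0]
txt = [0.0, 1.0]
target = [0.5 * (a + b) for a, b in zip(img, txt)]
edited = edit_toward_target(img, target)
```

The edited vector ends up close to the target, with the regularizer retaining a slight pull toward the original image; in the real method this trade-off is what keeps edits semantically faithful to the text without destroying the input image.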
arXiv Detail & Related papers (2022-03-09T13:34:38Z) - Free-Form Image Inpainting via Contrastive Attention Network [64.05544199212831]
In image inpainting tasks, masks of arbitrary shape can appear anywhere in an image, forming complex patterns.
It is difficult for encoders to capture powerful representations in such complex situations.
We propose a self-supervised Siamese inference network to improve the robustness and generalization.
arXiv Detail & Related papers (2020-10-29T14:46:05Z) - Text as Neural Operator: Image Manipulation by Text Instruction [68.53181621741632]
In this paper, we study a setting that allows users to edit an image with multiple objects using complex text instructions to add, remove, or change the objects.
The inputs of the task are multimodal including (1) a reference image and (2) an instruction in natural language that describes desired modifications to the image.
We show that the proposed model performs favorably against recent strong baselines on three public datasets.
arXiv Detail & Related papers (2020-08-11T07:07:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.