General Image-to-Image Translation with One-Shot Image Guidance
- URL: http://arxiv.org/abs/2307.14352v3
- Date: Wed, 20 Sep 2023 08:51:50 GMT
- Title: General Image-to-Image Translation with One-Shot Image Guidance
- Authors: Bin Cheng, Zuhao Liu, Yunbo Peng, Yue Lin
- Abstract summary: We propose a novel framework named the visual concept translator (VCT).
It can preserve the content of the source image while translating the visual concepts guided by a single reference image.
Given only one reference image, the proposed VCT can complete a wide range of general image-to-image translation tasks with excellent results.
- Score: 5.89808526053682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale text-to-image models pre-trained on massive text-image pairs have
recently shown excellent performance in image synthesis. However, an image can provide
more intuitive visual concepts than plain text. People may ask: how can we
integrate a desired visual concept into an existing image, such as our
portrait? Current methods are inadequate for this demand, as they lack
the ability to preserve content or to translate visual concepts effectively.
Inspired by this, we propose a novel framework named visual concept translator
(VCT) with the ability to preserve content in the source image and translate
the visual concepts guided by a single reference image. The proposed VCT
contains a content-concept inversion (CCI) process to extract contents and
concepts, and a content-concept fusion (CCF) process to gather the extracted
information to obtain the target image. Given only one reference image, the
proposed VCT can complete a wide range of general image-to-image translation
tasks with excellent results. Extensive experiments demonstrate the
superiority and effectiveness of the proposed method. Code is available at
https://github.com/CrystalNeuro/visual-concept-translator.
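A minimal sketch of the two-stage structure the abstract describes, a content-concept inversion (CCI) step that extracts embeddings from the source and reference images, followed by a content-concept fusion (CCF) step that combines them into a target image. The function names mirror the abstract's terminology, but the toy "encoders" and fusion rule below are illustrative placeholders, not the authors' implementation; in the actual VCT both stages operate on a pretrained text-to-image model.

```python
# Hypothetical sketch of the CCI/CCF pipeline described in the abstract.
# The pooling-based encoders and the weighted-sum fusion are stand-ins,
# not the VCT method itself.
import torch


def content_concept_inversion(source_img: torch.Tensor,
                              reference_img: torch.Tensor):
    """CCI: invert images into a content embedding (from the source)
    and a concept embedding (from the single reference image)."""
    # Placeholder "encoders": mean-pool pixels into fixed-size embeddings.
    content_emb = source_img.flatten(1).mean(dim=1, keepdim=True)
    concept_emb = reference_img.flatten(1).mean(dim=1, keepdim=True)
    return content_emb, concept_emb


def content_concept_fusion(content_emb: torch.Tensor,
                           concept_emb: torch.Tensor,
                           out_shape=(3, 64, 64)) -> torch.Tensor:
    """CCF: gather the extracted information to produce the target image.
    Here the 'generator' is a stand-in that broadcasts a fused embedding."""
    fused = 0.5 * content_emb + 0.5 * concept_emb  # toy fusion weighting
    return fused.view(-1, 1, 1, 1).expand(-1, *out_shape).clone()


if __name__ == "__main__":
    source = torch.rand(1, 3, 64, 64)     # image whose content is preserved
    reference = torch.rand(1, 3, 64, 64)  # single image carrying the concept
    c_emb, k_emb = content_concept_inversion(source, reference)
    target = content_concept_fusion(c_emb, k_emb)
    print(target.shape)  # torch.Size([1, 3, 64, 64])
```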
Related papers
- Exploiting Text-Image Latent Spaces for the Description of Visual Concepts [13.287533148600248]
Concept Activation Vectors (CAVs) offer insights into neural network decision-making by linking human-friendly concepts to the model's internal feature extraction process.
When a new set of CAVs is discovered, they must still be translated into a human-understandable description.
We propose an approach to aid the interpretation of newly discovered concept sets by suggesting textual descriptions for each CAV.
arXiv Detail & Related papers (2024-10-23T12:51:07Z) - FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition [49.2208591663092]
FreeCustom is a tuning-free method to generate customized images of multi-concept composition based on reference concepts.
We introduce a new multi-reference self-attention (MRSA) mechanism and a weighted mask strategy.
Our method outperforms or performs on par with other training-based methods in terms of multi-concept composition and single-concept customization.
arXiv Detail & Related papers (2024-05-22T17:53:38Z) - An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance [53.974497865647336]
We take a first step towards translating images to make them culturally relevant.
We build three pipelines comprising state-of-the-art generative models to do the task.
We conduct a human evaluation of translated images to assess for cultural relevance and meaning preservation.
arXiv Detail & Related papers (2024-04-01T17:08:50Z) - Visual Concept-driven Image Generation with Text-to-Image Diffusion Model [65.96212844602866]
Text-to-image (TTI) models have demonstrated impressive results in generating high-resolution images of complex scenes.
Recent approaches have extended these methods with personalization techniques that allow them to integrate user-illustrated concepts.
However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one, or across multiple, image illustrations, remains elusive.
We propose a concept-driven TTI personalization framework that addresses these core challenges.
arXiv Detail & Related papers (2024-02-18T07:28:37Z) - Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation [5.107886283951882]
We introduce a localized text-to-image model to handle multi-concept input images.
Our method incorporates a novel cross-attention guidance to decompose multiple concepts.
Notably, our method generates cross-attention maps consistent with the target concept in the generated images.
arXiv Detail & Related papers (2024-02-15T14:19:42Z) - Language-Informed Visual Concept Learning [22.911347501969857]
We train a set of concept encoders to encode the information pertinent to a set of language-informed concept axes.
We then anchor the concept embeddings to a set of text embeddings obtained from a pre-trained Visual Question Answering (VQA) model.
At inference time, the model extracts concept embeddings along various axes from new test images, which can be remixed to generate images with novel compositions of visual concepts.
arXiv Detail & Related papers (2023-12-06T16:24:47Z) - Advancing Visual Grounding with Scene Knowledge: Benchmark and Method [74.72663425217522]
Visual grounding (VG) aims to establish fine-grained alignment between vision and language.
Most existing VG datasets are constructed using simple description texts.
We propose a novel benchmark of Scene Knowledge-guided Visual Grounding.
arXiv Detail & Related papers (2023-07-21T13:06:02Z) - RefineCap: Concept-Aware Refinement for Image Captioning [34.35093893441625]
We propose a novel model, termed RefineCap, that refines the output vocabulary of the language decoder using decoder-guided visual semantics.
Our model achieves superior performance on the MS-COCO dataset in comparison with previous visual-concept based models.
arXiv Detail & Related papers (2021-09-08T10:12:14Z) - TCIC: Theme Concepts Learning Cross Language and Vision for Image Captioning [50.30918954390918]
We propose a Theme Concepts extended Image Captioning framework that incorporates theme concepts to represent high-level cross-modality semantics.
Considering that theme concepts can be learned from both images and captions, we propose two settings for their representation learning based on TTN.
arXiv Detail & Related papers (2021-06-21T09:12:55Z) - Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source contents while translating to discriminative target styles between two visual domains.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.