Multimodality-guided Image Style Transfer using Cross-modal GAN Inversion
- URL: http://arxiv.org/abs/2312.01671v1
- Date: Mon, 4 Dec 2023 06:38:23 GMT
- Title: Multimodality-guided Image Style Transfer using Cross-modal GAN Inversion
- Authors: Hanyu Wang, Pengxiang Wu, Kevin Dela Rosa, Chen Wang, Abhinav Shrivastava
- Abstract summary: We present a novel method to achieve much improved style transfer based on text guidance.
Our method allows style inputs from multiple sources and modalities, enabling MultiModality-guided Image Style Transfer (MMIST).
Specifically, we realize MMIST with a novel cross-modal GAN inversion method, which generates style representations consistent with specified styles.
- Score: 42.345533741985626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image Style Transfer (IST) is an interdisciplinary topic of computer vision
and art that continually attracts researchers' interest. Unlike traditional
Image-guided Image Style Transfer (IIST) methods, which require a style reference
image as input to define the desired style, recent works have started to tackle the
problem in a text-guided manner, i.e., Text-guided Image Style Transfer (TIST).
Compared to IIST, such approaches provide more flexibility
with text-specified styles, which are useful in scenarios where the style is
hard to define with reference images. Unfortunately, many TIST approaches
produce undesirable artifacts in the transferred images. To address this issue,
we present a novel method to achieve much improved style transfer based on text
guidance. Meanwhile, to offer more flexibility than IIST and TIST, our method
allows style inputs from multiple sources and modalities, enabling
MultiModality-guided Image Style Transfer (MMIST). Specifically, we realize
MMIST with a novel cross-modal GAN inversion method, which generates style
representations consistent with the specified styles. Such style representations
facilitate style transfer and in principle generalize any IIST method to MMIST.
Large-scale experiments and user studies demonstrate that our method achieves
state-of-the-art performance on the TIST task. Furthermore, comprehensive qualitative
results confirm the effectiveness of our method on the MMIST task and on cross-modal
style interpolation.
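The abstract stops at a high-level description of cross-modal GAN inversion, so the following is only an illustrative sketch of one way such an inversion could be set up: optimize a latent style code of a pretrained generator so that the generated style image lands close, in a joint vision-language embedding space, to every specified style input (text prompts and/or reference images). The names `generator` and `clip_model`, the single cosine objective, and the optimizer settings are assumptions made for this sketch, not the authors' released code.

```python
# Illustrative sketch (not the authors' implementation): invert a GAN latent code
# against multimodal style targets embedded in a shared vision-language space.
import torch
import torch.nn.functional as F

def invert_style_code(generator, clip_model, text_tokens=None, ref_images=None,
                      latent_dim=512, steps=300, lr=0.05, device="cuda"):
    """Gradient-based inversion of a style code; at least one modality must be given."""
    targets = []
    with torch.no_grad():
        if text_tokens is not None:   # e.g., tokenized style phrases
            targets.append(F.normalize(clip_model.encode_text(text_tokens.to(device)), dim=-1))
        if ref_images is not None:    # e.g., preprocessed style reference images
            targets.append(F.normalize(clip_model.encode_image(ref_images.to(device)), dim=-1))

    z = torch.randn(1, latent_dim, device=device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)

    for _ in range(steps):
        img = generator(z)  # assumed to output images already sized/normalized for clip_model
        img_emb = F.normalize(clip_model.encode_image(img), dim=-1)
        # Pull the generated style image toward every specified target embedding.
        loss = sum(1.0 - F.cosine_similarity(img_emb, t).mean() for t in targets)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return z.detach()  # style representation usable by an image-guided backbone
```

The recovered code (or the image it generates) can then serve as the style reference for an image-guided style-transfer backbone, which is how the abstract argues that IIST methods generalize to MMIST.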
Related papers
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal 'attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models.
Our method's evaluation across diverse styles and text prompts demonstrates high quality and fidelity.
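As a rough, hypothetical illustration of what attention sharing across a batch can look like (this is not the StyleAligned implementation), each image's self-attention can be extended so that its queries also attend to the keys and values of a designated reference image:

```python
# Hypothetical sketch of shared attention across a batch of generated images.
import torch
import torch.nn.functional as F

def shared_attention(q, k, v, ref_index=0):
    """q, k, v: (batch, heads, tokens, dim). Every sample attends to its own tokens
    plus the tokens of the reference sample, encouraging a shared style."""
    batch = q.shape[0]
    ref_k = k[ref_index:ref_index + 1].expand(batch, -1, -1, -1)
    ref_v = v[ref_index:ref_index + 1].expand(batch, -1, -1, -1)
    k_shared = torch.cat([k, ref_k], dim=2)   # append reference keys
    v_shared = torch.cat([v, ref_v], dim=2)   # append reference values
    return F.scaled_dot_product_attention(q, k_shared, v_shared)
```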
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate [58.83278629019384]
Style transfer aims to render the style of a given style-reference image onto another given content-reference image.
Existing approaches either apply the holistic style of the style image in a global manner, or migrate local colors and textures of the style image to the content counterparts in a pre-defined way.
We propose Any-to-Any Style Transfer, which enables users to interactively select styles of regions in the style image and apply them to the prescribed content regions.
arXiv Detail & Related papers (2023-04-19T15:15:36Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
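The summary names an input-dependent temperature but not its exact form; a minimal sketch of such an objective (not the UCAST code) is an InfoNCE-style loss whose temperature is predicted per sample by a hypothetical small network `temp_net`:

```python
# Minimal sketch: contrastive style loss with a per-sample (input-dependent) temperature.
import torch
import torch.nn.functional as F

def adaptive_contrastive_loss(anchor, positive, negatives, temp_net):
    """anchor, positive: (B, D); negatives: (B, N, D); temp_net maps (B, D) -> (B, 1)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    tau = F.softplus(temp_net(anchor)) + 1e-4                          # keep temperature positive
    pos_logit = (anchor * positive).sum(dim=-1, keepdim=True) / tau    # (B, 1)
    neg_logits = torch.einsum("bd,bnd->bn", anchor, negatives) / tau   # (B, N)

    logits = torch.cat([pos_logit, neg_logits], dim=1)
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)                              # positive sits at index 0
```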
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
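The phrase "content image-based learnable noise" is only named in the summary; one possible reading, sketched here as an assumption rather than DiffStyler's actual formulation, is to form the starting state of the reverse process by blending the content image with a learnable noise tensor using the usual forward-diffusion coefficients:

```python
# Hypothetical sketch: content-anchored, learnable starting state for reverse denoising.
import torch

def content_based_start(content, alphas_cumprod, T):
    """content: (1, 3, H, W); alphas_cumprod: cumulative products of (1 - beta_t)."""
    noise = torch.nn.Parameter(torch.randn_like(content))       # learnable noise tensor
    a_T = alphas_cumprod[T - 1]
    x_T = a_T.sqrt() * content + (1.0 - a_T).sqrt() * noise     # standard forward-diffusion form
    return x_T, noise                                            # noise can be optimized jointly
```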
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
- STALP: Style Transfer with Auxiliary Limited Pairing [36.23393954839379]
We present an approach to example-based stylization of images that uses a single pair of a source image and its stylized counterpart.
We demonstrate how to train an image translation network that can perform real-time semantically meaningful style transfer to a set of target images.
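At its most basic, stylization from a single (source, stylized) pair amounts to fitting an image-to-image network on that one pair and then applying it to new target images; the sketch below shows only this generic setup with a placeholder `TranslationNet` and omits the paper's auxiliary pairing scheme.

```python
# Generic single-pair training sketch (placeholder network, not the full STALP method).
import torch
import torch.nn as nn

class TranslationNet(nn.Module):
    """Tiny stand-in for a real image translation architecture."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

def fit_single_pair(source, stylized, steps=2000, lr=1e-3):
    """source, stylized: (1, 3, H, W) tensors forming the single training pair."""
    net = TranslationNet()
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        loss = torch.nn.functional.l1_loss(net(source), stylized)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net  # apply to unseen target images for fast stylization
```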
arXiv Detail & Related papers (2021-10-20T11:38:41Z)