StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval
- URL: http://arxiv.org/abs/2103.15706v2
- Date: Wed, 31 Mar 2021 10:31:24 GMT
- Title: StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval
- Authors: Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song
- Abstract summary: The cross-modal matching problem is typically solved by learning a joint embedding space in which the semantic content shared between the photo and sketch modalities is preserved.
An effective model needs to explicitly account for this style diversity and, crucially, generalise to unseen user styles.
Our model can not only disentangle the cross-modal shared semantic content, but can also adapt the disentanglement to any unseen user style, making the model truly style-agnostic.
- Score: 119.03470556503942
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sketch-based image retrieval (SBIR) is a cross-modal matching problem which
is typically solved by learning a joint embedding space where the semantic
content shared between photo and sketch modalities is preserved. However, a
fundamental challenge in SBIR has been largely ignored so far, that is,
sketches are drawn by humans and considerable style variations exist amongst
different users. An effective SBIR model needs to explicitly account for this
style diversity, crucially, to generalise to unseen user styles. To this end, a
novel style-agnostic SBIR model is proposed. Different from existing models, a
cross-modal variational autoencoder (VAE) is employed to explicitly disentangle
each sketch into a semantic content part shared with the corresponding photo,
and a style part unique to the sketcher. Importantly, to make our model
dynamically adaptable to any unseen user styles, we propose to meta-train our
cross-modal VAE by adding two style-adaptive components: a set of feature
transformation layers to its encoder and a regulariser to the disentangled
semantic content latent code. With this meta-learning framework, our model can
not only disentangle the cross-modal shared semantic content for SBIR, but can
adapt the disentanglement to any unseen user style as well, making the SBIR
model truly style-agnostic. Extensive experiments show that our style-agnostic
model yields state-of-the-art performance for both category-level and
instance-level SBIR.
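To make the disentanglement idea concrete, here is a minimal PyTorch sketch of a cross-modal VAE that splits a sketch feature into a semantic-content code (matched against the paired photo) and a sketcher-specific style code, in the spirit of the abstract. All module names, dimensions, and loss weights are illustrative assumptions rather than the authors' implementation, and the meta-learned feature-transformation layers and style-adaptive regulariser are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalVAE(nn.Module):
    """Toy cross-modal VAE: the sketch latent is split into a content part
    (shared with the paired photo) and a style part (unique to the sketcher).
    Feature dimensions are placeholders, not the paper's settings."""

    def __init__(self, feat_dim=512, content_dim=64, style_dim=64):
        super().__init__()
        # Sketch encoder produces mean/log-variance for both latent factors.
        self.sketch_enc = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.to_content = nn.Linear(256, 2 * content_dim)  # mu, logvar
        self.to_style = nn.Linear(256, 2 * style_dim)      # mu, logvar
        # Photo encoder maps a photo feature into the shared content space.
        self.photo_enc = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                       nn.Linear(256, content_dim))
        # Decoder reconstructs the sketch feature from content + style codes.
        self.dec = nn.Sequential(nn.Linear(content_dim + style_dim, 256),
                                 nn.ReLU(), nn.Linear(256, feat_dim))

    @staticmethod
    def reparameterise(mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, sketch_feat, photo_feat):
        h = self.sketch_enc(sketch_feat)
        c_mu, c_logvar = self.to_content(h).chunk(2, dim=-1)
        s_mu, s_logvar = self.to_style(h).chunk(2, dim=-1)
        z_content = self.reparameterise(c_mu, c_logvar)
        z_style = self.reparameterise(s_mu, s_logvar)
        recon = self.dec(torch.cat([z_content, z_style], dim=-1))
        photo_content = self.photo_enc(photo_feat)
        return recon, z_content, photo_content, (c_mu, c_logvar, s_mu, s_logvar)


def vae_loss(recon, sketch_feat, z_content, photo_content, stats, beta=1e-3):
    c_mu, c_logvar, s_mu, s_logvar = stats
    recon_loss = F.mse_loss(recon, sketch_feat)
    # KL terms keep both latent factors close to a standard normal prior.
    kl = -0.5 * torch.mean(1 + c_logvar - c_mu.pow(2) - c_logvar.exp()) \
         - 0.5 * torch.mean(1 + s_logvar - s_mu.pow(2) - s_logvar.exp())
    # Cross-modal term: the sketch content code should match the paired photo's.
    match_loss = F.mse_loss(z_content, photo_content)
    return recon_loss + beta * kl + match_loss


# Toy usage with random features standing in for backbone outputs.
model = CrossModalVAE()
sketch, photo = torch.randn(8, 512), torch.randn(8, 512)
recon, z_content, photo_content, stats = model(sketch, photo)
loss = vae_loss(recon, sketch, z_content, photo_content, stats)
loss.backward()
```

At retrieval time only the content code would be compared across modalities, which is what makes the style code a candidate for per-user (meta-learned) adaptation.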
Related papers
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models [42.45078883553856]
Stylized Text-to-Image Generation (STIG) aims to generate images based on text prompts and style reference images.
In this paper, we propose a novel framework, dubbed StyleMaster, for this task by leveraging pretrained Stable Diffusion.
Two objective functions are introduced to optimize the model together with denoising loss, which can further enhance semantic and style consistency.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- Customize StyleGAN with One Hand Sketch [0.0]
We propose a framework to control StyleGAN imagery with a single user sketch.
We learn a conditional distribution in the latent space of a pre-trained StyleGAN model via energy-based learning.
Our model can generate multi-modal images semantically aligned with the input sketch.
arXiv Detail & Related papers (2023-10-29T09:32:33Z)
- MODIFY: Model-driven Face Stylization without Style Images [77.24793103549158]
Existing face stylization methods always require the presence of the target (style) domain during the translation process.
We propose a new method called MODel-drIven Face stYlization (MODIFY), which relies on the generative model to bypass the dependence on target images.
Experimental results on several different datasets validate the effectiveness of MODIFY for unsupervised face stylization.
arXiv Detail & Related papers (2023-03-17T08:35:17Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
- Adversarial Style Augmentation for Domain Generalized Urban-Scene Segmentation [120.96012935286913]
We propose a novel adversarial style augmentation approach, which can generate hard stylized images during training.
Experiments on two synthetic-to-real semantic segmentation benchmarks demonstrate that AdvStyle can significantly improve the model performance on unseen real domains.
arXiv Detail & Related papers (2022-07-11T14:01:25Z)
- Towards Controllable and Photorealistic Region-wise Image Manipulation [11.601157452472714]
We present a generative model with auto-encoder architecture for per-region style manipulation.
We apply a code consistency loss to enforce an explicit disentanglement between content and style latent representations.
The model is constrained by a content alignment loss to ensure that foreground editing does not interfere with background contents.
arXiv Detail & Related papers (2021-08-19T13:29:45Z)
- Anisotropic Stroke Control for Multiple Artists Style Transfer [36.92721585146738]
A Stroke Control Multi-Artist Style Transfer framework is developed.
An Anisotropic Stroke Module (ASM) endows the network with the ability to maintain adaptive semantic consistency among various styles.
In contrast to a single-scale conditional discriminator, our discriminator is able to capture multi-scale texture clues.
arXiv Detail & Related papers (2020-10-16T05:32:26Z)
- Arbitrary Style Transfer via Multi-Adaptation Network [109.6765099732799]
A desired style transfer, given a content image and referenced style painting, would render the content image with the color tone and vivid stroke patterns of the style painting.
A new disentanglement loss function enables our network to extract main style patterns and exact content structures to adapt to various input images.
arXiv Detail & Related papers (2020-05-27T08:00:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.