MicroAST: Towards Super-Fast Ultra-Resolution Arbitrary Style Transfer
- URL: http://arxiv.org/abs/2211.15313v1
- Date: Mon, 28 Nov 2022 13:49:26 GMT
- Title: MicroAST: Towards Super-Fast Ultra-Resolution Arbitrary Style Transfer
- Authors: Zhizhong Wang, Lei Zhao, Zhiwen Zuo, Ailin Li, Haibo Chen, Wei Xing,
Dongming Lu
- Abstract summary: Arbitrary style transfer (AST) transfers arbitrary artistic styles onto content images.
Existing AST methods are either incapable of running at ultra-resolutions or too slow to do so.
We learn a straightforward and lightweight model, dubbed MicroAST.
- Score: 17.3797025528892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Arbitrary style transfer (AST) transfers arbitrary artistic styles onto
content images. Despite the recent rapid progress, existing AST methods are
either incapable of running at ultra-resolutions (e.g., 4K) with limited
resources or too slow to do so, which heavily hinders their practical
application. In this paper, we tackle this dilemma by learning a
straightforward and lightweight model, dubbed MicroAST. The key insight is to
completely abandon the use of cumbersome pre-trained Deep Convolutional Neural
Networks (e.g., VGG) at inference. Instead, we design two micro encoders (a
content encoder and a style encoder) and one micro decoder for style transfer.
The content encoder extracts the main structure of the content image. The
style encoder, coupled with a modulator, encodes the style image into
learnable dual-modulation signals that modulate both the intermediate features
and the convolutional filters of the decoder, thus injecting more
sophisticated and flexible style signals to guide the stylization. In
addition, to boost the ability of the style encoder to extract more distinct
and representative style signals, we also introduce a new style signal
contrastive loss. Compared to the state of the art, our MicroAST not only
produces visually superior results but is also 5-73 times smaller and 6-18
times faster, for the first time enabling super-fast (about 0.5 seconds) AST
at 4K ultra-resolution. Code is available at
https://github.com/EndyWon/MicroAST.
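As a rough illustration of the dual-modulation idea described above, the
PyTorch sketch below modulates both the decoder features (AdaIN-style scale
and shift) and the decoder's convolutional filters (per-sample weight
scaling) from a style code. This is an assumed reading of the abstract, not
the authors' implementation; module names, dimensions, and the exact
modulation form are guesses, and the official code at
https://github.com/EndyWon/MicroAST is the reference.

```python
# Illustrative sketch of "dual modulation" (assumed form; the official code
# at https://github.com/EndyWon/MicroAST is the reference implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualModConv(nn.Module):
    """Conv layer whose features AND filters are modulated by a style code."""
    def __init__(self, ch: int, style_dim: int = 128):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
        self.to_feat_mod = nn.Linear(style_dim, 2 * ch)  # feature scale/shift
        self.to_filt_mod = nn.Linear(style_dim, ch)      # per-filter scale

    def forward(self, x: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # (a) filter modulation: rescale the conv weights per sample
        weight = self.conv.weight.unsqueeze(0) * (
            1 + self.to_filt_mod(s).view(b, c, 1, 1, 1))
        x = x.reshape(1, b * c, h, w)                    # grouped-conv trick
        x = F.conv2d(x, weight.reshape(b * c, c, 3, 3), padding=1, groups=b)
        x = x.reshape(b, c, h, w) + self.conv.bias.view(1, c, 1, 1)
        # (b) feature modulation: AdaIN-style scale and shift
        scale, shift = self.to_feat_mod(s).chunk(2, dim=1)
        return x * (1 + scale.view(b, c, 1, 1)) + shift.view(b, c, 1, 1)

feats = torch.randn(2, 64, 32, 32)          # decoder features
style = torch.randn(2, 128)                 # code from the micro style encoder
print(DualModConv(64)(feats, style).shape)  # torch.Size([2, 64, 32, 32])
```

The style signal contrastive loss can likewise be read as a standard
InfoNCE objective over style codes (again, an assumption about the exact
form): codes extracted from the same style attract, codes from different
styles repel.

```python
# Assumed InfoNCE-style reading of the style signal contrastive loss
# (reuses the imports above).
def style_contrastive_loss(anchor, positive, negatives, tau: float = 0.1):
    """anchor, positive: (B, D) codes of the same style; negatives: (B, N, D)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (anchor * positive).sum(-1, keepdim=True) / tau      # (B, 1)
    neg = torch.einsum("bd,bnd->bn", anchor, negatives) / tau  # (B, N)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(anchor.size(0), dtype=torch.long)     # positive at 0
    return F.cross_entropy(logits, labels)
```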
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach that explores multimodal representation and video generative models for video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models [11.401299303276016]
We introduce FreeStyle, an innovative style transfer method built upon a pre-trained large diffusion model.
Our method enables style transfer only through a text description of the desired style, eliminating the necessity of style images.
Our experimental results demonstrate high-quality synthesis and fidelity of our method across various content images and style text prompts.
arXiv Detail & Related papers (2024-01-28T12:00:31Z)
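FreeStyle (above) specifies the style purely by text. As a rough, generic
illustration of text-guided stylization without a style image (not
FreeStyle's actual feature-modulation scheme), an image-to-image diffusion
pipeline from Hugging Face diffusers can be driven the same way; the
checkpoint and parameters below are assumptions:

```python
# Hypothetical stand-in: generic text-guided img2img stylization with
# Hugging Face diffusers (NOT FreeStyle's method; checkpoint and
# parameters are assumptions).
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA GPU is available

content = Image.open("content.jpg").convert("RGB").resize((512, 512))
# The style is specified purely by text; no style image is needed.
stylized = pipe(
    prompt="in the style of Van Gogh's Starry Night, oil painting",
    image=content,
    strength=0.5,       # lower values preserve more content structure
    guidance_scale=7.5,
).images[0]
stylized.save("stylized.jpg")
```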
- Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference [95.42299246592756]
We study the UNet encoder in diffusion models and empirically analyze its features.
We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps.
We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
arXiv Detail & Related papers (2023-12-15T08:46:43Z)
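This finding suggests a simple inference-time optimization: recompute the
UNet encoder only every few time-steps and reuse the cached features in
between. The toy modules below illustrate only that control flow; the
paper's actual scheme differs in detail.

```python
# Toy control flow for encoder-feature reuse across diffusion time-steps
# (modules and update rule are stand-ins, not the paper's UNet).
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.SiLU(),
                                 nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return self.net(x)

class TinyDecoder(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU(),
                                 nn.Conv2d(ch, 3, 3, padding=1))
    def forward(self, h, t_emb):
        return self.net(h + t_emb)

encoder, decoder = TinyEncoder(), TinyDecoder()
x, cached = torch.randn(1, 3, 32, 32), None
for t in reversed(range(10)):
    t_emb = torch.full((1, 64, 1, 1), t / 10.0)
    # Encoder features vary little over time, so recompute them only
    # every third step and reuse the cache in between.
    if cached is None or t % 3 == 0:
        cached = encoder(x)
    eps = decoder(cached, t_emb)
    x = x - 0.1 * eps  # stand-in for the real sampler update
```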
- Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation [115.09597127418452]
Latent-Shift is an efficient text-to-video generation method based on a pretrained text-to-image generation model.
We show that Latent-Shift achieves comparable or better results while being significantly more efficient.
arXiv Detail & Related papers (2023-04-17T17:57:06Z)
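The temporal shift at the heart of Latent-Shift can be sketched directly: a
fraction of latent channels is shifted forward or backward along the time
axis, letting pretrained 2D layers mix temporal information at zero
parameter cost. The function below is a generic TSM-style sketch; the fold
ratio and placement are assumptions, not the paper's exact design.

```python
# Generic TSM-style temporal shift over latent video features
# (B, T, C, H, W); fold ratio and placement are assumptions.
import torch

def temporal_shift(z: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    b, t, c, h, w = z.shape
    fold = c // fold_div
    out = torch.zeros_like(z)
    out[:, 1:, :fold] = z[:, :-1, :fold]                  # shift forward in time
    out[:, :-1, fold:2 * fold] = z[:, 1:, fold:2 * fold]  # shift backward
    out[:, :, 2 * fold:] = z[:, :, 2 * fold:]             # remainder unchanged
    return out

z = torch.randn(2, 16, 64, 8, 8)  # toy latent video
print(temporal_shift(z).shape)    # torch.Size([2, 16, 64, 8, 8])
```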
- Video Coding Using Learned Latent GAN Compression [1.6058099298620423]
We leverage the generative capacity of GANs such as StyleGAN to represent and compress a video.
Each frame is inverted in the latent space of StyleGAN, from which the optimal compression is learned.
arXiv Detail & Related papers (2022-07-09T19:07:43Z)
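A minimal sketch of the latent-coding idea, under strong assumptions: each
frame is inverted to a latent code (random stand-ins here), and the latent
stream is compressed with uniform quantization plus temporal delta coding.
The paper learns the compression instead; this only shows why coding
latents rather than pixels is cheap.

```python
# Hedged sketch: quantize per-frame latent codes and delta-code them over
# time (the paper learns the compression; latents here are random stand-ins).
import numpy as np

def encode(latents: np.ndarray, step: float = 0.02) -> np.ndarray:
    q = np.round(latents / step).astype(np.int32)        # uniform quantization
    return np.concatenate([q[:1], np.diff(q, axis=0)])   # temporal deltas

def decode(deltas: np.ndarray, step: float = 0.02) -> np.ndarray:
    return np.cumsum(deltas, axis=0).astype(np.float32) * step

w = np.random.randn(30, 512).astype(np.float32)  # stand-in for 30 frames of w
w_hat = decode(encode(w))
print(np.abs(w - w_hat).max())  # reconstruction error bounded by step / 2
```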
- Feature-Style Encoder for Style-Based GAN Inversion [1.9116784879310027]
We propose a novel architecture for GAN inversion, which we call the Feature-Style encoder.
Our model achieves accurate inversion of real images from the latent space of a pre-trained style-based GAN model.
Thanks to its encoder structure, the model allows fast and accurate image editing.
arXiv Detail & Related papers (2022-02-04T15:19:34Z)
- Transformer-based Image Compression [18.976159633970177]
A Transformer-based Image Compression (TIC) approach is developed that reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders.
TIC rivals state-of-the-art approaches, including deep convolutional neural network (CNN) based learned image coding (LIC) methods and the handcrafted rules-based intra profile of the recently approved Versatile Video Coding (VVC) standard.
arXiv Detail & Related papers (2021-11-12T13:13:20Z)
- Fine-grained style control in Transformer-based Text-to-speech Synthesis [78.92428622630861]
We present a novel architecture to realize fine-grained style control in Transformer-based text-to-speech synthesis (TransformerTTS).
We model the speaking style by extracting a time sequence of local style tokens (LST) from the reference speech.
Experiments show that with fine-grained style control, our system performs better in terms of naturalness, intelligibility, and style transferability.
arXiv Detail & Related papers (2021-10-12T19:50:02Z)
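A hedged sketch of the local style tokens (LST) above: pool the reference
mel-spectrogram into one style embedding per local time window, giving the
TTS decoder a time sequence to attend to. Window size, dimensions, and the
simple mean pooling are assumptions; the paper's extractor is learned and
more elaborate.

```python
# Hedged sketch of local style tokens: one pooled embedding per local
# window of the reference mel-spectrogram (sizes and pooling are assumptions).
import torch
import torch.nn as nn

class LSTExtractor(nn.Module):
    def __init__(self, n_mels: int = 80, d: int = 256, window: int = 16):
        super().__init__()
        self.window = window
        self.proj = nn.Linear(n_mels, d)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:  # mel: (B, T, n_mels)
        h = self.proj(mel)                                  # (B, T, d)
        b, t, d = h.shape
        t = t - t % self.window                             # drop ragged tail
        h = h[:, :t].reshape(b, t // self.window, self.window, d)
        return h.mean(dim=2)  # one style token per local window: (B, T/w, d)

tokens = LSTExtractor()(torch.randn(2, 128, 80))
print(tokens.shape)  # torch.Size([2, 8, 256]) -- a time sequence of tokens
```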
- Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesions and normal tissue.
We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z)
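The subpixel-embedding idea above can be sketched with a standard
depth-to-space layer: project to extra channels, then rearrange them into
higher spatial resolution so small lesions occupy more pixels. Module names
and sizes below are assumptions, not the paper's architecture.

```python
# Hedged sketch of a subpixel (depth-to-space) embedding; names and sizes
# are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class SubpixelEmbedding(nn.Module):
    """Expands spatial resolution so small lesions occupy more pixels."""
    def __init__(self, in_ch: int = 1, feat: int = 16, scale: int = 2):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, feat * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)  # (C*s^2, H, W) -> (C, sH, sW)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.shuffle(self.proj(x))

x = torch.randn(1, 1, 64, 64)        # toy single-channel MRI slice
print(SubpixelEmbedding()(x).shape)  # torch.Size([1, 16, 128, 128])
```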
- Real-time Universal Style Transfer on High-resolution Images via Zero-channel Pruning [74.09149955786367]
ArtNet can achieve universal, real-time, and high-quality style transfer on high-resolution images simultaneously.
By using ArtNet and S2, our method is 2.3 to 107.4 times faster than state-of-the-art approaches.
arXiv Detail & Related papers (2020-06-16T09:50:14Z)
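Zero-channel pruning can be illustrated generically: drop convolution
output channels whose filters are (near-)zero and shrink the following
layer's input channels to match. The helper below is a hedged sketch of
that mechanic, not ArtNet's actual pruning procedure.

```python
# Generic illustration of zero-channel pruning (not ArtNet's procedure):
# remove conv output channels with (near-)zero filters and shrink the
# following layer's input channels to match.
import torch
import torch.nn as nn

def prune_zero_channels(conv: nn.Conv2d, nxt: nn.Conv2d, eps: float = 1e-8):
    norms = conv.weight.detach().flatten(1).norm(dim=1)  # per-channel norm
    keep = (norms > eps).nonzero().squeeze(1)            # surviving channels
    slim = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size,
                     conv.stride, conv.padding)
    slim.weight.data = conv.weight.data[keep].clone()
    slim.bias.data = conv.bias.data[keep].clone()
    slim_nxt = nn.Conv2d(len(keep), nxt.out_channels, nxt.kernel_size,
                         nxt.stride, nxt.padding)
    slim_nxt.weight.data = nxt.weight.data[:, keep].clone()
    slim_nxt.bias.data = nxt.bias.data.clone()
    return slim, slim_nxt

c1, c2 = nn.Conv2d(3, 8, 3, padding=1), nn.Conv2d(8, 8, 3, padding=1)
with torch.no_grad():
    c1.weight[::2].zero_()  # simulate filters that trained to zero
p1, p2 = prune_zero_channels(c1, c2)
print(p1.out_channels, p2.in_channels)  # 4 4
```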
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.