PhotoWCT$^2$: Compact Autoencoder for Photorealistic Style Transfer
Resulting from Blockwise Training and Skip Connections of High-Frequency
Residuals
- URL: http://arxiv.org/abs/2110.11995v1
- Date: Fri, 22 Oct 2021 18:20:41 GMT
- Authors: Tai-Yin Chiu, Danna Gurari
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Photorealistic style transfer is an image editing task with the goal to
modify an image to match the style of another image while ensuring the result
looks like a real photograph. A limitation of existing models is that they have
many parameters, which in turn prevents their use for larger image resolutions
and leads to slower run-times. We introduce two mechanisms that enable our
design of a more compact model that we call PhotoWCT$^2$, which preserves
state-of-the-art stylization strength and photorealism. First, we introduce
blockwise training to perform coarse-to-fine feature transformations that
enable state-of-the-art stylization strength in a single autoencoder in place of
the inefficient cascade of four autoencoders used in PhotoWCT. Second, we
introduce skip connections of high-frequency residuals in order to preserve
image quality when applying the sequential coarse-to-fine feature
transformations. Our PhotoWCT$^2$ model requires fewer parameters (e.g., 30.3%
fewer) while supporting higher resolution images (e.g., 4K) and achieving
faster stylization than existing models.
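The high-frequency skip connections can be illustrated with a minimal single-channel NumPy sketch. This is not the paper's implementation (which operates on convolutional feature maps with learned layers); the function names and the 2x2 average pooling are illustrative stand-ins for the encoder's downsampling. The key idea shown is that the residual discarded by downsampling is recorded during encoding and added back during decoding:

```python
import numpy as np

def downsample(x):
    # 2x2 average pooling: stand-in for an encoder pooling layer
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # nearest-neighbor upsampling: stand-in for a decoder unpooling layer
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Encoding: record the high-frequency residual lost by pooling
x = np.random.rand(8, 8)
low = downsample(x)
residual = x - upsample(low)  # high-frequency detail

# Decoding: after transforming the coarse features, restore the detail
reconstructed = upsample(low) + residual
assert np.allclose(reconstructed, x)  # the skip makes the round trip lossless
```

In the full model the coarse features (`low` here) would be stylized between encoding and decoding, so the residual skip preserves fine image structure that the feature transformations would otherwise degrade.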
Related papers
- Compact Latent Representation for Image Compression (CLRIC) [16.428925911432344]
Current image compression models often require separate models for each quality level, making them resource-intensive in terms of both training and storage.
We propose an innovative approach that utilizes latent variables from pre-existing trained models for perceptual image compression.
Our method achieves comparable perceptual quality to state-of-the-art learned image compression models while being both model-agnostic and resolution-agnostic.
arXiv Detail & Related papers (2025-02-20T13:20:56Z) - SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training [77.681908636429]
Text-to-image (T2I) models face several limitations on mobile devices, including large model sizes, slow inference, and low-quality generation.
This paper aims to develop an extremely small and fast T2I model that generates high-resolution and high-quality images on mobile platforms.
arXiv Detail & Related papers (2024-12-12T18:59:53Z) - TSFormer: A Robust Framework for Efficient UHD Image Restoration [7.487270862599671]
TSFormer is an all-in-one framework that integrates Trusted learning with Sparsification.
Our model can process a 4K image in real time (40 fps) with 3.38M parameters.
arXiv Detail & Related papers (2024-11-17T03:34:27Z) - Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation [81.45400849638347]
In-image machine translation (IIMT) aims to translate an image containing text in a source language into an image containing translations in a target language.
In this paper, we propose an end-to-end IIMT model consisting of four modules.
Our model achieves competitive performance compared to cascaded models with only 70.9% of parameters, and significantly outperforms the pixel-level end-to-end IIMT model.
arXiv Detail & Related papers (2024-07-03T08:15:39Z) - Image-GS: Content-Adaptive Image Representation via 2D Gaussians [52.598772767324036]
We introduce Image-GS, a content-adaptive image representation based on 2D Gaussians.
It supports hardware-friendly rapid access for real-time usage, requiring only 0.3K MACs to decode a pixel.
We demonstrate its versatility with several applications, including texture compression, semantics-aware compression, and joint image compression and restoration.
arXiv Detail & Related papers (2024-07-02T00:45:21Z) - Improving Text-to-Image Consistency via Automatic Prompt Optimization [26.2587505265501]
We introduce a T2I optimization-by-prompting framework, OPT2I, to improve prompt-image consistency in T2I models.
Our framework starts from a user prompt and iteratively generates revised prompts with the goal of maximizing a consistency score.
arXiv Detail & Related papers (2024-03-26T15:42:01Z) - Direct Consistency Optimization for Compositional Text-to-Image
Personalization [73.94505688626651]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, are able to generate visuals with a high degree of consistency.
We propose to fine-tune the T2I model by maximizing consistency to reference images, while penalizing the deviation from the pretrained model.
arXiv Detail & Related papers (2024-02-19T09:52:41Z) - Controllable Image Enhancement [66.18525728881711]
We present a semiautomatic image enhancement algorithm that can generate high-quality images with multiple styles by controlling a few parameters.
An encoder-decoder framework encodes the retouching skills into latent codes and decodes them into the parameters of image signal processing functions.
arXiv Detail & Related papers (2022-06-16T23:54:53Z) - Spatial-Separated Curve Rendering Network for Efficient and
High-Resolution Image Harmonization [59.19214040221055]
We propose a novel spatial-separated curve rendering network (S$2$CRNet) for efficient and high-resolution image harmonization.
The proposed method reduces more than 90% parameters compared with previous methods.
Our method runs smoothly on higher-resolution images in real time, more than 10$\times$ faster than existing methods.
arXiv Detail & Related papers (2021-09-13T07:20:16Z) - Towards Unsupervised Deep Image Enhancement with Generative Adversarial
Network [92.01145655155374]
We present an unsupervised image enhancement generative network (UEGAN).
It learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner.
Results show that the proposed model effectively improves the aesthetic quality of images.
arXiv Detail & Related papers (2020-12-30T03:22:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.