PhotoWCT$^2$: Compact Autoencoder for Photorealistic Style Transfer
Resulting from Blockwise Training and Skip Connections of High-Frequency
Residuals
- URL: http://arxiv.org/abs/2110.11995v1
- Date: Fri, 22 Oct 2021 18:20:41 GMT
- Authors: Tai-Yin Chiu, Danna Gurari
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Photorealistic style transfer is an image editing task with the goal to
modify an image to match the style of another image while ensuring the result
looks like a real photograph. A limitation of existing models is that they have
many parameters, which in turn prevents their use for larger image resolutions
and leads to slower run-times. We introduce two mechanisms that enable our
design of a more compact model that we call PhotoWCT$^2$, which preserves
state-of-the-art stylization strength and photorealism. First, we introduce
blockwise training to perform coarse-to-fine feature transformations that
enable state-of-the-art stylization strength in a single autoencoder in place of
the inefficient cascade of four autoencoders used in PhotoWCT. Second, we
introduce skip connections of high-frequency residuals in order to preserve
image quality when applying the sequential coarse-to-fine feature
transformations. Our PhotoWCT$^2$ model requires fewer parameters (e.g., 30.3%
fewer) while supporting higher resolution images (e.g., 4K) and achieving
faster stylization than existing models.
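The high-frequency skip connections can be illustrated with a minimal single-channel NumPy sketch. This is not the paper's implementation (which operates on convolutional feature maps with learned layers); the function names and the 2x2 average pooling are illustrative stand-ins for the encoder's downsampling. The key idea shown is that the residual discarded by downsampling is recorded during encoding and added back during decoding:

```python
import numpy as np

def downsample(x):
    # 2x2 average pooling: stand-in for an encoder pooling layer
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # nearest-neighbor upsampling: stand-in for a decoder unpooling layer
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Encoding: record the high-frequency residual lost by pooling
x = np.random.rand(8, 8)
low = downsample(x)
residual = x - upsample(low)  # high-frequency detail

# Decoding: after transforming the coarse features, restore the detail
reconstructed = upsample(low) + residual
assert np.allclose(reconstructed, x)  # the skip makes the round trip lossless
```

In the full model the coarse features (`low` here) would be stylized between encoding and decoding, so the residual skip preserves fine image structure that the feature transformations would otherwise degrade.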
Related papers
- Compact Latent Representation for Image Compression (CLRIC) [16.428925911432344]
Current image compression models often require separate models for each quality level, making them resource-intensive in terms of both training and storage.
We propose an innovative approach that utilizes latent variables from pre-existing trained models for perceptual image compression.
Our method achieves comparable perceptual quality to state-of-the-art learned image compression models while being both model-agnostic and resolution-agnostic.
arXiv Detail & Related papers (2025-02-20T13:20:56Z) - SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training [77.681908636429]
Text-to-image (T2I) models face several limitations on mobile devices, including large model sizes, slow inference, and low-quality generation.
This paper aims to develop an extremely small and fast T2I model that generates high-resolution and high-quality images on mobile platforms.
arXiv Detail & Related papers (2024-12-12T18:59:53Z) - TSFormer: A Robust Framework for Efficient UHD Image Restoration [7.487270862599671]
TSFormer is an all-in-one framework that integrates Trusted learning with Sparsification.
Our model can process a 4K image in real time (40 fps) with 3.38M parameters.
arXiv Detail & Related papers (2024-11-17T03:34:27Z) - Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation [81.45400849638347]
In-image machine translation (IIMT) aims to translate an image containing text in a source language into an image containing translations in a target language.
In this paper, we propose an end-to-end IIMT model consisting of four modules.
Our model achieves competitive performance compared to cascaded models with only 70.9% of parameters, and significantly outperforms the pixel-level end-to-end IIMT model.
arXiv Detail & Related papers (2024-07-03T08:15:39Z) - Image-GS: Content-Adaptive Image Representation via 2D Gaussians [52.598772767324036]
We introduce Image-GS, a content-adaptive image representation based on 2D Gaussians.
It supports hardware-friendly rapid access for real-time usage, requiring only 0.3K MACs to decode a pixel.
We demonstrate its versatility with several applications, including texture compression, semantics-aware compression, and joint image compression and restoration.
arXiv Detail & Related papers (2024-07-02T00:45:21Z) - Improving Text-to-Image Consistency via Automatic Prompt Optimization [26.2587505265501]
We introduce a T2I optimization-by-prompting framework, OPT2I, to improve prompt-image consistency in T2I models.
Our framework starts from a user prompt and iteratively generates revised prompts with the goal of maximizing a consistency score.
arXiv Detail & Related papers (2024-03-26T15:42:01Z) - Direct Consistency Optimization for Compositional Text-to-Image
Personalization [73.94505688626651]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, are able to generate visuals with a high degree of consistency.
We propose to fine-tune the T2I model by maximizing consistency to reference images, while penalizing the deviation from the pretrained model.
arXiv Detail & Related papers (2024-02-19T09:52:41Z) - Controllable Image Enhancement [66.18525728881711]
We present a semiautomatic image enhancement algorithm that can generate high-quality images with multiple styles by controlling a few parameters.
An encoder-decoder framework encodes the retouching skills into latent codes and decodes them into the parameters of image signal processing functions.
arXiv Detail & Related papers (2022-06-16T23:54:53Z) - Spatial-Separated Curve Rendering Network for Efficient and
High-Resolution Image Harmonization [59.19214040221055]
We propose a novel spatial-separated curve rendering network (S$2$CRNet) for efficient and high-resolution image harmonization.
The proposed method reduces more than 90% parameters compared with previous methods.
Our method runs smoothly on higher-resolution images in real time, more than 10$\times$ faster than existing methods.
arXiv Detail & Related papers (2021-09-13T07:20:16Z) - Towards Unsupervised Deep Image Enhancement with Generative Adversarial
Network [92.01145655155374]
We present an unsupervised image enhancement generative network (UEGAN).
It learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner.
Results show that the proposed model effectively improves the aesthetic quality of images.
arXiv Detail & Related papers (2020-12-30T03:22:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.