Playing Lottery Tickets in Style Transfer Models
- URL: http://arxiv.org/abs/2203.13802v1
- Date: Fri, 25 Mar 2022 17:43:18 GMT
- Title: Playing Lottery Tickets in Style Transfer Models
- Authors: Meihao Kong, Jing Huo, Wenbin Li, Jing Wu, Yu-Kun Lai, Yang Gao
- Abstract summary: Style transfer has achieved great success and attracted wide attention from both academic and industrial communities.
However, the dependence on a large VGG-based autoencoder gives existing style transfer models high parameter complexity.
In this work, we perform the first empirical study to verify whether sparse trainable matching subnetworks (lottery tickets) also exist in style transfer models.
- Score: 57.55795986289975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Style transfer has achieved great success and attracted a wide range of
attention from both academic and industrial communities due to its flexible
application scenarios. However, the dependence on a large VGG-based
autoencoder gives existing style transfer models high parameter
complexity, which limits their application on resource-constrained devices.
Unfortunately, the compression of style transfer models has been little explored.
In parallel, study on the lottery ticket hypothesis (LTH) has shown great
potential in finding extremely sparse matching subnetworks which can achieve on
par or even better performance than original full networks when trained in
isolation. In this work, we perform the first empirical study to verify whether
such trainable networks also exist in style transfer models. From a wide range
of style transfer methods, we choose two of the most popular style transfer
models as the main testbeds, i.e., AdaIN and SANet, representing approaches of
global and local transformation based style transfer respectively. Through
extensive experiments and comprehensive analysis, we draw the following main
conclusions. (1) Compared with fixing the VGG encoder, style transfer models can
benefit more from training the whole network together. (2) Using iterative
magnitude pruning, we find the sparsest matching subnetworks at 89.2% sparsity in
AdaIN and 73.7% in SANet, which suggests that style transfer models can play
lottery tickets too. (3) The feature transformation module should also be pruned to
get a sparser model without affecting the existence and quality of matching
subnetworks. (4) Besides AdaIN and SANet, other models such as LST, MANet,
AdaAttN and MCCNet can also play lottery tickets, which shows that LTH can be
generalized to various style transfer models.
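For reference, AdaIN's global feature transformation, the channel-wise statistic matching used in the first testbed, follows the published formula AdaIN(x, y) = σ(y)·(x − μ(x))/σ(x) + μ(y). The following is a minimal NumPy sketch of that formula, not the authors' released code; the `eps` stabilizer is an implementation assumption:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization: align the channel-wise mean and
    standard deviation of the content features to those of the style
    features. content, style: arrays of shape (channels, height, width)."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps  # avoid divide-by-zero
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    # Normalize content statistics, then re-scale/shift to style statistics.
    return s_std * (content - c_mean) / c_std + s_mean
```

After this transformation the output's per-channel mean and standard deviation match those of the style features, which is why AdaIN is described as a global (whole-feature-map) transformation, in contrast to SANet's local attention-based one.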
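The iterative magnitude pruning (IMP) procedure used to search for matching subnetworks can be sketched as follows. This is a minimal NumPy illustration of the standard LTH recipe (train, prune the smallest-magnitude surviving weights, rewind survivors to their initial values, repeat), not the authors' implementation; `train_fn` is a hypothetical placeholder for a real training loop:

```python
import numpy as np

def imp_find_mask(init_weights, train_fn, prune_frac=0.2, rounds=5):
    """Iterative magnitude pruning with weight rewinding.

    Each round: train from the (masked) initialization, prune the
    prune_frac fraction of smallest-magnitude surviving weights, then
    rewind the survivors to their initial values for the next round.
    Returns the final boolean mask; the "winning ticket" is
    init_weights * mask, retrained in isolation.
    """
    mask = np.ones_like(init_weights, dtype=bool)
    for _ in range(rounds):
        trained = train_fn(init_weights * mask) * mask   # train the masked net
        alive = np.abs(trained[mask])                    # surviving magnitudes
        k = int(prune_frac * alive.size)                 # how many to prune
        if k == 0:
            break
        threshold = np.partition(alive, k - 1)[k - 1]    # k-th smallest survivor
        mask &= np.abs(trained) > threshold              # drop weights at/below it
    return mask
```

Because pruning removes a fixed fraction of the *remaining* weights each round, sparsity compounds geometrically, which is how the paper reaches extreme sparsities such as 89.2% in AdaIN over successive rounds.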
Related papers
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of
General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- FISTNet: FusIon of STyle-path generative Networks for Facial Style Transfer [15.308837341075135]
StyleGAN methods have the tendency of overfitting that results in the introduction of artifacts in the facial images.
We propose a FusIon of STyles (FIST) network for facial images that leverages pre-trained multipath style transfer networks.
arXiv Detail & Related papers (2023-07-18T07:20:31Z)
- Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer [83.1333306079676]
In this paper, we devise a novel Transformer model, termed Master, specifically for style transfer.
In the proposed model, different Transformer layers share a common group of parameters, which (1) reduces the total number of parameters, (2) leads to more robust training convergence, and (3) makes it easy to control the degree of stylization.
Experiments demonstrate the superiority of Master under both zero-shot and few-shot style transfer settings.
arXiv Detail & Related papers (2023-04-24T04:46:39Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
- On Optimizing the Communication of Model Parallelism [79.33873698640662]
We study a novel and important communication pattern in large-scale model-parallel deep learning (DL): cross-mesh resharding.
In cross-mesh resharding, a sharded tensor needs to be sent from a source device mesh to a destination device mesh.
We propose two contributions to address cross-mesh resharding: an efficient broadcast-based communication system, and an "overlapping-friendly" pipeline schedule.
arXiv Detail & Related papers (2022-11-10T03:56:48Z)
- How Well Do Sparse Imagenet Models Transfer? [75.98123173154605]
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" datasets.
In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset.
We show that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities.
arXiv Detail & Related papers (2021-11-26T11:58:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences of its use.