Teachers Do More Than Teach: Compressing Image-to-Image Models
- URL: http://arxiv.org/abs/2103.03467v1
- Date: Fri, 5 Mar 2021 04:29:34 GMT
- Title: Teachers Do More Than Teach: Compressing Image-to-Image Models
- Authors: Qing Jin, Jian Ren, Oliver J. Woodford, Jiazhuo Wang, Geng Yuan,
Yanzhi Wang, Sergey Tulyakov
- Abstract summary: Generative Adversarial Networks (GANs) have achieved huge success in generating high-fidelity images.
GANs suffer from low efficiency due to tremendous computational cost and bulky memory usage.
Recent efforts on compressing GANs show noticeable progress in obtaining smaller generators.
- Score: 35.40756344110666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Adversarial Networks (GANs) have achieved huge success in
generating high-fidelity images; however, they suffer from low efficiency due
to tremendous computational cost and bulky memory usage. Recent efforts on
compressing GANs show noticeable progress in obtaining smaller generators, but
either sacrifice image quality or involve a time-consuming search process. In
this work, we aim to address these issues by introducing a teacher network that
provides a search space in which efficient network architectures can be found,
in addition to performing knowledge distillation. First, we revisit the search
space of generative models, introducing an inception-based residual block into
generators. Second, to achieve target computation cost, we propose a one-step
pruning algorithm that searches a student architecture from the teacher model
and substantially reduces the search cost. It requires neither l1 sparsity
regularization nor its associated hyper-parameters, simplifying the training
procedure. Finally, we propose to distill knowledge through maximizing feature
similarity between teacher and student via an index named Global Kernel
Alignment (GKA). Our compressed networks achieve similar or even better image
fidelity (FID, mIoU) than the original models with much-reduced computational
cost, e.g., MACs. Code will be released at
https://github.com/snap-research/CAT.
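As a rough illustration of the distillation component, the snippet below sketches a kernel-alignment-style similarity loss between teacher and student feature maps. It assumes a linear (Gram-matrix) kernel over a batch and a normalized, CKA-like alignment score; the paper's exact Global Kernel Alignment (GKA) formulation, the inception-based residual block, and the one-step pruning algorithm are not reproduced here, and the function and tensor names are illustrative only.

```python
import torch

def kernel_alignment_loss(feat_t: torch.Tensor, feat_s: torch.Tensor) -> torch.Tensor:
    """Hypothetical kernel-alignment-style distillation loss (sketch, not the paper's exact GKA).

    feat_t, feat_s: teacher / student feature maps of shape (N, C, H, W).
    Channel counts may differ between teacher and student.
    """
    n = feat_t.size(0)
    # Flatten each sample's features into a row vector: (N, C*H*W).
    x_t = feat_t.reshape(n, -1)
    x_s = feat_s.reshape(n, -1)
    # Linear Gram matrices over the batch: (N, N), so differing channel counts are fine.
    k_t = x_t @ x_t.t()
    k_s = x_s @ x_s.t()
    # Normalized alignment between the two kernels (1 means identical batch structure).
    alignment = (k_t * k_s).sum() / (k_t.norm() * k_s.norm() + 1e-8)
    # Maximizing alignment <=> minimizing this loss.
    return 1.0 - alignment
```

In training, a term of this kind would be added to the student's usual GAN and reconstruction losses, with teacher and student features taken at matched layers.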
Related papers
- Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search [49.81353382211113]
We address the challenge of efficiently integrating multi-head self-attention into high-resolution representation CNNs.
We develop a multi-target multi-branch supernet method, which fully utilizes the advantages of high-resolution features.
We present a series of models obtained via the Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method, which searches for the best hybrid combination of light-weight convolution layers and memory-efficient self-attention layers.
arXiv Detail & Related papers (2024-03-15T15:47:54Z) - UGC: Unified GAN Compression for Efficient Image-to-Image Translation [20.3126581529643]
We propose a new learning paradigm, Unified GAN Compression (UGC), with a unified objective to seamlessly prompt the synergy of model-efficient and label-efficient learning.
We formulate a heterogeneous mutual learning scheme to obtain an architecture-flexible, label-efficient model with excellent performance.
arXiv Detail & Related papers (2023-09-17T15:55:09Z) - AutoDistil: Few-shot Task-agnostic Neural Architecture Search for
Distilling Large Language Models [121.22644352431199]
We use Neural Architecture Search (NAS) to automatically distill several compressed students with variable cost from a large model.
Current works train a single SuperLM consisting of millions of subnetworks with weight-sharing.
Experiments on the GLUE benchmark against state-of-the-art KD and NAS methods demonstrate that AutoDistil outperforms leading compression techniques.
arXiv Detail & Related papers (2022-01-29T06:13:04Z) - Online Multi-Granularity Distillation for GAN Compression [17.114017187236836]
Generative Adversarial Networks (GANs) have witnessed prevailing success in yielding outstanding images.
GANs are burdensome to deploy on resource-constrained devices due to ponderous computational costs and hulking memory usage.
We propose a novel online multi-granularity distillation scheme to obtain lightweight GANs.
arXiv Detail & Related papers (2021-08-16T05:49:50Z) - MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach to reduce the search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z) - Learning with Privileged Information for Efficient Image
Super-Resolution [35.599731963795875]
We introduce in this paper a novel distillation framework, consisting of teacher and student networks, that drastically boosts the performance of FSRCNN.
The encoder in the teacher learns the degradation process, subsampling of HR images, using an imitation loss.
The student and the decoder in the teacher, having the same network architecture as FSRCNN, try to reconstruct HR images.
arXiv Detail & Related papers (2020-07-15T07:44:18Z) - Neural Networks Are More Productive Teachers Than Human Raters: Active
Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model [57.41841346459995]
We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning.
arXiv Detail & Related papers (2020-03-31T05:44:55Z) - GAN Compression: Efficient Architectures for Interactive Conditional
GANs [45.012173624111185]
Recent Conditional Generative Adversarial Networks (cGANs) are 1-2 orders of magnitude more compute-intensive than modern recognition CNNs.
We propose a general-purpose compression framework for reducing the inference time and model size of the generator in cGANs.
arXiv Detail & Related papers (2020-03-19T17:59:05Z) - Distilling portable Generative Adversarial Networks for Image
Translation [101.33731583985902]
Traditional network compression methods focus on visual recognition tasks, but never deal with generation tasks.
Inspired by knowledge distillation, a student generator with fewer parameters is trained by inheriting the low-level and high-level information from the original heavy teacher generator.
An adversarial learning process is established to jointly optimize the student generator and student discriminator; a generic training-step sketch follows this list.
arXiv Detail & Related papers (2020-03-07T05:53:01Z)
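The last entry above trains a compact student generator by combining teacher supervision with an adversarial objective. The snippet below is a generic, hypothetical PyTorch training step in that spirit, not the paper's exact losses: the frozen teacher's output serves as a soft target for an L1 distillation term, and a student discriminator supplies the adversarial signal. All names (teacher_g, student_g, student_d, lambda_distill) are assumptions for illustration.

```python
import torch
import torch.nn as nn

def distill_gan_step(teacher_g: nn.Module, student_g: nn.Module, student_d: nn.Module,
                     x: torch.Tensor, opt_g: torch.optim.Optimizer,
                     opt_d: torch.optim.Optimizer, lambda_distill: float = 10.0):
    """One hypothetical training step: output distillation + adversarial loss (sketch only)."""
    bce = nn.BCEWithLogitsLoss()

    # Frozen teacher output acts as a soft target for the student.
    with torch.no_grad():
        y_teacher = teacher_g(x)

    # --- Discriminator update: teacher output treated as "real", student output as "fake".
    y_student = student_g(x).detach()
    d_real = student_d(y_teacher)
    d_fake = student_d(y_student)
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # --- Generator update: fool the discriminator and match the teacher output.
    y_student = student_g(x)
    d_fake = student_d(y_student)
    loss_adv = bce(d_fake, torch.ones_like(d_fake))
    loss_distill = nn.functional.l1_loss(y_student, y_teacher)
    loss_g = loss_adv + lambda_distill * loss_distill
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```

In practice, real target images and intermediate-feature losses would typically supplement or replace the teacher-as-real-data choice made here for brevity.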
This list is automatically generated from the titles and abstracts of the papers in this site.