Teachers Do More Than Teach: Compressing Image-to-Image Models
- URL: http://arxiv.org/abs/2103.03467v1
- Date: Fri, 5 Mar 2021 04:29:34 GMT
- Title: Teachers Do More Than Teach: Compressing Image-to-Image Models
- Authors: Qing Jin, Jian Ren, Oliver J. Woodford, Jiazhuo Wang, Geng Yuan,
Yanzhi Wang, Sergey Tulyakov
- Abstract summary: Generative Adversarial Networks (GANs) have achieved huge success in generating high-fidelity images.
GANs suffer from low efficiency due to tremendous computational cost and bulky memory usage.
Recent efforts on compressing GANs show noticeable progress in obtaining smaller generators.
- Score: 35.40756344110666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Adversarial Networks (GANs) have achieved huge success in
generating high-fidelity images; however, they suffer from low efficiency due
to tremendous computational cost and bulky memory usage. Recent efforts on
compressing GANs show noticeable progress in obtaining smaller generators, but
either sacrifice image quality or involve a time-consuming search process. In
this work, we aim to address these issues by introducing a teacher network that
provides a search space in which efficient network architectures can be found,
in addition to performing knowledge distillation. First, we revisit the search
space of generative models, introducing an inception-based residual block into
generators. Second, to achieve target computation cost, we propose a one-step
pruning algorithm that searches a student architecture from the teacher model
and substantially reduces the search cost. It requires neither l1 sparsity
regularization nor its associated hyper-parameters, simplifying the training
procedure. Finally, we propose to distill knowledge through maximizing feature
similarity between teacher and student via an index named Global Kernel
Alignment (GKA). Our compressed networks achieve similar or even better image
fidelity (FID, mIoU) than the original models with much-reduced computational
cost, e.g., MACs. Code will be released at
https://github.com/snap-research/CAT.
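As a rough illustration of the distillation component, the snippet below sketches a kernel-alignment-style similarity loss between teacher and student feature maps. It assumes a linear (Gram-matrix) kernel over a batch and a normalized, CKA-like alignment score; the paper's exact Global Kernel Alignment (GKA) formulation, the inception-based residual block, and the one-step pruning algorithm are not reproduced here, and the function and tensor names are illustrative only.

```python
import torch

def kernel_alignment_loss(feat_t: torch.Tensor, feat_s: torch.Tensor) -> torch.Tensor:
    """Hypothetical kernel-alignment-style distillation loss (sketch, not the paper's exact GKA).

    feat_t, feat_s: teacher / student feature maps of shape (N, C, H, W).
    Channel counts may differ between teacher and student.
    """
    n = feat_t.size(0)
    # Flatten each sample's features into a row vector: (N, C*H*W).
    x_t = feat_t.reshape(n, -1)
    x_s = feat_s.reshape(n, -1)
    # Linear Gram matrices over the batch: (N, N), so differing channel counts are fine.
    k_t = x_t @ x_t.t()
    k_s = x_s @ x_s.t()
    # Normalized alignment between the two kernels (1 means identical batch structure).
    alignment = (k_t * k_s).sum() / (k_t.norm() * k_s.norm() + 1e-8)
    # Maximizing alignment <=> minimizing this loss.
    return 1.0 - alignment
```

In training, a term of this kind would be added to the student's usual GAN and reconstruction losses, with teacher and student features taken at matched layers.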
Related papers
- Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search [49.81353382211113]
We address the challenge of efficiently integrating multi-head self-attention into high-resolution representation CNNs.
We develop a multi-target multi-branch supernet method, which fully utilizes the advantages of high-resolution features.
We present a series of models obtained via the Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method, which searches for the best hybrid combination of light-weight convolution layers and memory-efficient self-attention layers.
arXiv Detail & Related papers (2024-03-15T15:47:54Z) - UGC: Unified GAN Compression for Efficient Image-to-Image Translation [20.3126581529643]
We propose a new learning paradigm, Unified GAN Compression (UGC), with a unified objective to seamlessly prompt the synergy of model-efficient and label-efficient learning.
We formulate a heterogeneous mutual learning scheme to obtain an architecture-flexible, label-efficient model with excellent performance.
arXiv Detail & Related papers (2023-09-17T15:55:09Z) - AutoDistil: Few-shot Task-agnostic Neural Architecture Search for
Distilling Large Language Models [121.22644352431199]
We use Neural Architecture Search (NAS) to automatically distill several compressed students with variable cost from a large model.
Current works train a single SuperLM consisting of millions of subnetworks with weight-sharing.
Experiments on the GLUE benchmark against state-of-the-art KD and NAS methods demonstrate that AutoDistil outperforms leading compression techniques.
arXiv Detail & Related papers (2022-01-29T06:13:04Z) - Online Multi-Granularity Distillation for GAN Compression [17.114017187236836]
Generative Adversarial Networks (GANs) have witnessed prevailing success in yielding outstanding images.
GANs are burdensome to deploy on resource-constrained devices due to ponderous computational costs and hulking memory usage.
We propose a novel online multi-granularity distillation scheme to obtain lightweight GANs.
arXiv Detail & Related papers (2021-08-16T05:49:50Z) - MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach to reduce the search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z) - Learning with Privileged Information for Efficient Image
Super-Resolution [35.599731963795875]
We introduce in this paper a novel distillation framework, consisting of teacher and student networks, that drastically boosts the performance of FSRCNN.
The encoder in the teacher learns the degradation process, subsampling of HR images, using an imitation loss.
The student and the decoder in the teacher, having the same network architecture as FSRCNN, try to reconstruct HR images.
arXiv Detail & Related papers (2020-07-15T07:44:18Z) - Neural Networks Are More Productive Teachers Than Human Raters: Active
Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model [57.41841346459995]
We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning.
arXiv Detail & Related papers (2020-03-31T05:44:55Z) - GAN Compression: Efficient Architectures for Interactive Conditional
GANs [45.012173624111185]
Recent Conditional Generative Adversarial Networks (cGANs) are 1-2 orders of magnitude more compute-intensive than modern recognition CNNs.
We propose a general-purpose compression framework for reducing the inference time and model size of the generator in cGANs.
arXiv Detail & Related papers (2020-03-19T17:59:05Z) - Distilling portable Generative Adversarial Networks for Image
Translation [101.33731583985902]
Traditional network compression methods focus on visual recognition tasks, but never deal with generation tasks.
Inspired by knowledge distillation, a student generator with fewer parameters is trained by inheriting the low-level and high-level information from the original heavy teacher generator.
An adversarial learning process is established to jointly optimize the student generator and student discriminator; a generic training-step sketch follows this list.
arXiv Detail & Related papers (2020-03-07T05:53:01Z)
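The last entry above trains a compact student generator by combining teacher supervision with an adversarial objective. The snippet below is a generic, hypothetical PyTorch training step in that spirit, not the paper's exact losses: the frozen teacher's output serves as a soft target for an L1 distillation term, and a student discriminator supplies the adversarial signal. All names (teacher_g, student_g, student_d, lambda_distill) are assumptions for illustration.

```python
import torch
import torch.nn as nn

def distill_gan_step(teacher_g: nn.Module, student_g: nn.Module, student_d: nn.Module,
                     x: torch.Tensor, opt_g: torch.optim.Optimizer,
                     opt_d: torch.optim.Optimizer, lambda_distill: float = 10.0):
    """One hypothetical training step: output distillation + adversarial loss (sketch only)."""
    bce = nn.BCEWithLogitsLoss()

    # Frozen teacher output acts as a soft target for the student.
    with torch.no_grad():
        y_teacher = teacher_g(x)

    # --- Discriminator update: teacher output treated as "real", student output as "fake".
    y_student = student_g(x).detach()
    d_real = student_d(y_teacher)
    d_fake = student_d(y_student)
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # --- Generator update: fool the discriminator and match the teacher output.
    y_student = student_g(x)
    d_fake = student_d(y_student)
    loss_adv = bce(d_fake, torch.ones_like(d_fake))
    loss_distill = nn.functional.l1_loss(y_student, y_teacher)
    loss_g = loss_adv + lambda_distill * loss_distill
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```

In practice, real target images and intermediate-feature losses would typically supplement or replace the teacher-as-real-data choice made here for brevity.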
This list is automatically generated from the titles and abstracts of the papers in this site.