FIANCEE: Faster Inference of Adversarial Networks via Conditional Early
Exits
- URL: http://arxiv.org/abs/2304.10306v2
- Date: Thu, 7 Dec 2023 02:43:03 GMT
- Title: FIANCEE: Faster Inference of Adversarial Networks via Conditional Early
Exits
- Authors: Polina Karpikova, Radionova Ekaterina, Anastasia Yaschenko, Andrei
Spiridonov, Leonid Kostyushko, Riccardo Fabbricatore, Aleksei Ivakhnenko
- Abstract summary: We propose a method for diminishing computations by adding so-called early exit branches to the original architecture.
We apply our method to two different SOTA models performing generative tasks.
This is especially relevant for real-time applications such as synthesis of faces, when quality loss needs to be contained.
- Score: 0.7649605697963953
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Generative DNNs are a powerful tool for image synthesis, but they are limited
by their computational load. On the other hand, given a trained model and a
task, e.g. face generation within a range of characteristics, the output image
quality will be unevenly distributed among images with different
characteristics. It follows that we might restrain the model's complexity on
some instances while maintaining high quality. We propose a method for diminishing
computations by adding so-called early exit branches to the original
architecture, and dynamically switching the computational path depending on how
difficult it will be to render the output. We apply our method to two different
SOTA models performing generative tasks: generation from a semantic map, and
cross-reenactment of face expressions, showing that it is able to output images with
custom lower-quality thresholds. For a threshold of LPIPS <= 0.1, we diminish
their computations by up to half. This is especially relevant for real-time
applications such as synthesis of faces, when quality loss needs to be
contained, but most of the inputs need fewer computations than the complex
instances.
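The early-exit idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Stage`, `choose_exit`, and `run_with_early_exit`, the per-block costs, and the per-exit predicted error values are all hypothetical, and the sketch simply assumes that some predictor supplies an estimated error (for example, an LPIPS estimate) for each exit. The earliest exit whose predicted error meets the threshold is used, and the remaining blocks are skipped.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class Stage:
    """One backbone block plus the exit branch attached after it (hypothetical)."""
    block: Callable[[Vector], Vector]      # backbone computation
    exit_head: Callable[[Vector], Vector]  # cheap branch that renders an output
    cost: float                            # assumed compute cost of the block


def choose_exit(predicted_errors: Sequence[float], threshold: float) -> int:
    """Return the earliest exit whose predicted error is within the threshold.

    Falls back to the last exit when no exit is predicted to be good enough.
    """
    for index, error in enumerate(predicted_errors):
        if error <= threshold:
            return index
    return len(predicted_errors) - 1


def run_with_early_exit(
    stages: Sequence[Stage],
    x: Vector,
    predicted_errors: Sequence[float],
    threshold: float,
) -> Tuple[Vector, int, float]:
    """Run only the blocks up to the chosen exit; return output, exit index, cost."""
    if len(predicted_errors) != len(stages):
        raise ValueError("need one predicted error per exit")
    exit_index = choose_exit(predicted_errors, threshold)
    features = x
    spent = 0.0
    for stage in stages[: exit_index + 1]:
        features = stage.block(features)
        spent += stage.cost
    return stages[exit_index].exit_head(features), exit_index, spent


# Toy backbone: four identical blocks that add 1.0 to every element.
toy_stages = [
    Stage(block=lambda v: [a + 1.0 for a in v],
          exit_head=lambda v: list(v),
          cost=1.0)
    for _ in range(4)
]

# An "easy" input is predicted to meet the threshold at the first exit,
# a "hard" input only at the last one (the error values are made up).
easy = run_with_early_exit(toy_stages, [0.0], [0.08, 0.05, 0.03, 0.01], 0.1)
hard = run_with_early_exit(toy_stages, [0.0], [0.40, 0.30, 0.12, 0.05], 0.1)
```

In this toy setup the easy input pays for one block and the hard input pays for all four, which is the kind of per-instance saving the abstract refers to; how the per-exit errors are actually predicted is left open here.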
Related papers
- Image Coding for Machines via Feature-Preserving Rate-Distortion Optimization [27.97760974010369]
We show an approach to reduce the effect of compression on a task loss using the distance between features as a distortion metric.
We simplify the RDO formulation to make the distortion term computable using block-based encoders.
We show up to 10% bit-rate savings for the same computer vision accuracy compared to RDO based on SSE.
arXiv Detail & Related papers (2025-04-03T02:11:26Z)
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
Diffusion models have dominated the field of large, generative image models.
We propose an algorithm for fast-constrained sampling in large pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
- Multi-Feature Aggregation in Diffusion Models for Enhanced Face Super-Resolution [6.055006354743854]
We develop an algorithm that utilizes a low-resolution image combined with features extracted from multiple low-quality images to generate a super-resolved image.
Unlike other algorithms, our approach recovers facial features without explicitly providing attribute information.
This is the first time multi-features combined with low-resolution images are used as conditioners to generate more reliable super-resolution images.
arXiv Detail & Related papers (2024-08-27T20:08:33Z)
- Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models [22.702352459581434]
Serpent is an efficient architecture for high-resolution image restoration.
We show that Serpent can achieve reconstruction quality on par with state-of-the-art techniques.
arXiv Detail & Related papers (2024-03-26T17:43:15Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting [2.3014300466616078]
This paper diverges from vision transformers by using a computationally-efficient WaveMix-based fully convolutional architecture -- WavePaint.
It uses a 2D-discrete wavelet transform (DWT) for spatial and multi-resolution token-mixing along with convolutional layers.
Our model even outperforms current GAN-based architectures in CelebA-HQ dataset without using an adversarially trainable discriminator.
arXiv Detail & Related papers (2023-07-01T18:41:34Z)
- T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown significant performance in natural language processing.
In this paper, we design a novel attention linearly related to the resolution according to Taylor expansion, and based on this attention, a network called $T$-former is designed for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
- Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INN) are able to increase upscaling accuracy significantly by optimizing the downscaling and upscaling cycle jointly.
A simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model in this work.
It is shown to achieve a state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in LR outputs.
arXiv Detail & Related papers (2022-09-26T22:22:30Z)
- ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention.
Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count.
The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
arXiv Detail & Related papers (2022-08-28T04:18:27Z)
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.