ALAP-AE: As-Lite-as-Possible Auto-Encoder
- URL: http://arxiv.org/abs/2203.10363v1
- Date: Sat, 19 Mar 2022 18:03:08 GMT
- Title: ALAP-AE: As-Lite-as-Possible Auto-Encoder
- Authors: Nisarg A. Shah and Gaurav Bharaj
- Abstract summary: We present a novel algorithm to reduce tensor compute required by a conditional image generation autoencoder.
We show performance gains for various conditional image generation tasks.
We achieve real-time versions of various autoencoders on CPU-only devices while maintaining image quality.
- Score: 6.244939945140818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel algorithm to reduce tensor compute required by a
conditional image generation autoencoder and make it as-lite-as-possible,
without sacrificing quality of photo-realistic image generation. Our method is
device agnostic, and can optimize an autoencoder for given CPU-only or GPU
compute device(s) in roughly the time it takes to train an autoencoder on a
generic workstation. We achieve this via a novel two-stage strategy: first,
we condense the channel weights so that as few channels as possible are used;
then, we prune the nearly zeroed-out weight activations and fine-tune this
lite autoencoder. To maintain image quality, fine-tuning is done
via student-teacher training, where we reuse the condensed autoencoder as the
teacher. We show performance gains for various conditional image generation
tasks: segmentation mask to face images, face images to cartoonization, and
finally a CycleGAN-based model on the horse-to-zebra dataset, over multiple compute
devices. We perform various ablation studies to justify the claims and design
choices, and achieve real-time versions of various autoencoders on CPU-only
devices while maintaining image quality, thus enabling at-scale deployment of
such autoencoders.
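The two-stage strategy above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the L1-norm channel-ranking criterion, and the weighted distillation loss are assumptions chosen to mirror the described steps (condense channels, prune near-zero weights, fine-tune against the condensed teacher).

```python
import numpy as np

def condense_channels(weights, keep_ratio=0.5):
    """Rank output channels of a conv weight tensor (out, in, kh, kw)
    by L1 norm and keep only the strongest fraction (pruning the
    nearly zeroed-out channels)."""
    norms = np.abs(weights).sum(axis=(1, 2, 3))
    n_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])
    return weights[keep], keep

def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Student-teacher fine-tuning objective: weighted sum of the
    reconstruction error and the error against the teacher's output."""
    recon = np.mean((student_out - target) ** 2)
    distill = np.mean((student_out - teacher_out) ** 2)
    return (1 - alpha) * recon + alpha * distill

# Example: keep the 4 strongest of 8 output channels.
w = np.zeros((8, 3, 3, 3))
w[[0, 2, 4, 6]] = 1.0
pruned, kept = condense_channels(w, keep_ratio=0.5)
```

In this toy example only the four channels with nonzero weight survive, shrinking the tensor from `(8, 3, 3, 3)` to `(4, 3, 3, 3)`; the pruned (student) network is then fine-tuned with `distillation_loss`, reusing the condensed autoencoder as the teacher.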
Related papers
- RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression [68.31184784672227]
In modern applications such as autonomous driving, an overwhelming majority of videos serve as input for AI systems performing tasks.
It is therefore useful to optimize the encoder for a downstream task instead of for image quality.
Here, we address this challenge by controlling the Quantization Parameters (QPs) at the macro-block level to optimize the downstream task.
arXiv Detail & Related papers (2025-01-21T15:36:08Z) - Learnings from Scaling Visual Tokenizers for Reconstruction and Generation [30.942443676393584]
Visual tokenization via auto-encoding empowers state-of-the-art image and video generative models by compressing pixels into a latent space.
Our work explores scaling in auto-encoders to fill this gap.
We train ViTok on large-scale image and video datasets far exceeding ImageNet-1K, removing data constraints on tokenizer scaling.
arXiv Detail & Related papers (2025-01-16T18:59:04Z) - Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders [89.12558126877532]
We propose CropMAE, an alternative approach to the Siamese pre-training introduced by SiamMAE.
Our method exclusively considers pairs of cropped images sourced from the same image but cropped differently, deviating from the conventional pairs of frames extracted from a video.
CropMAE achieves the highest masking ratio to date (98.5%), enabling the reconstruction of images using only two visible patches.
arXiv Detail & Related papers (2024-03-26T16:04:19Z) - An Efficient Implicit Neural Representation Image Codec Based on Mixed Autoregressive Model for Low-Complexity Decoding [43.43996899487615]
Implicit Neural Representation (INR) for image compression is an emerging technology that offers two key benefits compared to cutting-edge autoencoder models.
We introduce a new Mixed AutoRegressive Model (MARM) to significantly reduce the decoding time for the current INR.
MARM includes our proposed AutoRegressive Upsampler (ARU) blocks, which are highly efficient, and ARM from previous work to balance decoding time and reconstruction quality.
arXiv Detail & Related papers (2024-01-23T09:37:58Z) - TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale [59.01246141215051]
We analyze the factor that leads to degradation from the perspective of language supervision.
We propose a tunable-free pre-training strategy to retain the generalization ability of the text encoder.
We produce a series of models, dubbed TVTSv2, with up to one billion parameters.
arXiv Detail & Related papers (2023-05-23T15:44:56Z) - Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z) - Frozen CLIP Models are Efficient Video Learners [86.73871814176795]
Video recognition has been dominated by the end-to-end learning paradigm.
Recent advances in Contrastive Vision-Language Pre-training pave the way for a new route for visual recognition tasks.
We present Efficient Video Learning -- an efficient framework for directly training high-quality video recognition models.
arXiv Detail & Related papers (2022-08-06T17:38:25Z) - Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling these two designs enables us to train large models efficiently and effectively.
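The random-masking step described above can be sketched in a few lines of NumPy. This is an illustrative assumption, not the MAE reference code: the function name, patch size, and seeding are hypothetical, but the idea (split into non-overlapping patches, keep a small random subset, mask the rest) follows the summary.

```python
import numpy as np

def random_mask_patches(image, patch=4, mask_ratio=0.75, seed=0):
    """Split a square grayscale image into non-overlapping patches and
    keep a random subset; the remaining patches are masked, as in
    MAE-style pre-training."""
    h, w = image.shape
    # Rearrange (h, w) into (num_patches, patch*patch) rows.
    patches = image.reshape(h // patch, patch, w // patch, patch)
    patches = patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    n = patches.shape[0]
    n_keep = int(round(n * (1 - mask_ratio)))
    rng = np.random.default_rng(seed)
    keep = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)   # True = masked (to be reconstructed)
    mask[keep] = False
    return patches[keep], keep, mask

img = np.arange(256, dtype=float).reshape(16, 16)
visible, keep, mask = random_mask_patches(img, patch=4, mask_ratio=0.75)
```

With a 16x16 image and 4x4 patches, 4 of the 16 patches stay visible to the encoder and the other 12 become reconstruction targets.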
arXiv Detail & Related papers (2021-11-11T18:46:40Z) - Lookahead optimizer improves the performance of Convolutional Autoencoders for reconstruction of natural images [0.0]
Autoencoders are a class of artificial neural networks which have gained a lot of attention in the recent past.
We show that Lookahead (with Adam) improves the performance of CAEs for reconstruction of natural images.
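The Lookahead mechanism referenced above can be sketched with a minimal wrapper. This is an assumption-laden illustration, not the paper's code: the inner optimizer here is plain gradient descent standing in for Adam, and the class interface is hypothetical; the core rule (run the fast weights for `k` inner steps, then interpolate the slow weights toward them) is the published Lookahead update.

```python
import numpy as np

class Lookahead:
    """Minimal Lookahead wrapper: run an inner optimizer for k steps,
    then move the slow weights a fraction alpha toward the fast
    weights and reset the fast weights to the slow weights."""
    def __init__(self, params, inner_step, k=5, alpha=0.5):
        self.fast = params.astype(float).copy()
        self.slow = params.astype(float).copy()
        self.inner_step = inner_step  # fn(params, grad) -> new params
        self.k, self.alpha, self.t = k, alpha, 0

    def step(self, grad):
        self.fast = self.inner_step(self.fast, grad)
        self.t += 1
        if self.t % self.k == 0:
            self.slow += self.alpha * (self.fast - self.slow)
            self.fast = self.slow.copy()
        return self.fast

# Example: minimize f(x) = x^2 (gradient 2x) with SGD as inner optimizer.
opt = Lookahead(np.array([1.0]), lambda p, g: p - 0.1 * g, k=5, alpha=0.5)
for _ in range(20):
    opt.step(2 * opt.fast)
```

The wrapper is optimizer-agnostic: swapping the `lambda` for an Adam step recovers the "Lookahead (with Adam)" configuration the paper evaluates.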
arXiv Detail & Related papers (2020-12-03T03:18:28Z) - VCE: Variational Convertor-Encoder for One-Shot Generalization [3.86981854389977]
Variational Convertor-Encoder (VCE) converts an image to various styles.
We present this novel architecture for the problem of one-shot generalization.
We also improve the performance of variational auto-encoder (VAE) to filter those blurred points.
arXiv Detail & Related papers (2020-11-12T07:58:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.