ALAP-AE: As-Lite-as-Possible Auto-Encoder
- URL: http://arxiv.org/abs/2203.10363v1
- Date: Sat, 19 Mar 2022 18:03:08 GMT
- Title: ALAP-AE: As-Lite-as-Possible Auto-Encoder
- Authors: Nisarg A. Shah and Gaurav Bharaj
- Abstract summary: We present a novel algorithm to reduce tensor compute required by a conditional image generation autoencoder.
We show performance gains for various conditional image generation tasks.
We achieve real-time versions of various autoencoders on CPU-only devices while maintaining image quality.
- Score: 6.244939945140818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel algorithm to reduce tensor compute required by a
conditional image generation autoencoder and make it as-lite-as-possible,
without sacrificing quality of photo-realistic image generation. Our method is
device agnostic, and can optimize an autoencoder for given CPU-only or GPU
compute device(s) in roughly the time it takes to train an autoencoder on a
generic workstation. We achieve this via a novel two-stage strategy: first,
we condense the channel weights so that as few channels as possible are used;
then, we prune the nearly zeroed-out weight activations and fine-tune this
lite autoencoder. To maintain image quality, fine-tuning is done
via student-teacher training, where we reuse the condensed autoencoder as the
teacher. We show performance gains for various conditional image generation
tasks: segmentation mask to face images, face images to cartoonization, and
finally a CycleGAN-based model on the horse-to-zebra dataset, over multiple compute
devices. We perform various ablation studies to justify the claims and design
choices, and achieve real-time versions of various autoencoders on CPU-only
devices while maintaining image quality, thus enabling at-scale deployment of
such autoencoders.
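The two-stage strategy above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the L1-norm channel-ranking criterion, and the weighted distillation loss are assumptions chosen to mirror the described steps (condense channels, prune near-zero weights, fine-tune against the condensed teacher).

```python
import numpy as np

def condense_channels(weights, keep_ratio=0.5):
    """Rank output channels of a conv weight tensor (out, in, kh, kw)
    by L1 norm and keep only the strongest fraction (pruning the
    nearly zeroed-out channels)."""
    norms = np.abs(weights).sum(axis=(1, 2, 3))
    n_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])
    return weights[keep], keep

def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Student-teacher fine-tuning objective: weighted sum of the
    reconstruction error and the error against the teacher's output."""
    recon = np.mean((student_out - target) ** 2)
    distill = np.mean((student_out - teacher_out) ** 2)
    return (1 - alpha) * recon + alpha * distill

# Example: keep the 4 strongest of 8 output channels.
w = np.zeros((8, 3, 3, 3))
w[[0, 2, 4, 6]] = 1.0
pruned, kept = condense_channels(w, keep_ratio=0.5)
```

In this toy example only the four channels with nonzero weight survive, shrinking the tensor from `(8, 3, 3, 3)` to `(4, 3, 3, 3)`; the pruned (student) network is then fine-tuned with `distillation_loss`, reusing the condensed autoencoder as the teacher.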
Related papers
- RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression [68.31184784672227]
In modern applications such as autonomous driving, an overwhelming majority of videos serve as input for AI systems performing tasks.
It is therefore useful to optimize the encoder for a downstream task instead of for image quality.
Here, we address this challenge by controlling the Quantization Parameters (QPs) at the macro-block level to optimize the downstream task.
arXiv Detail & Related papers (2025-01-21T15:36:08Z) - Learnings from Scaling Visual Tokenizers for Reconstruction and Generation [30.942443676393584]
Visual tokenization via auto-encoding empowers state-of-the-art image and video generative models by compressing pixels into a latent space.
Our work explores scaling in auto-encoders to fill this gap.
We train ViTok on large-scale image and video datasets far exceeding ImageNet-1K, removing data constraints on tokenizer scaling.
arXiv Detail & Related papers (2025-01-16T18:59:04Z) - Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders [89.12558126877532]
We propose CropMAE, an alternative approach to the Siamese pre-training introduced by SiamMAE.
Our method exclusively considers pairs of cropped images sourced from the same image but cropped differently, deviating from the conventional pairs of frames extracted from a video.
CropMAE achieves the highest masking ratio to date (98.5%), enabling the reconstruction of images using only two visible patches.
arXiv Detail & Related papers (2024-03-26T16:04:19Z) - An Efficient Implicit Neural Representation Image Codec Based on Mixed Autoregressive Model for Low-Complexity Decoding [43.43996899487615]
Implicit Neural Representation (INR) for image compression is an emerging technology that offers two key benefits compared to cutting-edge autoencoder models.
We introduce a new Mixed AutoRegressive Model (MARM) to significantly reduce the decoding time for the current INR.
MARM includes our proposed AutoRegressive Upsampler (ARU) blocks, which are highly efficient, and ARM from previous work to balance decoding time and reconstruction quality.
arXiv Detail & Related papers (2024-01-23T09:37:58Z) - TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale [59.01246141215051]
We analyze the factor that leads to degradation from the perspective of language supervision.
We propose a tunable-free pre-training strategy to retain the generalization ability of the text encoder.
We produce a series of models, dubbed TVTSv2, with up to one billion parameters.
arXiv Detail & Related papers (2023-05-23T15:44:56Z) - Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z) - Frozen CLIP Models are Efficient Video Learners [86.73871814176795]
Video recognition has been dominated by the end-to-end learning paradigm.
Recent advances in Contrastive Vision-Language Pre-training pave the way for a new route for visual recognition tasks.
We present Efficient Video Learning -- an efficient framework for directly training high-quality video recognition models.
arXiv Detail & Related papers (2022-08-06T17:38:25Z) - Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling these two designs enables us to train large models efficiently and effectively.
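The random-masking step described above can be sketched in a few lines of NumPy. This is an illustrative assumption, not the MAE reference code: the function name, patch size, and seeding are hypothetical, but the idea (split into non-overlapping patches, keep a small random subset, mask the rest) follows the summary.

```python
import numpy as np

def random_mask_patches(image, patch=4, mask_ratio=0.75, seed=0):
    """Split a square grayscale image into non-overlapping patches and
    keep a random subset; the remaining patches are masked, as in
    MAE-style pre-training."""
    h, w = image.shape
    # Rearrange (h, w) into (num_patches, patch*patch) rows.
    patches = image.reshape(h // patch, patch, w // patch, patch)
    patches = patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    n = patches.shape[0]
    n_keep = int(round(n * (1 - mask_ratio)))
    rng = np.random.default_rng(seed)
    keep = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)   # True = masked (to be reconstructed)
    mask[keep] = False
    return patches[keep], keep, mask

img = np.arange(256, dtype=float).reshape(16, 16)
visible, keep, mask = random_mask_patches(img, patch=4, mask_ratio=0.75)
```

With a 16x16 image and 4x4 patches, 4 of the 16 patches stay visible to the encoder and the other 12 become reconstruction targets.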
arXiv Detail & Related papers (2021-11-11T18:46:40Z) - Lookahead optimizer improves the performance of Convolutional Autoencoders for reconstruction of natural images [0.0]
Autoencoders are a class of artificial neural networks which have gained a lot of attention in the recent past.
We show that Lookahead (with Adam) improves the performance of CAEs for reconstruction of natural images.
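The Lookahead mechanism referenced above can be sketched with a minimal wrapper. This is an assumption-laden illustration, not the paper's code: the inner optimizer here is plain gradient descent standing in for Adam, and the class interface is hypothetical; the core rule (run the fast weights for `k` inner steps, then interpolate the slow weights toward them) is the published Lookahead update.

```python
import numpy as np

class Lookahead:
    """Minimal Lookahead wrapper: run an inner optimizer for k steps,
    then move the slow weights a fraction alpha toward the fast
    weights and reset the fast weights to the slow weights."""
    def __init__(self, params, inner_step, k=5, alpha=0.5):
        self.fast = params.astype(float).copy()
        self.slow = params.astype(float).copy()
        self.inner_step = inner_step  # fn(params, grad) -> new params
        self.k, self.alpha, self.t = k, alpha, 0

    def step(self, grad):
        self.fast = self.inner_step(self.fast, grad)
        self.t += 1
        if self.t % self.k == 0:
            self.slow += self.alpha * (self.fast - self.slow)
            self.fast = self.slow.copy()
        return self.fast

# Example: minimize f(x) = x^2 (gradient 2x) with SGD as inner optimizer.
opt = Lookahead(np.array([1.0]), lambda p, g: p - 0.1 * g, k=5, alpha=0.5)
for _ in range(20):
    opt.step(2 * opt.fast)
```

The wrapper is optimizer-agnostic: swapping the `lambda` for an Adam step recovers the "Lookahead (with Adam)" configuration the paper evaluates.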
arXiv Detail & Related papers (2020-12-03T03:18:28Z) - VCE: Variational Convertor-Encoder for One-Shot Generalization [3.86981854389977]
Variational Convertor-Encoder (VCE) converts an image to various styles.
We present this novel architecture for the problem of one-shot generalization.
We also improve the performance of variational auto-encoder (VAE) to filter those blurred points.
arXiv Detail & Related papers (2020-11-12T07:58:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.