Lookahead optimizer improves the performance of Convolutional
Autoencoders for reconstruction of natural images
- URL: http://arxiv.org/abs/2012.05694v1
- Date: Thu, 3 Dec 2020 03:18:28 GMT
- Title: Lookahead optimizer improves the performance of Convolutional
Autoencoders for reconstruction of natural images
- Authors: Sayan Nag
- Abstract summary: Autoencoders are a class of artificial neural networks which have gained a lot of attention in the recent past.
We show that Lookahead (with Adam) improves the performance of CAEs for reconstruction of natural images.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autoencoders are a class of artificial neural networks which have gained a
lot of attention in the recent past. Using the encoder block of an autoencoder,
the input image can be compressed into a meaningful representation. A decoder
is then employed to reconstruct the compressed representation back into a
version that resembles the input image. Autoencoders have many applications in
data compression and denoising. Another variant of the Autoencoder (AE)
exists, called the Variational AE (VAE), which acts as a generative model, much like a GAN.
Recently, the Lookahead optimizer was introduced, which significantly enhances
the performance of both Adam and SGD. In this paper, we implement Convolutional
Autoencoders (CAE) and Convolutional Variational Autoencoders (CVAE) with the
Lookahead optimizer (with Adam) and compare them with their Adam-only
counterparts. For this purpose, we use a movie dataset comprising natural
images for the former case and CIFAR100 for the latter. We show that the
Lookahead optimizer (with Adam) improves the performance of CAEs for
reconstruction of natural images.
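To make the training setup concrete, the sketch below wraps PyTorch's Adam in a minimal Lookahead loop (fast weights updated by Adam; slow weights pulled toward them every k steps) and applies it to a toy convolutional autoencoder trained with an MSE reconstruction loss. This is a minimal sketch, not the paper's code: the architecture, k = 5, and alpha = 0.5 are illustrative assumptions rather than values reported in the paper.

```python
import torch
import torch.nn as nn


class Lookahead:
    """Minimal Lookahead wrapper: the inner ("fast") optimizer takes k steps,
    then the slow weights are updated as slow <- slow + alpha * (fast - slow)
    and copied back into the model."""

    def __init__(self, base_optimizer, k=5, alpha=0.5):
        self.base_optimizer = base_optimizer
        self.k, self.alpha, self._counter = k, alpha, 0
        # Snapshot of the slow weights, one copy per parameter.
        self.slow_weights = [
            [p.detach().clone() for p in group["params"]]
            for group in base_optimizer.param_groups
        ]

    def zero_grad(self):
        self.base_optimizer.zero_grad()

    @torch.no_grad()
    def step(self):
        self.base_optimizer.step()              # fast update (Adam here)
        self._counter += 1
        if self._counter % self.k == 0:         # synchronize every k steps
            for group, slow_group in zip(self.base_optimizer.param_groups,
                                         self.slow_weights):
                for fast, slow in zip(group["params"], slow_group):
                    slow.add_(fast - slow, alpha=self.alpha)
                    fast.copy_(slow)            # restart fast weights at the slow weights


class CAE(nn.Module):
    """Toy convolutional autoencoder for 3-channel images (illustrative sizes)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


model = CAE()
optimizer = Lookahead(torch.optim.Adam(model.parameters(), lr=1e-3), k=5, alpha=0.5)
criterion = nn.MSELoss()

# One reconstruction step on a random batch standing in for natural images.
batch = torch.rand(8, 3, 64, 64)
optimizer.zero_grad()
loss = criterion(model(batch), batch)
loss.backward()
optimizer.step()
```

Replacing `Lookahead(torch.optim.Adam(...))` with plain `torch.optim.Adam(...)` gives the Adam-only baseline that the paper compares against.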
Related papers
- Learnings from Scaling Visual Tokenizers for Reconstruction and Generation [30.942443676393584]
Visual tokenization via auto-encoding empowers state-of-the-art image and video generative models by compressing pixels into a latent space.
Our work conducts an exploration of scaling in auto-encoders to fill this gap.
We train ViTok on large-scale image and video datasets far exceeding ImageNet-1K, removing data constraints on tokenizer scaling.
arXiv Detail & Related papers (2025-01-16T18:59:04Z)
- $ε$-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space.
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
arXiv Detail & Related papers (2024-10-05T08:27:53Z)
- ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models [77.59651787115546]
High-resolution Large Multimodal Models (LMMs) encounter the challenges of excessive visual tokens and quadratic visual complexity.
We propose ConvLLaVA, which employs ConvNeXt, a hierarchical backbone, as the visual encoder of LMM.
ConvLLaVA compresses high-resolution images into information-rich visual features, effectively preventing the generation of excessive visual tokens.
arXiv Detail & Related papers (2024-05-24T17:34:15Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- EVC: Towards Real-Time Neural Image Compression with Mask Decay [29.76392801329279]
Neural image compression has surpassed state-of-the-art traditional codecs (H.266/VVC) for rate-distortion (RD) performance.
We propose an Efficient single-model Variable-bit-rate Codec (EVC) which is able to run at 30 FPS with 768x512 input images and still outperforms VVC for the RD performance.
arXiv Detail & Related papers (2023-02-10T06:02:29Z)
- Denoising Masked AutoEncoders are Certifiable Robust Vision Learners [37.04863068273281]
We propose a new self-supervised method called Denoising Masked AutoEncoders (DMAE).
DMAE corrupts each image by adding Gaussian noise to each pixel value and randomly masking several patches (a rough corruption sketch follows this list).
A Transformer-based encoder-decoder model is then trained to reconstruct the original image from the corrupted one.
arXiv Detail & Related papers (2022-10-10T12:37:59Z)
- ALAP-AE: As-Lite-as-Possible Auto-Encoder [6.244939945140818]
We present a novel algorithm to reduce tensor compute required by a conditional image generation autoencoder.
We show performance gains for various conditional image generation tasks.
We achieve real-time versions of various autoencoders on CPU-only devices while maintaining image quality.
arXiv Detail & Related papers (2022-03-19T18:03:08Z)
- Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z)
- Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling these two designs enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z)
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
- VCE: Variational Convertor-Encoder for One-Shot Generalization [3.86981854389977]
Variational Convertor-Encoder (VCE) converts an image to various styles.
We present this novel architecture for the problem of one-shot generalization.
We also improve the performance of variational auto-encoder (VAE) to filter those blurred points.
arXiv Detail & Related papers (2020-11-12T07:58:14Z)
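As a rough illustration of the corruption step mentioned in the DMAE entry above (not the authors' code), the snippet below adds per-pixel Gaussian noise and zeroes out randomly chosen patches; the noise scale, patch size, and masking ratio are illustrative assumptions.

```python
import torch


def corrupt(images, noise_std=0.25, patch=16, mask_ratio=0.5):
    """images: (B, C, H, W) tensor, with H and W divisible by `patch`."""
    # Per-pixel Gaussian noise.
    noisy = images + noise_std * torch.randn_like(images)
    b, _, h, w = images.shape
    # Boolean mask over the patch grid; True means the patch is masked out.
    grid = torch.rand(b, 1, h // patch, w // patch) < mask_ratio
    # Expand each grid cell to a full patch and zero out the masked pixels.
    mask = grid.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    return noisy.masked_fill(mask, 0.0)


# An encoder-decoder would then be trained to reconstruct the clean images.
corrupted = corrupt(torch.rand(4, 3, 224, 224))
```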