Related papers: Optimizing Hierarchical Image VAEs for Sample Quality

Optimizing Hierarchical Image VAEs for Sample Quality

URL: http://arxiv.org/abs/2210.10205v1
Date: Tue, 18 Oct 2022 23:10:58 GMT
Title: Optimizing Hierarchical Image VAEs for Sample Quality
Authors: Eric Luhman, Troy Luhman
Abstract summary: hierarchical variational autoencoders (VAEs) have achieved great density estimation on image modeling tasks. We attribute this to learned representations that over-emphasize compressing imperceptible details of the image. We introduce a KL-reweighting strategy to control the amount of infor mation in each latent group, and employ a Gaussian output layer to reduce sharpness in the learning objective.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While hierarchical variational autoencoders (VAEs) have achieved great density estimation on image modeling tasks, samples from their prior tend to look less convincing than models with similar log-likelihood. We attribute this to learned representations that over-emphasize compressing imperceptible details of the image. To address this, we introduce a KL-reweighting strategy to control the amount of infor mation in each latent group, and employ a Gaussian output layer to reduce sharpness in the learning objective. To trade off image diversity for fidelity, we additionally introduce a classifier-free guidance strategy for hierarchical VAEs. We demonstrate the effectiveness of these techniques in our experiments. Code is available at https://github.com/tcl9876/visual-vae.

Related papers

A Contrastive Learning Foundation Model Based on Perfectly Aligned Sample Pairs for Remote Sensing Images [18.191222010916405]
We present a novel self-supervised method called PerA, which produces all-purpose Remote Sensing features through semantically Perfectly Aligned sample pairs.<n>Our framework provides high-quality features by ensuring consistency between teacher and student.<n>We collect an unlabeled pre-training dataset, which contains about 5 million RS images.
arXiv Detail & Related papers (2025-05-26T03:12:49Z)
Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression [90.59962443790593]
In this paper, we present a variable-rate image compression model based on invertible transform to overcome limitations. Specifically, we design a lightweight multi-scale invertible neural network, which maps the input image into multi-scale latent representations. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods.
arXiv Detail & Related papers (2025-03-27T09:08:39Z)
Image Embedding Sampling Method for Diverse Captioning [0.0]
This paper introduces a training-free framework that enhances caption diversity and informativeness by explicitly attending to distinct image regions. We demonstrate that our method allows smaller VLMs to achieve performance comparable to larger models in terms of image-caption alignment, semantic integrity, and diversity.
arXiv Detail & Related papers (2025-02-14T12:33:19Z)
DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling [6.7206291284535125]
We present an effective data augmentation framework leveraging the Large Language Model (LLM) and Diffusion Model (DM) Our approach addresses the issue of increasing the diversity of synthetic images. Our method produces synthetic images with enhanced diversity while maintaining adherence to the target distribution.
arXiv Detail & Related papers (2024-09-25T14:02:43Z)
Transformer-based Clipped Contrastive Quantization Learning for Unsupervised Image Retrieval [15.982022297570108]
Unsupervised image retrieval aims to learn the important visual characteristics without any given level to retrieve the similar images for a given query image. In this paper, we propose a TransClippedCLR model by encoding the global context of an image using Transformer having local context through patch based processing. Results using the proposed clipped contrastive learning are greatly improved on all datasets as compared to same backbone network with vanilla contrastive learning.
arXiv Detail & Related papers (2024-01-27T09:39:11Z)
ARNIQA: Learning Distortion Manifold for Image Quality Assessment [28.773037051085318]
No-Reference Image Quality Assessment (NR-IQA) aims to develop methods to measure image quality in alignment with human perception without the need for a high-quality reference image. We propose a self-supervised approach named ARNIQA for modeling the image distortion manifold to obtain quality representations in an intrinsic manner.
arXiv Detail & Related papers (2023-10-20T17:22:25Z)
MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks. We propose a single-stage and standalone method, MOCA, which unifies both desired properties. We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data. We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process. In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
Masked Images Are Counterfactual Samples for Robust Fine-tuning [77.82348472169335]
Fine-tuning deep learning models can lead to a trade-off between in-distribution (ID) performance and out-of-distribution (OOD) robustness. We propose a novel fine-tuning method, which uses masked images as counterfactual samples that help improve the robustness of the fine-tuning model.
arXiv Detail & Related papers (2023-03-06T11:51:28Z)
Vector Quantized Wasserstein Auto-Encoder [57.29764749855623]
We study learning deep discrete representations from the generative viewpoint. We endow discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over the sequences of codewords to the data distribution. We develop further theories to connect it with the clustering viewpoint of WS distance, allowing us to have a better and more controllable clustering solution.
arXiv Detail & Related papers (2023-02-12T13:51:36Z)
Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation [19.92324010429006]
We propose a multi-layer variational autoencoder method, we call HR-VQVAE, that learns hierarchical discrete representations of the data. We evaluate our method on the tasks of image reconstruction and generation.
arXiv Detail & Related papers (2022-08-09T06:04:25Z)
Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the emphde facto Generative Adversarial Nets (GANs)
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
DWDN: Deep Wiener Deconvolution Network for Non-Blind Image Deblurring [66.91879314310842]
We propose an explicit deconvolution process in a feature space by integrating a classical Wiener deconvolution framework with learned deep features. A multi-scale cascaded feature refinement module then predicts the deblurred image from the deconvolved deep features. We show that the proposed deep Wiener deconvolution network facilitates deblurred results with visibly fewer artifacts and quantitatively outperforms state-of-the-art non-blind image deblurring methods by a wide margin.
arXiv Detail & Related papers (2021-03-18T00:38:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.