SC-VAE: Sparse Coding-based Variational Autoencoder with Learned ISTA
- URL: http://arxiv.org/abs/2303.16666v2
- Date: Wed, 10 Jan 2024 05:29:57 GMT
- Title: SC-VAE: Sparse Coding-based Variational Autoencoder with Learned ISTA
- Authors: Pan Xiao, Peijie Qiu, Sungmin Ha, Abdalla Bani, Shuang Zhou,
Aristeidis Sotiras
- Abstract summary: We introduce a new VAE variant, termed sparse coding-based VAE with learned ISTA (SC-VAE), which integrates sparse coding within a variational autoencoder framework.
Experiments on two image datasets demonstrate that our model achieves improved image reconstruction results compared to state-of-the-art methods.
- Score: 0.6770292596301478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning rich data representations from unlabeled data is a key challenge
towards applying deep learning algorithms in downstream tasks. Several variants
of variational autoencoders (VAEs) have been proposed to learn compact data
representations by encoding high-dimensional data in a lower dimensional space.
Two main classes of VAE methods may be distinguished depending on the
characteristics of the meta-priors that are enforced in the representation
learning step. The first class of methods derives a continuous encoding by
assuming a static prior distribution in the latent space. The second class of
methods learns instead a discrete latent representation using vector
quantization (VQ) along with a codebook. However, both classes of methods
suffer from certain challenges, which may lead to suboptimal image
reconstruction results. The first class suffers from posterior collapse,
whereas the second class suffers from codebook collapse. To address these
challenges, we introduce a new VAE variant, termed sparse coding-based VAE with
learned ISTA (SC-VAE), which integrates sparse coding within a variational
autoencoder framework. The proposed method learns sparse data representations
that consist of a linear combination of a small number of predetermined
orthogonal atoms. The sparse coding problem is solved using a learnable version
of the iterative shrinkage-thresholding algorithm (ISTA). Experiments on two
image datasets demonstrate that our model achieves improved image
reconstruction results compared to state-of-the-art methods. Moreover, we
demonstrate that the use of learned sparse code vectors allows us to perform
downstream tasks like image generation and unsupervised image segmentation
through clustering image patches.
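To make the sparse coding step concrete, here is a minimal PyTorch sketch of a learned ISTA (LISTA) encoder of the kind described above: a fixed number of unrolled soft-thresholding iterations with learnable operators, paired with a set of orthogonal atoms for reconstruction. The module, its dimensions, and the random orthogonal dictionary are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LISTA(nn.Module):
    """Learned ISTA: unrolls a few soft-thresholding iterations,
    replacing the fixed ISTA operators with learnable linear maps."""

    def __init__(self, input_dim: int, code_dim: int, num_iters: int = 5):
        super().__init__()
        self.W = nn.Linear(input_dim, code_dim, bias=False)       # plays the role of D^T / L
        self.S = nn.Linear(code_dim, code_dim, bias=False)        # plays the role of I - D^T D / L
        self.theta = nn.Parameter(torch.full((code_dim,), 0.1))   # learnable threshold
        self.num_iters = num_iters

    @staticmethod
    def soft_threshold(x, theta):
        return torch.sign(x) * torch.clamp(x.abs() - theta, min=0.0)

    def forward(self, x):
        b = self.W(x)
        z = self.soft_threshold(b, self.theta)
        for _ in range(self.num_iters):
            z = self.soft_threshold(b + self.S(z), self.theta)
        return z  # sparse code


# Toy usage: express x as a sparse combination of orthogonal atoms.
x = torch.randn(8, 64)                              # batch of feature vectors
atoms = torch.linalg.qr(torch.randn(64, 64))[0]     # orthogonal dictionary (assumption)
encoder = LISTA(input_dim=64, code_dim=64)
z = encoder(x)                                      # sparse codes
x_hat = z @ atoms.T                                 # reconstruction x ≈ D z
```

In the SC-VAE setting, the sparse codes z would serve as the latent representation; the dictionary here is random purely for illustration.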
Related papers
- Addressing Representation Collapse in Vector Quantized Models with One Linear Layer [10.532262196027752]
Vector Quantization (VQ) is a widely used method for converting continuous representations into discrete codes.
VQ models are often hindered by the problem of representation collapse in the latent space.
We propose SimVQ, a novel method which reparameterizes the code vectors through a linear transformation layer based on a learnable latent basis.
arXiv Detail & Related papers (2024-11-04T12:40:18Z)
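As a toy sketch of the SimVQ idea summarized above, the code vectors can be reparameterized as a frozen latent basis passed through one learnable linear layer before nearest-neighbor lookup; the names, dimensions, and the frozen random basis below are assumptions, not the SimVQ reference code.

```python
import torch
import torch.nn as nn


class SimVQLikeQuantizer(nn.Module):
    """Nearest-neighbor VQ whose code vectors are (frozen basis) -> linear layer."""

    def __init__(self, num_codes: int, dim: int):
        super().__init__()
        self.register_buffer("basis", torch.randn(num_codes, dim))  # frozen latent basis
        self.linear = nn.Linear(dim, dim, bias=False)               # the one trained layer

    def forward(self, z):                        # z: (batch, dim) encoder outputs
        codebook = self.linear(self.basis)       # effective code vectors
        idx = torch.cdist(z, codebook).argmin(dim=1)
        z_q = codebook[idx]
        # straight-through estimator so gradients still reach the encoder
        return z + (z_q - z).detach(), idx


vq = SimVQLikeQuantizer(num_codes=512, dim=64)
z_q, idx = vq(torch.randn(8, 64))
```

Because every code vector is a linear function of the same trained layer, updating that layer moves the whole codebook at once, which is one intuition for why this can mitigate representation collapse.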
- SGC-VQGAN: Towards Complex Scene Representation via Semantic Guided Clustering Codebook [9.993066868670283]
We introduce SGC-VQGAN, which uses a Semantic Online Clustering method to enhance token semantics through Consistent Semantic Learning.
Our approach constructs a temporospatially consistent semantic codebook, addressing issues of codebook collapse and imbalanced token semantics.
arXiv Detail & Related papers (2024-09-09T23:12:43Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the need of Vision Transformer networks for very large annotated datasets.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework consisting of a Masked Quantization VAE (MQ-VAE), which masks redundant region features before quantization to avoid modeling redundancy, and a Stackformer model for autoregressive generation.
arXiv Detail & Related papers (2023-05-23T02:15:53Z)
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE) which encodes image regions into variable-length codes based on their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z)
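The variable-length idea above can be illustrated with a toy allocator that gives each image patch a code budget according to a crude information-density proxy; the proxy (pixel variance) and the budgets are purely hypothetical, since the summary does not specify DQ-VAE's actual measure.

```python
import torch


def allocate_code_lengths(patches, budgets=(1, 2, 4)):
    """Toy dynamic quantization: rank patches by a crude information-density
    proxy (pixel variance) and assign longer codes to denser patches."""
    density = patches.flatten(1).var(dim=1)          # (num_patches,)
    ranks = density.argsort().argsort()              # 0 = least informative
    bins = (ranks.float() / len(ranks) * len(budgets)).long()
    bins = bins.clamp(max=len(budgets) - 1)
    return torch.tensor(budgets)[bins]               # codes per patch


patches = torch.randn(64, 3, 8, 8)                   # 64 patches of one image
print(allocate_code_lengths(patches))                # e.g. tensor([2, 4, 1, ...])
```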
- Closed-Loop Transcription via Convolutional Sparse Coding [29.75613581643052]
Autoencoders often use generic deep networks as the encoder or decoder, which are difficult to interpret.
In this work, we make the explicit assumption that the image distribution is generated from a multistage convolutional sparse coding (CSC) model.
Our method enjoys several side benefits, including more structured and interpretable representations, more stable convergence, and scalability to large datasets.
arXiv Detail & Related papers (2023-02-18T14:40:07Z)
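Since the paper above assumes images are generated by multistage convolutional sparse coding, a generic single-stage CSC solver may help fix ideas. The plain-ISTA sketch below is a textbook baseline under assumed shapes, not the paper's closed-loop transcription method.

```python
import torch
import torch.nn.functional as F


def csc_ista(x, filters, lam=0.05, num_iters=30):
    """Single-stage convolutional sparse coding via plain ISTA:
    find sparse feature maps z such that x ≈ conv_transpose2d(z, filters)."""
    z = torch.zeros(x.shape[0], filters.shape[0], x.shape[2], x.shape[3])
    # crude step size from a power-iteration estimate of the Lipschitz constant
    v = torch.randn_like(z[:1])
    for _ in range(10):
        v = F.conv2d(F.conv_transpose2d(v, filters, padding=1), filters, padding=1)
        v = v / v.norm()
    step = 1.0 / F.conv2d(F.conv_transpose2d(v, filters, padding=1), filters, padding=1).norm()
    for _ in range(num_iters):
        residual = F.conv_transpose2d(z, filters, padding=1) - x
        z = z - step * F.conv2d(residual, filters, padding=1)   # gradient step
        z = torch.sign(z) * torch.clamp(z.abs() - step * lam, min=0.0)  # soft-threshold
    return z


x = torch.randn(4, 3, 32, 32)                       # toy image batch
filters = torch.randn(16, 3, 3, 3)                  # 16 convolutional atoms (assumption)
z = csc_ista(x, filters)                            # sparse feature maps
x_hat = F.conv_transpose2d(z, filters, padding=1)   # reconstruction
```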
- Vector Quantized Wasserstein Auto-Encoder [57.29764749855623]
We study learning deep discrete representations from the generative viewpoint.
We impose discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over the sequences of codewords to the data distribution.
We develop further theories to connect it with the clustering viewpoint of the Wasserstein (WS) distance, allowing for a better and more controllable clustering solution.
arXiv Detail & Related papers (2023-02-12T13:51:36Z)
- Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal.
In contrast to prior approaches, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective.
Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements of state-of-the-art models on semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z)
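For reference, the standard single-noise-level denoising score matching objective that such diffusion methods build on fits in a few lines; this is the generic objective, not the paper's new formulation, and score_net is a stand-in network.

```python
import torch
import torch.nn as nn


def dsm_loss(score_net, x0, sigma):
    """Denoising score matching at one noise level: train score_net(x_t)
    to match the score of q(x_t | x_0) = N(x_0, sigma^2 I)."""
    noise = torch.randn_like(x0)
    xt = x0 + sigma * noise
    # the target score is -(x_t - x_0) / sigma^2 = -noise / sigma
    return ((sigma * score_net(xt) + noise) ** 2).mean()  # sigma^2-weighted loss


score_net = nn.Sequential(nn.Linear(16, 64), nn.SiLU(), nn.Linear(64, 16))
loss = dsm_loss(score_net, torch.randn(8, 16), sigma=0.5)
```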
- Deterministic Decoding for Discrete Data in Variational Autoencoders [5.254093731341154]
We study a VAE model with a deterministic decoder (DD-VAE) for sequential data that selects the highest-scoring tokens instead of sampling.
We demonstrate the performance of DD-VAE on multiple datasets, including molecular generation and optimization problems.
arXiv Detail & Related papers (2020-03-04T16:36:52Z)
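The decoding change DD-VAE makes, as summarized above, amounts to replacing sampling from the decoder's output distribution with an argmax over token scores; a minimal illustration with hypothetical shapes:

```python
import torch

logits = torch.randn(1, 5, 100)  # (batch, sequence, vocabulary) decoder scores
# standard stochastic decoding: sample tokens from the output distribution
sampled = torch.distributions.Categorical(logits=logits).sample()
# DD-VAE-style deterministic decoding: always take the highest-scoring token
deterministic = logits.argmax(dim=-1)
```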
- Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph.
It is updated by decoding in the context of an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.