Restructuring Vector Quantization with the Rotation Trick
- URL: http://arxiv.org/abs/2410.06424v1
- Date: Tue, 8 Oct 2024 23:39:34 GMT
- Title: Restructuring Vector Quantization with the Rotation Trick
- Authors: Christopher Fifty, Ronald G. Junkins, Dennis Duan, Aniketh Iger, Jerry W. Liu, Ehsan Amid, Sebastian Thrun, Christopher Ré
- Abstract summary: Vector Quantized Variational AutoEncoders (VQ-VAEs) are designed to compress a continuous input to a discrete latent space and reconstruct it with minimal distortion.
As vector quantization is non-differentiable, the gradient to the encoder flows around the vector quantization layer rather than through it in a straight-through approximation.
We propose a way to propagate gradients through the vector quantization layer of VQ-VAEs.
- Score: 36.03697966463205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vector Quantized Variational AutoEncoders (VQ-VAEs) are designed to compress a continuous input to a discrete latent space and reconstruct it with minimal distortion. They operate by maintaining a set of vectors -- often referred to as the codebook -- and quantizing each encoder output to the nearest vector in the codebook. However, as vector quantization is non-differentiable, the gradient to the encoder flows around the vector quantization layer rather than through it in a straight-through approximation. This approximation may be undesirable as all information from the vector quantization operation is lost. In this work, we propose a way to propagate gradients through the vector quantization layer of VQ-VAEs. We smoothly transform each encoder output into its corresponding codebook vector via a rotation and rescaling linear transformation that is treated as a constant during backpropagation. As a result, the relative magnitude and angle between encoder output and codebook vector become encoded into the gradient as it propagates through the vector quantization layer and back to the encoder. Across 11 different VQ-VAE training paradigms, we find this restructuring improves reconstruction metrics, codebook utilization, and quantization error. Our code is available at https://github.com/cfifty/rotation_trick.
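The following is a minimal PyTorch-style sketch of the rotation trick as the abstract describes it: each encoder output is mapped to its nearest codebook vector by a rotation and rescaling whose coefficients are detached, so the transform acts as a constant during backpropagation. The function name, epsilon handling, and tensor shapes are illustrative assumptions; the authors' repository linked above contains the reference implementation.

```python
import torch

def rotation_trick(e, q, eps=1e-6):
    """Map encoder outputs e onto their nearest codebook vectors q via a
    rotation + rescaling that is treated as a constant during backprop.

    e, q: (batch, dim) tensors, where q[i] is the codebook vector nearest to e[i].
    The forward value equals q (up to eps); the gradient w.r.t. e flows through
    the detached linear map, so relative angle and magnitude reach the encoder.
    """
    e_norm = e.norm(dim=-1, keepdim=True) + eps
    q_norm = q.norm(dim=-1, keepdim=True) + eps
    e_hat, q_hat = e / e_norm, q / q_norm
    # Rotation taking e_hat to q_hat, built from Householder reflections:
    #   R = I - 2 r r^T + 2 q_hat e_hat^T, with r = (e_hat + q_hat) / ||e_hat + q_hat||
    r = e_hat + q_hat
    r = r / (r.norm(dim=-1, keepdim=True) + eps)
    # Detach everything that defines the transform so it acts as a constant.
    r, e_hat_c, q_hat_c = r.detach(), e_hat.detach(), q_hat.detach()
    scale = (q_norm / e_norm).detach()
    rotated = e - 2 * r * (r * e).sum(-1, keepdim=True) \
                + 2 * q_hat_c * (e_hat_c * e).sum(-1, keepdim=True)
    return scale * rotated
```

In a VQ-VAE forward pass this would take the place of the usual straight-through substitution `q + (e - q).detach()`: the decoder still receives the quantized vector, but the encoder gradient is rotated and rescaled rather than copied through unchanged.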
Related papers
- Graph Transformers Dream of Electric Flow [72.06286909236827]
We show that the linear Transformer, when applied to graph data, can implement algorithms that solve canonical problems.
We present explicit weight configurations for implementing each such graph algorithm, and we bound the errors of the constructed Transformers by the errors of the underlying algorithms.
arXiv Detail & Related papers (2024-10-22T05:11:45Z)
- VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers [7.369445527610879]
Diffusion Transformer models (DiTs) have transitioned the network architecture from traditional UNets to transformers, demonstrating exceptional capabilities in image generation.
Vector quantization (VQ) can decompose model weight into a codebook and assignments, allowing extreme weight quantization and significantly reducing memory usage.
We propose VQ4DiT, a fast post-training vector quantization method for DiTs. Experiments show that VQ4DiT establishes a new state-of-the-art in model size and performance trade-offs, quantizing weights to 2-bit precision while retaining acceptable image generation quality.
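As a rough illustration of the codebook-and-assignments decomposition mentioned above (this is a generic weight-VQ sketch, not VQ4DiT's calibration procedure; the sub-vector dimension, nearest-neighbour assignment, and function names are assumptions):

```python
import torch

def vq_decompose(weight: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Split a weight matrix into d-dimensional sub-vectors and assign each one
    to its nearest codebook entry. Storing the small (K, d) codebook plus the
    log2(K)-bit assignments replaces the full-precision weights."""
    K, d = codebook.shape
    sub_vectors = weight.reshape(-1, d)         # (num_sub_vectors, d)
    dists = torch.cdist(sub_vectors, codebook)  # (num_sub_vectors, K) L2 distances
    return dists.argmin(dim=-1)                 # integer assignment per sub-vector

def vq_reconstruct(codebook: torch.Tensor, assignments: torch.Tensor, shape) -> torch.Tensor:
    """Look up each assignment in the codebook and restore the original weight layout."""
    return codebook[assignments].reshape(shape)
```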
arXiv Detail & Related papers (2024-08-30T09:15:54Z)
- Segmentation-guided Layer-wise Image Vectorization with Gradient Fills [6.037332707968933]
We propose a segmentation-guided vectorization framework to convert images into concise vector graphics with gradient fills.
With the guidance of an embedded gradient-aware segmentation, our approach progressively appends gradient-filled Bézier paths to the output.
arXiv Detail & Related papers (2024-08-28T12:08:25Z)
- Vector Quantization for Deep-Learning-Based CSI Feedback in Massive MIMO Systems [7.934232975873179]
This paper presents a finite-rate deep-learning (DL)-based channel state information (CSI) feedback method for massive multiple-input multiple-output (MIMO) systems.
The presented method provides a finite-bit representation of the latent vector based on a vector-quantized variational autoencoder (VQ-VAE) framework.
arXiv Detail & Related papers (2024-03-12T06:28:41Z)
- Approaching Rate-Distortion Limits in Neural Compression with Lattice Transform Coding [33.377272636443344]
Neural compression design involves transforming the source to a latent vector, which is then rounded to integers and entropy coded.
We show that this design is highly sub-optimal on i.i.d. sequences, and in fact always recovers scalar quantization of the original source sequence.
By employing lattice quantization instead of scalar quantization in the latent space, we demonstrate that Lattice Transform Coding (LTC) is able to recover optimal vector quantization at various dimensions.
arXiv Detail & Related papers (2024-03-12T05:09:25Z)
- Soft Convex Quantization: Revisiting Vector Quantization with Convex Optimization [40.1651740183975]
We propose Soft Convex Quantization (SCQ) as a direct substitute for Vector Quantization (VQ).
SCQ works like a differentiable convex optimization (DCO) layer.
We demonstrate its efficacy on the CIFAR-10, GTSRB and LSUN datasets.
arXiv Detail & Related papers (2023-10-04T17:45:14Z)
- Online Clustered Codebook [100.1650001618827]
We present a simple alternative method for online codebook learning, Clustering VQ-VAE (CVQ-VAE).
Our approach selects encoded features as anchors to update the "dead" codevectors, while optimising the codebooks which are alive via the original loss.
Our CVQ-VAE can be easily integrated into the existing models with just a few lines of code.
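A rough sketch of the dead-codevector update described above (the usage tracking, threshold, and random anchor sampling are simplifying assumptions for illustration; CVQ-VAE's actual rule is clustering-based and is detailed in the paper and its code):

```python
import torch

@torch.no_grad()
def reset_dead_codes(codebook, usage_ema, encoder_feats, threshold=1e-3):
    """Replace rarely used ("dead") codevectors with encoded features.

    codebook:      (K, D) codebook vectors
    usage_ema:     (K,) running average of how often each code is selected
    encoder_feats: (N, D) encoder outputs from the current batch, used as anchors
    """
    dead = usage_ema < threshold            # codes that are (almost) never selected
    num_dead = int(dead.sum())
    if num_dead > 0:
        # Anchor dead codes on batch features; live codes keep being optimised
        # by the original VQ loss.
        idx = torch.randint(0, encoder_feats.shape[0], (num_dead,))
        codebook[dead] = encoder_feats[idx]
    return codebook
```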
arXiv Detail & Related papers (2023-07-27T18:31:04Z)
- Robust Differentiable SVD [117.35644933471401]
Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms.
Instability arises in the presence of eigenvalues that are close to each other.
We show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using Power Iteration (PI), without relying on an iterative process.
arXiv Detail & Related papers (2021-04-08T15:04:15Z)
- Learned transform compression with optimized entropy encoding [72.20409648915398]
We consider the problem of learned transform compression, where we learn both the transform and the probability distribution over the discrete codes.
We employ a soft relaxation of the quantization operation to allow for back-propagation of gradients, and use vector (rather than scalar) quantization of the latent codes.
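A generic example of such a soft relaxation (a sketch under assumed choices, not necessarily the exact scheme of this paper: the softmax over negative distances and the temperature parameter are illustrative) replaces the hard nearest-code assignment with a temperature-controlled weighted average so gradients reach both the encoder and the codebook:

```python
import torch
import torch.nn.functional as F

def soft_vector_quantize(z, codebook, temperature=1.0):
    """Softly quantize latent vectors against a codebook.

    z:        (batch, dim) latent vectors
    codebook: (K, dim) code vectors
    Returns a convex combination of code vectors; as temperature -> 0 this
    approaches hard nearest-neighbour vector quantization.
    """
    dists = torch.cdist(z, codebook)                   # (batch, K) L2 distances
    weights = F.softmax(-dists / temperature, dim=-1)  # soft assignments
    return weights @ codebook                          # differentiable quantized output
```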
arXiv Detail & Related papers (2021-04-07T17:58:01Z)
- Deriving Differential Target Propagation from Iterating Approximate Inverses [91.3755431537592]
We show that a particular form of target propagation, which relies on learned, differential inverses of each layer, gives rise to an update rule corresponding to an approximate Gauss-Newton gradient-based optimization.
We consider several iterative calculations based on local auto-encoders at each layer in order to achieve more precise inversions for more accurate target propagation.
arXiv Detail & Related papers (2020-07-29T22:34:45Z)