Embedding Compression with Isotropic Iterative Quantization
- URL: http://arxiv.org/abs/2001.05314v2
- Date: Thu, 23 Jan 2020 01:01:56 GMT
- Title: Embedding Compression with Isotropic Iterative Quantization
- Authors: Siyu Liao, Jie Chen, Yanzhi Wang, Qinru Qiu, Bo Yuan
- Abstract summary: Continuous representation of words is a standard component in deep learning-based NLP models.
We propose an isotropic iterative quantization (IIQ) approach for compressing embedding vectors into binary ones.
- Score: 40.567720430910725
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continuous representation of words is a standard component in deep
learning-based NLP models. However, representing a large vocabulary requires
significant memory, which can cause problems, particularly on
resource-constrained platforms. Therefore, in this paper we propose an
isotropic iterative quantization (IIQ) approach for compressing embedding
vectors into binary ones, leveraging the iterative quantization technique well
established for image retrieval, while satisfying the desired isotropic
property of PMI-based models. Experiments with pre-trained embeddings (i.e.,
GloVe and HDC) demonstrate a more than thirty-fold compression ratio with
comparable and sometimes even improved performance over the original
real-valued embedding vectors.
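As a rough illustration of the building block the abstract refers to, the following is a minimal sketch of plain iterative quantization (ITQ) as used in image retrieval: project with PCA, then alternate between taking sign codes and solving for an orthogonal rotation. It is not the authors' IIQ method, and it does not include the isotropy step. The function names `itq_binarize` and `compression_ratio`, the parameter defaults, and the choice of one bit per original dimension are assumptions made for this example.

```python
import numpy as np

def itq_binarize(X, n_bits, n_iter=50, seed=0):
    """Plain ITQ sketch (not IIQ): returns bit-packed codes and the rotation R."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                       # center the embeddings
    # PCA via SVD: rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Xc @ Vt[:n_bits].T                        # projected data, shape (n, n_bits)
    # start from a random orthogonal rotation
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    for _ in range(n_iter):
        # fix R, update codes: B = sign(V R)
        B = np.where(V @ R >= 0, 1.0, -1.0)
        # fix B, update R: orthogonal Procrustes, minimizing ||B - V R||_F
        U, _, Wt = np.linalg.svd(V.T @ B)
        R = U @ Wt
    bits = (V @ R >= 0)                           # final boolean codes
    return np.packbits(bits, axis=1), R           # 8 bits per stored byte

def compression_ratio(X, packed):
    """Ratio of raw embedding bytes to packed binary code bytes."""
    return X.nbytes / packed.nbytes
```

Under the assumption that the original vectors are stored as 32-bit floats and each is replaced by one bit per dimension, the ratio from `compression_ratio` is 32, which is in line with the "more than thirty-fold" figure quoted in the abstract.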
Related papers
- Quantization of Large Language Models with an Overdetermined Basis [73.79368761182998]
We introduce an algorithm for data quantization based on the principles of Kashin representation.
Our findings demonstrate that Kashin Quantization achieves competitive or superior quality in model performance.
arXiv Detail & Related papers (2024-04-15T12:38:46Z) - Quantization Aware Factorization for Deep Neural Network Compression [20.04951101799232]
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOP in neural networks.
A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy.
This motivated us to develop an algorithm that finds decomposed approximation directly with quantized factors.
arXiv Detail & Related papers (2023-08-08T21:38:02Z) - Low-Rank Prune-And-Factorize for Language Model Compression [18.088550230146247]
Matrix factorization fails to retain satisfactory performance under moderate to high compression rates.
We propose two techniques: sparsity-aware SVD and mixed-rank fine-tuning.
arXiv Detail & Related papers (2023-06-25T07:38:43Z) - Regularized Vector Quantization for Tokenized Image Synthesis [126.96880843754066]
Quantizing images into discrete representations has been a fundamental problem in unified generative modeling.
Deterministic quantization suffers from severe codebook collapse and misalignment with the inference stage, while stochastic quantization suffers from low codebook utilization and a perturbed reconstruction objective.
This paper presents a regularized vector quantization framework that mitigates the above issues effectively by applying regularization from two perspectives.
arXiv Detail & Related papers (2023-03-11T15:20:54Z) - Modality-Agnostic Variational Compression of Implicit Neural Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z) - Quantized Sparse Weight Decomposition for Neural Network Compression [12.24566619983231]
We show that this approach can be seen as a unification of weight SVD, vector quantization, and sparse PCA.
Our method is applicable to both moderate compression regimes, unlike vector quantization, and extreme compression regimes.
arXiv Detail & Related papers (2022-07-22T12:40:03Z) - Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find inter-correlations and intra-correlations exist when observing latent variables in a vectorized perspective.
Our model has better rate-distortion performance and an impressive $3.18\times$ compression speed up.
arXiv Detail & Related papers (2022-03-21T11:44:17Z) - Compression of Generative Pre-trained Language Models via Quantization [62.80110048377957]
We find that previous quantization methods fail on generative tasks due to the homogeneous word embeddings.
We propose a token-level contrastive distillation to learn distinguishable word embeddings, and a module-wise dynamic scaling to make quantizers adaptive to different modules.
arXiv Detail & Related papers (2022-03-21T02:11:35Z)