Embedding Compression with Isotropic Iterative Quantization
- URL: http://arxiv.org/abs/2001.05314v2
- Date: Thu, 23 Jan 2020 01:01:56 GMT
- Title: Embedding Compression with Isotropic Iterative Quantization
- Authors: Siyu Liao, Jie Chen, Yanzhi Wang, Qinru Qiu, Bo Yuan
- Abstract summary: Continuous representation of words is a standard component in deep learning-based NLP models.
We propose an isotropic iterative quantization (IIQ) approach for compressing embedding vectors into binary ones.
- Score: 40.567720430910725
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continuous representation of words is a standard component in deep
learning-based NLP models. However, representing a large vocabulary requires
significant memory, which can cause problems, particularly on
resource-constrained platforms. Therefore, in this paper we propose an
isotropic iterative quantization (IIQ) approach for compressing embedding
vectors into binary ones, leveraging the iterative quantization technique well
established for image retrieval, while satisfying the desired isotropic
property of PMI based models. Experiments with pre-trained embeddings (i.e.,
GloVe and HDC) demonstrate a more than thirty-fold compression ratio with
comparable and sometimes even improved performance over the original
real-valued embedding vectors.
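As a rough picture of the iterative quantization step that IIQ builds on, the sketch below learns an orthogonal rotation and takes signs, in the spirit of the classical ITQ procedure for image retrieval. It is a minimal illustration under that assumption, not the authors' full IIQ method; in particular, the isotropy treatment of PMI-based embeddings described in the abstract is omitted, and the function name and defaults are illustrative.

```python
import numpy as np

def itq_binarize(X, n_iter=50, seed=0):
    """ITQ-style sketch: learn an orthogonal rotation R so that
    sign(X_centered @ R) closely matches the rotated embeddings."""
    V = X - X.mean(axis=0, keepdims=True)             # center the embedding matrix (rows = words)
    d = V.shape[1]
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal initialization
    for _ in range(n_iter):
        B = np.sign(V @ R)                            # fix R, update the binary codes
        U, _, Wt = np.linalg.svd(V.T @ B)             # fix B, update R (orthogonal Procrustes)
        R = U @ Wt
    return np.sign(V @ R).astype(np.int8), R

# Example: binarize 10,000 pre-trained 300-dimensional embeddings.
emb = np.random.randn(10000, 300).astype(np.float32)
codes, rotation = itq_binarize(emb, n_iter=20)
```

Packing the resulting signs into bits (e.g., with np.packbits) turns a 300-dimensional float32 vector (1,200 bytes) into roughly 38 bytes, which is where compression ratios above thirty-fold come from.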
Related papers
- Pushing the Limits of Large Language Model Quantization via the Linearity Theorem [71.3332971315821]
We present a "line theoremarity" establishing a direct relationship between the layer-wise $ell$ reconstruction error and the model perplexity increase due to quantization.
This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels.
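For intuition only, a data-free pipeline in this spirit can be sketched as a random Hadamard rotation of a weight block followed by scalar quantization on an MSE-minimizing grid. The Lloyd-Max fit below uses the empirical values rather than the analytic grids of HIGGS, and the function names are illustrative.

```python
import numpy as np
from scipy.linalg import hadamard

def hadamard_rotate(w, seed=0):
    """Randomized Hadamard rotation of a weight block whose length is a power of two."""
    n = w.shape[0]
    assert n & (n - 1) == 0, "block length must be a power of two"
    signs = np.random.default_rng(seed).choice([-1.0, 1.0], size=n)
    H = hadamard(n) / np.sqrt(n)           # orthonormal Hadamard matrix
    return H @ (signs * w), signs

def lloyd_max_grid(x, n_levels=16, n_iter=50):
    """1D Lloyd-Max iterations: an MSE-minimizing scalar grid for the samples in x."""
    levels = np.quantile(x, np.linspace(0.0, 1.0, n_levels))
    for _ in range(n_iter):
        edges = (levels[:-1] + levels[1:]) / 2.0       # decision boundaries between levels
        cells = np.digitize(x, edges)
        for k in range(n_levels):
            if np.any(cells == k):
                levels[k] = x[cells == k].mean()       # centroid update
    return levels

w = np.random.randn(1024)
rotated, signs = hadamard_rotate(w)
grid = lloyd_max_grid(rotated, n_levels=16)
quantized = grid[np.argmin(np.abs(rotated[:, None] - grid[None, :]), axis=1)]
```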
arXiv Detail & Related papers (2024-11-26T15:35:44Z) - Learning Optimal Lattice Vector Quantizers for End-to-end Neural Image Compression [16.892815659154053]
Lattice vector quantization (LVQ) presents a compelling alternative, which can exploit inter-feature dependencies more effectively.
Traditional LVQ structures are designed/optimized for uniform source distributions.
We propose a novel learning method to overcome this weakness by designing the rate-distortion optimal lattice vector quantization codebooks.
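To make "lattice vector quantization" concrete, the textbook closest-point routine for the D_n lattice (integer vectors with an even coordinate sum) is sketched below; the learned, rate-distortion-optimized codebooks proposed in the paper are not reproduced here.

```python
import numpy as np

def quantize_dn(x, scale=1.0):
    """Closest point of the (scaled) D_n lattice: integer vectors with an even coordinate sum."""
    y = np.asarray(x, dtype=float) / scale
    f = np.rint(y)                      # round every coordinate
    if int(f.sum()) % 2 != 0:           # parity constraint violated
        k = np.argmax(np.abs(y - f))    # coordinate with the largest rounding error
        step = np.sign(y[k] - f[k])     # re-round that coordinate the other way
        f[k] += step if step != 0 else 1.0
    return f * scale

v = np.array([0.4, 1.3, -0.2, 0.4])
print(quantize_dn(v, scale=0.5))        # closest D_4 point at scale 0.5
```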
arXiv Detail & Related papers (2024-11-25T06:05:08Z) - Convolutional Neural Network Compression Based on Low-Rank Decomposition [3.3295360710329738]
This paper proposes a model compression method that integrates Variational Bayesian Matrix Factorization.
VBMF is employed to estimate the rank of the weight tensor at each layer.
Experimental results show that for both high and low compression ratios, our compression model exhibits advanced performance.
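A generic low-rank factorization of a single fully-connected weight matrix illustrates the compression being described; the rank below is a hand-picked placeholder for the value VBMF would estimate, and the code is a sketch rather than the paper's pipeline.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Split an (out, in) weight matrix into two thin factors of the given rank."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]          # (out, rank)
    B = Vt[:rank, :]                    # (rank, in)
    return A, B

W = np.random.randn(512, 1024).astype(np.float32)
rank = 64                               # stand-in for the VBMF rank estimate
A, B = low_rank_factorize(W, rank)
compressed_params = A.size + B.size     # 98,304 vs 524,288 parameters (~5.3x fewer)
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
```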
arXiv Detail & Related papers (2024-08-29T06:40:34Z) - Quantization of Large Language Models with an Overdetermined Basis [73.79368761182998]
We introduce an algorithm for data quantization based on the principles of Kashin representation.
Our findings demonstrate that Kashin Quantization achieves competitive or superior quality in model performance.
arXiv Detail & Related papers (2024-04-15T12:38:46Z) - Quantization Aware Factorization for Deep Neural Network Compression [20.04951101799232]
Decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks.
A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy.
This motivated us to develop an algorithm that finds decomposed approximation directly with quantized factors.
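One crude way to picture "finding a decomposed approximation directly with quantized factors" is an alternating scheme that refits each factor while the other is held in quantized form. This toy uses a plain symmetric uniform quantizer and is an assumption-laden sketch, not the paper's algorithm.

```python
import numpy as np

def uniform_quantize(x, n_bits=8):
    """Symmetric uniform quantizer (toy)."""
    scale = np.abs(x).max() / (2 ** (n_bits - 1) - 1) + 1e-12
    return np.round(x / scale) * scale

def quantized_factorization(W, rank, n_bits=8, n_iter=20):
    """Alternately refit A and B while keeping the other factor quantized."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]
    B = Vt[:rank, :]
    for _ in range(n_iter):
        Bq = uniform_quantize(B, n_bits)
        A = W @ np.linalg.pinv(Bq)          # least-squares refit of A against quantized B
        Aq = uniform_quantize(A, n_bits)
        B = np.linalg.pinv(Aq) @ W          # least-squares refit of B against quantized A
    return uniform_quantize(A, n_bits), uniform_quantize(B, n_bits)

W = np.random.randn(256, 256)
Aq, Bq = quantized_factorization(W, rank=32)
rel_err = np.linalg.norm(W - Aq @ Bq) / np.linalg.norm(W)
```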
arXiv Detail & Related papers (2023-08-08T21:38:02Z) - Regularized Vector Quantization for Tokenized Image Synthesis [126.96880843754066]
Quantizing images into discrete representations has been a fundamental problem in unified generative modeling.
Deterministic quantization suffers from severe codebook collapse and misalignment with the inference stage, while stochastic quantization suffers from low codebook utilization and a perturbed reconstruction objective.
This paper presents a regularized vector quantization framework that mitigates the above issues effectively by applying regularization from two perspectives.
arXiv Detail & Related papers (2023-03-11T15:20:54Z) - Modality-Agnostic Variational Compression of Implicit Neural
Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data, parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z) - Compression of Generative Pre-trained Language Models via Quantization [62.80110048377957]
We find that previous quantization methods fail on generative tasks due to homogeneous word embeddings.
We propose a token-level contrastive distillation to learn distinguishable word embeddings, and a module-wise dynamic scaling to make quantizers adaptive to different modules.
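As an illustration of what a token-level contrastive objective can look like (not the paper's exact loss), the snippet below treats the teacher embedding of the same token as the positive and all other tokens in the batch as negatives; the temperature and array names are illustrative.

```python
import numpy as np

def token_contrastive_loss(student, teacher, temperature=0.1):
    """InfoNCE-style loss: each student token should be closest to the
    teacher embedding of the same token among all tokens in the batch."""
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = (s @ t.T) / temperature                 # (n_tokens, n_tokens) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # positives lie on the diagonal

student_tok = np.random.randn(128, 768)              # quantized student token embeddings
teacher_tok = np.random.randn(128, 768)              # full-precision teacher embeddings
loss = token_contrastive_loss(student_tok, teacher_tok)
```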
arXiv Detail & Related papers (2022-03-21T02:11:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.