Related papers: Embedding Compression with Isotropic Iterative Quantization

Embedding Compression with Isotropic Iterative Quantization

URL: http://arxiv.org/abs/2001.05314v2
Date: Thu, 23 Jan 2020 01:01:56 GMT
Title: Embedding Compression with Isotropic Iterative Quantization
Authors: Siyu Liao, Jie Chen, Yanzhi Wang, Qinru Qiu, Bo Yuan
Abstract summary: Continuous representation of words is a standard component in deep learning-based NLP models. We propose an isotropic iterative quantization (IIQ) approach for compressing embedding vectors into binary ones.
Score: 40.567720430910725
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Continuous representation of words is a standard component in deep learning-based NLP models. However, representing a large vocabulary requires significant memory, which can cause problems, particularly on resource-constrained platforms. Therefore, in this paper we propose an isotropic iterative quantization (IIQ) approach for compressing embedding vectors into binary ones, leveraging the iterative quantization technique well established for image retrieval, while satisfying the desired isotropic property of PMI based models. Experiments with pre-trained embeddings (i.e., GloVe and HDC) demonstrate a more than thirty-fold compression ratio with comparable and sometimes even improved performance over the original real-valued embedding vectors.

Related papers

Unified Scaling Laws for Compressed Representations [69.72517034565467]
We investigate whether a unified scaling framework can accurately predict model performance when training occurs over various compressed representations.<n>Our main finding is demonstrating both theoretically and empirically that there exists a simple "capacity" metric.<n>We extend our formulation to directly compare the accuracy potential of different compressed formats, and to derive better algorithms for training over sparse-quantized formats.
arXiv Detail & Related papers (2025-06-02T16:52:51Z)
Learned Image Compression with Dictionary-based Entropy Model [32.966391277632646]
Entropy model plays a key role in learned image compression. Most existing methods employed hyper-prior and auto-regressive architectures to form their entropy models. We propose a novel entropy model named Dictionary-based Cross Attention Entropy model.
arXiv Detail & Related papers (2025-04-01T07:43:10Z)
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem [71.3332971315821]
We present a "line theoremarity" establishing a direct relationship between the layer-wise $ell$ reconstruction error and the model perplexity increase due to quantization. This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels.
arXiv Detail & Related papers (2024-11-26T15:35:44Z)
Learning Optimal Lattice Vector Quantizers for End-to-end Neural Image Compression [16.892815659154053]
Lattice vector quantization (LVQ) presents a compelling alternative, which can exploit inter-feature dependencies more effectively. Traditional LVQ structures are designed/optimized for uniform source distributions. We propose a novel learning method to overcome this weakness by designing the rate-distortion optimal lattice vector quantization codebooks.
arXiv Detail & Related papers (2024-11-25T06:05:08Z)
Convolutional Neural Network Compression Based on Low-Rank Decomposition [3.3295360710329738]
This paper proposes a model compression method that integrates Variational Bayesian Matrix Factorization. VBMF is employed to estimate the rank of the weight tensor at each layer. Experimental results show that for both high and low compression ratios, our compression model exhibits advanced performance.
arXiv Detail & Related papers (2024-08-29T06:40:34Z)
Quantization of Large Language Models with an Overdetermined Basis [73.79368761182998]
We introduce an algorithm for data quantization based on the principles of Kashin representation. Our findings demonstrate that Kashin Quantization achieves competitive or superior quality in model performance.
arXiv Detail & Related papers (2024-04-15T12:38:46Z)
Quantization Aware Factorization for Deep Neural Network Compression [20.04951101799232]
decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOP in neural networks. A conventional post-training quantization approach applied to networks with weights yields a drop in accuracy. This motivated us to develop an algorithm that finds decomposed approximation directly with quantized factors.
arXiv Detail & Related papers (2023-08-08T21:38:02Z)
Regularized Vector Quantization for Tokenized Image Synthesis [126.96880843754066]
Quantizing images into discrete representations has been a fundamental problem in unified generative modeling. deterministic quantization suffers from severe codebook collapse and misalignment with inference stage while quantization suffers from low codebook utilization and reconstruction objective. This paper presents a regularized vector quantization framework that allows to mitigate perturbed above issues effectively by applying regularization from two perspectives.
arXiv Detail & Related papers (2023-03-11T15:20:54Z)
Modality-Agnostic Variational Compression of Implicit Neural Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR) Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism. After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z)
Compression of Generative Pre-trained Language Models via Quantization [62.80110048377957]
We find that previous quantization methods fail on generative tasks due to the textithomogeneous word embeddings We propose a token-level contrastive distillation to learn distinguishable word embeddings, and a module-wise dynamic scaling to make quantizers adaptive to different modules.
arXiv Detail & Related papers (2022-03-21T02:11:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.