Network Memory Footprint Compression Through Jointly Learnable Codebooks and Mappings
- URL: http://arxiv.org/abs/2309.17361v1
- Date: Fri, 29 Sep 2023 16:04:55 GMT
- Title: Network Memory Footprint Compression Through Jointly Learnable Codebooks and Mappings
- Authors: Edouard Yvinec, Arnaud Dapogny, Kevin Bailly
- Abstract summary: Quantization is a favored solution as it maps high precision tensors to a low precision, memory efficient format.
In terms of memory footprint reduction, its most effective variants are based on codebooks.
We propose a joint learning of the codebook and weight mappings that bears similarities with recent gradient-based post-training quantization techniques.
- Score: 23.1120983784623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The massive interest in deep neural networks (DNNs) for both computer vision
and natural language processing has been sparked by the growth in computational
power. However, this led to an increase in the memory footprint, to a point
where it can be challenging to simply load a model on commodity devices such as
mobile phones. To address this limitation, quantization is a favored solution
as it maps high precision tensors to a low precision, memory efficient format.
In terms of memory footprint reduction, its most effective variants are based
on codebooks. These methods, however, suffer from two limitations. First, they
either define a single codebook for each tensor, or use a memory-expensive
mapping to multiple codebooks. Second, gradient descent optimization of the
mapping favors jumps toward extreme values, hence not defining a proximal
search. In this work, we propose to address these two limitations. First, we
initially group similarly distributed neurons and leverage the re-ordered
structure to either apply different scale factors to the different groups, or
map weights that fall in these groups to several codebooks, without any mapping
overhead. Second, stemming from this initialization, we propose a joint
learning of the codebook and weight mappings that bears similarities with
recent gradient-based post-training quantization techniques. Third, drawing
inspiration from straight-through estimation techniques, we introduce a novel
gradient update definition to enable a proximal search of the codebooks and
their mappings. The proposed jointly learnable codebooks and mappings (JLCM)
method allows a very efficient approximation of any DNN: for instance, a Llama 7B
can be compressed down to 2 GB and loaded on 5-year-old smartphones.
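To make the codebook-plus-mapping idea concrete, the following is a minimal PyTorch-style sketch of the general mechanism: rows of a weight matrix are grouped by a simple distribution statistic, each group gets its own small codebook, and a straight-through estimator lets gradients reach both the codebook values and the latent weights. The grouping statistic, group count, codebook size, and class name are illustrative assumptions, not the authors' JLCM implementation; in particular, the proximal gradient update described in the abstract is not reproduced here.

```python
# Minimal sketch (not the authors' JLCM code): per-group codebooks over the rows
# of a weight matrix, with a straight-through estimator so that both the codebook
# entries and the latent weights receive gradients.
import torch
import torch.nn as nn


class GroupedCodebookQuantizer(nn.Module):
    def __init__(self, weight: torch.Tensor, n_groups: int = 4, codebook_size: int = 16):
        super().__init__()
        out_features = weight.shape[0]
        # Group "similarly distributed" rows (neurons): sort by per-row std and
        # split the sorted order into equally sized groups.
        order = weight.std(dim=1).argsort()
        row_groups = torch.empty(out_features, dtype=torch.long)
        row_groups[order] = torch.arange(out_features) * n_groups // out_features
        self.register_buffer("row_groups", row_groups)
        # One small codebook per group, initialized from that group's quantiles.
        books = [
            torch.quantile(weight[row_groups == g].flatten(),
                           torch.linspace(0, 1, codebook_size))
            for g in range(n_groups)
        ]
        self.codebooks = nn.Parameter(torch.stack(books))   # jointly learned values
        self.latent = nn.Parameter(weight.clone())          # jointly learned weights

    def forward(self) -> torch.Tensor:
        w = self.latent                                      # (out, in)
        books = self.codebooks[self.row_groups]              # per-row codebook, (out, K)
        # Hard mapping: nearest codebook entry for every weight
        # (memory-heavy on large layers; fine for a sketch).
        idx = (w.unsqueeze(-1) - books.unsqueeze(1)).abs().argmin(dim=-1)  # (out, in)
        hard = books.gather(1, idx)                          # differentiable w.r.t. codebooks
        # Straight-through estimator: the forward pass uses the quantized values,
        # the backward pass sends gradients to both `latent` and `codebooks`.
        return w + (hard - w.detach())


# Usage: quantize a linear layer's weights and backpropagate through the lookup.
layer = nn.Linear(256, 64, bias=False)
quant = GroupedCodebookQuantizer(layer.weight.detach())
y = torch.randn(8, 256) @ quant().t()
y.sum().backward()                    # gradients reach both latent weights and codebooks
```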
Related papers
- LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation [2.0901574458380403]
We propose a new lightweight but efficient model, namely LiteNeXt, for medical image segmentation.
LiteNeXt is trained from scratch with a small number of parameters (0.71M) and a low computational cost (0.42 GFLOPs).
arXiv Detail & Related papers (2024-04-04T01:59:19Z)
- Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling [15.132926378740882]
We propose a novel codebook transfer framework with part-of-speech, called VQCT, which aims to transfer a well-trained codebook from pretrained language models to VQIM.
Experimental results on four datasets show that our VQCT method achieves superior VQIM performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2024-03-15T07:24:13Z)
- Spherical and Hyperbolic Toric Topology-Based Codes On Graph Embedding for Ising MRF Models: Classical and Quantum Topology Machine Learning [0.11805137592431453]
The paper introduces the application of information geometry to describe the ground states of Ising models.
The approach establishes a connection between machine learning and error-correcting coding.
arXiv Detail & Related papers (2023-07-28T19:38:13Z)
- Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework consisting of a Masked Quantization VAE (MQ-VAE) and a Stackformer, which relieves the model from modeling redundancy (the underlying two-stage codebook paradigm is sketched after this entry).
arXiv Detail & Related papers (2023-05-23T02:15:53Z)
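A minimal sketch of the two-stage codebook paradigm described in the entry above: stage one maps continuous features to indices of their nearest codebook entries (the codebook is learned for reconstruction), and stage two models the resulting index sequence autoregressively. Sizes and names are illustrative assumptions, not the MQ-VAE/Stackformer or DQ-VAE implementations.

```python
# Sketch of two-stage vector-quantized generation (illustrative shapes/names).
import torch
import torch.nn as nn

K, D = 512, 64                                # codebook size, code dimension
codebook = nn.Embedding(K, D)                 # stage 1: learned for reconstruction


def quantize(features: torch.Tensor):
    """Map an (H, W, D) feature map to nearest-code indices and their embeddings."""
    flat = features.reshape(-1, D)                            # (H*W, D)
    dists = torch.cdist(flat, codebook.weight)                # (H*W, K)
    idx = dists.argmin(dim=1)                                 # discrete codes
    return idx.reshape(features.shape[:2]), codebook(idx).reshape(features.shape)


feats = torch.randn(16, 16, D)                # encoder output for one image
codes, quantized = quantize(feats)            # `quantized` feeds the reconstruction decoder

# Stage 2: flatten the code grid into a token sequence and model it autoregressively
# (any sequence model works here; a Transformer decoder is the usual choice).
sequence = codes.flatten()                    # (256,) token ids in [0, K)
```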
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE) which encodes image regions into variable-length codes based on their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z)
- Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining [61.09807522366773]
We introduce an algorithm that approximates the softmax with provable bounds and that dynamically maintains the tree.
In our study on datasets with over twenty million targets, our approach cuts error by half in relation to oracle brute-force negative mining.
arXiv Detail & Related papers (2023-03-27T15:18:32Z)
- Fast offset corrected in-memory training [0.0]
We propose and describe two new and improved algorithms for in-memory computing.
Chopped-TTv2 (c-TTv2) and Analog Gradient Accumulation with Dynamic reference (AGAD) retain the same runtime complexity but correct for any remaining offsets using choppers.
arXiv Detail & Related papers (2023-03-08T17:07:09Z)
- GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z)
- Re2G: Retrieve, Rerank, Generate [14.848179433828252]
We propose Re2G, which combines neural initial retrieval and reranking into a BART-based sequence-to-sequence generation.
To train our system end-to-end, we introduce a novel variation of knowledge distillation to train the initial retrieval, reranker, and generation using only ground truth on the target sequence output.
We find large gains in four diverse tasks: zero-shot slot filling, question answering, fact-checking, and dialog, with relative gains of 9% to 34% over the previous state of the art on the KILT leaderboard.
arXiv Detail & Related papers (2022-07-13T15:51:40Z)
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress (a quick numeric check of the permutation observation follows this entry).
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
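The permutation observation in the entry above is easy to verify numerically: applying the same permutation to the output channels of one linear layer and the input channels of the next leaves the composed function unchanged, because elementwise activations commute with channel permutations. A minimal check with illustrative layer sizes:

```python
# Quick numeric check: permuting adjacent layers' channels preserves the function.
import torch
import torch.nn as nn

torch.manual_seed(0)
f1, f2 = nn.Linear(8, 16), nn.Linear(16, 4)
x = torch.randn(32, 8)

perm = torch.randperm(16)
g1, g2 = nn.Linear(8, 16), nn.Linear(16, 4)
with torch.no_grad():
    g1.weight.copy_(f1.weight[perm])        # permute rows (output channels) of layer 1
    g1.bias.copy_(f1.bias[perm])
    g2.weight.copy_(f2.weight[:, perm])     # permute columns (input channels) of layer 2
    g2.bias.copy_(f2.bias)

original = f2(torch.relu(f1(x)))            # ReLU is elementwise, so it commutes with perm
permuted = g2(torch.relu(g1(x)))
print(torch.allclose(original, permuted, atol=1e-6))   # True: same function, different layout
```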
- Sparse Graphical Memory for Robust Planning [93.39298821537197]
We introduce Sparse Graphical Memory (SGM), a new data structure that stores states and feasible transitions in a sparse memory.
SGM aggregates states according to a novel two-way consistency objective, adapting classic state aggregation criteria to goal-conditioned RL.
We show that SGM significantly outperforms current state-of-the-art methods on long-horizon, sparse-reward visual navigation tasks.
arXiv Detail & Related papers (2020-03-13T17:59:32Z)