Clustering the Sketch: A Novel Approach to Embedding Table Compression
- URL: http://arxiv.org/abs/2210.05974v3
- Date: Sun, 22 Oct 2023 02:42:20 GMT
- Title: Clustering the Sketch: A Novel Approach to Embedding Table Compression
- Authors: Henry Ling-Hei Tsang, Thomas Dybdahl Ahle
- Abstract summary: Clustered Compositional Embeddings (CCE) combines clustering-based compression, such as quantization to codebooks, with dynamic methods like the Hashing Trick.
CCE achieves the best of both worlds: the high compression rate of codebook-based quantization, applied *dynamically* like hashing-based methods, so it can be used during training.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Embedding tables are used by machine learning systems to work with
categorical features. In modern Recommendation Systems, these tables can be
very large, necessitating the development of new methods for fitting them in
memory, even during training. We suggest Clustered Compositional Embeddings
(CCE) which combines clustering-based compression like quantization to
codebooks with dynamic methods like The Hashing Trick and Compositional
Embeddings (Shi et al., 2020). Experimentally CCE achieves the best of both
worlds: The high compression rate of codebook-based quantization, but
*dynamically* like hashing-based methods, so it can be used during training.
Theoretically, we prove that CCE is guaranteed to converge to the optimal
codebook and give a tight bound for the number of iterations required.
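For a concrete sense of the "dynamic" hashing/composition-style methods CCE builds on, here is a minimal PyTorch sketch of the quotient-remainder (compositional) embedding of Shi et al. (2020); the class name, table sizes, and the element-wise combine are illustrative choices, not the CCE algorithm itself.

```python
import torch
import torch.nn as nn

class QREmbedding(nn.Module):
    """Quotient-remainder ("compositional") embedding: each categorical id indexes
    two small tables instead of one row in a huge table, so memory scales with
    the bucket count rather than the vocabulary size."""
    def __init__(self, num_ids, num_buckets, dim):
        super().__init__()
        self.num_buckets = num_buckets
        self.quotient = nn.Embedding((num_ids + num_buckets - 1) // num_buckets, dim)
        self.remainder = nn.Embedding(num_buckets, dim)

    def forward(self, ids):
        q = self.quotient(ids // self.num_buckets)
        r = self.remainder(ids % self.num_buckets)
        return q * r  # element-wise combine; sum or concat are common alternatives

emb = QREmbedding(num_ids=1_000_000, num_buckets=1_000, dim=16)
vecs = emb(torch.tensor([3, 41, 999_999]))
print(vecs.shape)  # torch.Size([3, 16])
```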
Related papers
- A Universal Framework for Compressing Embeddings in CTR Prediction [68.27582084015044]
We introduce a Model-agnostic Embedding Compression (MEC) framework that compresses embedding tables by quantizing pre-trained embeddings.
Our approach consists of two stages: first, we apply popularity-weighted regularization to balance code distribution between high- and low-frequency features.
Experiments on three datasets reveal that our method reduces memory usage by over 50x while maintaining or improving recommendation performance.
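For reference, the snippet below shows the plain codebook-quantization baseline that methods like MEC refine: cluster the rows of a pre-trained table and store one small code per row plus the codebook. The table size, codebook size, and use of scikit-learn k-means are toy assumptions, and the popularity-weighted regularization stage is not modeled here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for a pre-trained embedding table: 50k rows x 16 dims of float32.
table = np.random.randn(50_000, 16).astype(np.float32)

# Quantize each row to its nearest codebook entry: 1 byte per row plus the codebook.
k = 256
km = KMeans(n_clusters=k, n_init=1, random_state=0).fit(table)
codes = km.labels_.astype(np.uint8)                 # one code per row
codebook = km.cluster_centers_.astype(np.float32)   # k x 16 floats

compressed_bytes = codes.nbytes + codebook.nbytes
print(f"compression ratio: {table.nbytes / compressed_bytes:.1f}x")
```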
arXiv Detail & Related papers (2025-02-21T10:12:34Z)
- Learned Data Compression: Challenges and Opportunities for the Future [34.95766887424342]
Recent advances in learned indexes have inspired the development of learned compressors.
These compressors leverage simple yet compact machine learning (ML) models to compress large-scale sorted keys.
This vision paper explores the potential of learned data compression to enhance critical areas in indexes and related domains.
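As a minimal, hypothetical illustration of the learned-compression idea for sorted keys (not the paper's system): fit a simple model to the key distribution and store only the small residuals needed to reconstruct the keys exactly.

```python
import numpy as np

# Toy sorted integer keys.
keys = np.sort(np.random.randint(0, 2**40, size=100_000, dtype=np.int64))

# Fit key ~ a * position + b and keep only the residuals.
pos = np.arange(len(keys), dtype=np.float64)
a, b = np.polyfit(pos, keys.astype(np.float64), 1)
pred = np.round(a * pos + b).astype(np.int64)
residuals = keys - pred

# A well-fitting model leaves residuals that need far fewer bits than raw 64-bit keys.
bits_needed = int(np.ceil(np.log2(np.abs(residuals).max() + 1))) + 1  # +1 sign bit
print(f"raw: 64 bits/key, residuals: ~{bits_needed} bits/key")

# Decompression is exact: model parameters + residuals reproduce the keys.
assert np.array_equal(keys, pred + residuals)
```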
arXiv Detail & Related papers (2024-12-14T09:47:21Z)
- Scalable Image Tokenization with Index Backpropagation Quantization [74.15447383432262]
Index Backpropagation Quantization (IBQ) is a new VQ method for the joint optimization of all codebook embeddings and the visual encoder.
IBQ enables scalable training of visual tokenizers and, for the first time, achieves a large-scale codebook with high dimension (256) and high utilization.
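In that spirit (letting gradients reach every codebook entry through the quantization step), below is a hedged straight-through sketch in PyTorch; the function name and the distance-based logits are assumptions for illustration, not the released IBQ implementation.

```python
import torch
import torch.nn.functional as F

def quantize_all_codes(z, codebook):
    """Pick the nearest codebook row per input, but route gradients through a
    softmax over *all* code indices so every codebook embedding receives updates."""
    logits = -torch.cdist(z, codebook)                        # (batch, K), higher = closer
    soft = F.softmax(logits, dim=-1)                          # differentiable assignment
    hard = F.one_hot(soft.argmax(dim=-1), codebook.shape[0]).type_as(soft)
    onehot_st = hard + soft - soft.detach()                   # straight-through one-hot
    return onehot_st @ codebook                               # quantized output (batch, dim)

codebook = torch.nn.Parameter(torch.randn(256, 32))
z = torch.randn(8, 32, requires_grad=True)
quantize_all_codes(z, codebook).sum().backward()
print(codebook.grad.abs().sum() > 0)   # gradients reach the whole codebook
```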
arXiv Detail & Related papers (2024-12-03T18:59:10Z)
- End-to-end Learnable Clustering for Intent Learning in Recommendation [54.157784572994316]
We propose a novel intent learning method termed ELCRec.
It unifies behavior representation learning into an End-to-end Learnable Clustering framework.
We deploy this method on the industrial recommendation system with 130 million page views and achieve promising results.
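A bare-bones sketch of what "end-to-end learnable clustering" can look like, with cluster centres as parameters trained jointly with the behaviour encoder; the encoder, loss, and sizes are placeholders rather than ELCRec's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

centers = nn.Parameter(torch.randn(8, 64))                    # 8 latent "intents", 64-d
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 64))

behaviour = torch.randn(32, 128)                              # batch of behaviour features
z = F.normalize(encoder(behaviour), dim=-1)
c = F.normalize(centers, dim=-1)

# Pull each representation towards its closest intent centre; both the encoder
# and the centres receive gradients, so clustering is learned end to end.
cluster_loss = (1 - (z @ c.t()).max(dim=-1).values).mean()
cluster_loss.backward()
print(centers.grad is not None)  # True
```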
arXiv Detail & Related papers (2024-01-11T15:22:55Z)
- Network Memory Footprint Compression Through Jointly Learnable Codebooks and Mappings [23.1120983784623]
Quantization is a favored solution, as it maps high-precision tensors to a low-precision, memory-efficient format.
In terms of memory footprint reduction, its most effective variants are based on codebooks.
We propose a joint learning of the codebook and weight mappings that bears similarities with recent gradient-based post-training quantization techniques.
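A toy, hypothetical version of a jointly learnable codebook and mapping (not the paper's formulation): each weight is a soft, trainable mixture over a small scalar codebook, and both the mixture logits and the codebook receive gradients; after training, the mapping can be hardened to one small index per weight, which is where the memory saving comes from.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodebookLinear(nn.Module):
    def __init__(self, in_dim, out_dim, k=16):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(k))                  # k scalar codewords
        self.logits = nn.Parameter(torch.randn(out_dim, in_dim, k))   # learnable weight->code mapping

    def forward(self, x):
        # Soft assignment during training; argmax indices + codebook at deployment.
        w = F.softmax(self.logits, dim=-1) @ self.codebook            # (out_dim, in_dim)
        return x @ w.t()

layer = CodebookLinear(32, 8)
layer(torch.randn(4, 32)).sum().backward()   # gradients flow to codebook *and* mapping
```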
arXiv Detail & Related papers (2023-09-29T16:04:55Z)
- Online Clustered Codebook [100.1650001618827]
We present a simple alternative method for online codebook learning, Clustering VQ-VAE (CVQ-VAE).
Our approach selects encoded features as anchors to update the "dead" codevectors, while optimising the codebooks which are alive via the original loss.
Our CVQ-VAE can be easily integrated into the existing models with just a few lines of code.
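A rough NumPy sketch of the anchor idea described above (tracking code usage online and re-seeding rarely used codevectors from encoded features); the EMA decay, threshold, and function name are illustrative assumptions, not the CVQ-VAE implementation.

```python
import numpy as np

def online_codebook_step(codebook, counts, features, decay=0.99, dead_threshold=1.0):
    # Assign each encoded feature to its nearest codevector.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)

    # Exponential-moving-average usage counts.
    counts[:] = decay * counts + (1 - decay) * np.bincount(assign, minlength=len(codebook))

    # Re-seed "dead" (rarely used) codevectors from randomly chosen features (anchors).
    dead = np.where(counts < dead_threshold)[0]
    if len(dead):
        codebook[dead] = features[np.random.choice(len(features), size=len(dead))]
    return codebook, counts

codebook, counts = np.random.randn(64, 8), np.ones(64)
codebook, counts = online_codebook_step(codebook, counts, np.random.randn(512, 8))
```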
arXiv Detail & Related papers (2023-07-27T18:31:04Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- HDCC: A Hyperdimensional Computing compiler for classification on embedded systems and high-performance computing [58.720142291102135]
This work introduces HDCC, the first open-source compiler that translates high-level descriptions of HDC classification methods into optimized C code.
HDCC is designed like a modern compiler, featuring an intuitive and descriptive input language, an intermediate representation (IR), and a retargetable backend.
To substantiate these claims, we conducted experiments with HDCC on several of the most popular datasets in the HDC literature.
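For context, here is a tiny NumPy version of the kind of HDC classification pipeline such a compiler lowers to C: bind per-feature position and value hypervectors, bundle them into one vector, and classify by cosine similarity to class prototypes. The dimensions and random toy data are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, n_features, n_values, n_classes = 10_000, 8, 16, 3

# Random bipolar hypervectors for feature positions and feature values.
pos_hv = rng.choice([-1, 1], size=(n_features, DIM))
val_hv = rng.choice([-1, 1], size=(n_values, DIM))

def encode(sample):
    # Bind position and value (element-wise product), then bundle (sum).
    return sum(pos_hv[i] * val_hv[v] for i, v in enumerate(sample))

# "Training" = bundling the encodings of each class into a prototype.
train = [(rng.integers(0, n_values, n_features), int(rng.integers(0, n_classes)))
         for _ in range(200)]
prototypes = np.zeros((n_classes, DIM))
for x, y in train:
    prototypes[y] += encode(x)

def classify(sample):
    q = encode(sample)
    sims = prototypes @ q / (np.linalg.norm(prototypes, axis=1) * np.linalg.norm(q))
    return int(sims.argmax())

print(classify(train[0][0]))   # toy data is random, so only the pipeline matters here
```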
arXiv Detail & Related papers (2023-04-24T19:16:03Z)
- Does compressing activations help model parallel training? [64.59298055364336]
We present the first empirical study on the effectiveness of compression methods for model parallelism.
We implement and evaluate three common classes of compression algorithms.
We evaluate these methods across more than 160 settings and 8 popular datasets.
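As one simple, hypothetical instance of the kind of compression such a study compares (not tied to the paper's exact algorithms), here is 8-bit min-max quantization of activations before they cross a model-parallel boundary:

```python
import numpy as np

def quantize_uint8(x):
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)   # 1 byte per activation
    return q, lo, scale

def dequantize_uint8(q, lo, scale):
    return q.astype(np.float32) * scale + lo

acts = np.random.randn(64, 1024).astype(np.float32)   # activations to send downstream
q, lo, scale = quantize_uint8(acts)
recon = dequantize_uint8(q, lo, scale)
print(acts.nbytes / q.nbytes, float(np.abs(acts - recon).max()))   # 4x smaller, small error
```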
arXiv Detail & Related papers (2023-01-06T18:58:09Z)
- Class-Incremental Learning with Strong Pre-trained Models [97.84755144148535]
Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes).
We explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large number of base classes.
Our proposed method is robust and generalizes to all analyzed CIL settings.
arXiv Detail & Related papers (2022-04-07T17:58:07Z)
- Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective [11.01582936909208]
Asymmetric Numeral Systems (ANS) provides compression very close to the optimum and simplifies advanced compression techniques such as bits-back coding.
This paper is meant as an educational resource to make ANS more approachable by presenting it from a new perspective of latent variable models.
We guide the reader step by step to a complete implementation of ANS in the Python programming language.
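In that spirit, here is a self-contained toy rANS coder that keeps the whole state in one unbounded Python integer (the textbook formulation without renormalization); it is meant to show the mechanics, not to reproduce the paper's code.

```python
from collections import Counter

def cumulative(freqs):
    cum, total = {}, 0
    for s in sorted(freqs):
        cum[s] = total
        total += freqs[s]
    return cum, total

def rans_encode(message, freqs):
    cum, M = cumulative(freqs)
    x = 1                                    # coder state: a single big integer
    for s in reversed(message):              # encode back to front
        f, c = freqs[s], cum[s]
        x = (x // f) * M + c + (x % f)
    return x

def rans_decode(x, length, freqs):
    cum, M = cumulative(freqs)
    slots = sorted((c, s) for s, c in cum.items())
    out = []
    for _ in range(length):
        r = x % M
        c, s = max(slot for slot in slots if slot[0] <= r)   # symbol whose slot contains r
        x = freqs[s] * (x // M) + r - c
        out.append(s)
    return out

msg = list("abracadabra")
freqs = dict(Counter(msg))
state = rans_encode(msg, freqs)
assert rans_decode(state, len(msg), freqs) == msg
print(state.bit_length(), "bits for", len(msg), "symbols")
```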
arXiv Detail & Related papers (2022-01-05T18:04:42Z)
- Learning on a Grassmann Manifold: CSI Quantization for Massive MIMO Systems [37.499485219254545]
This paper focuses on the design of beamforming codebooks that maximize the average normalized beamforming gain for any underlying channel distribution.
We utilize a model-free data-driven approach with foundations in machine learning to generate beamforming codebooks that adapt to the surrounding propagation conditions.
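A rough sketch of that data-driven idea: Grassmannian k-means over sample channel vectors, choosing each codeword to maximize the average normalized beamforming gain within its cluster. The i.i.d. Rayleigh channels and the sizes below are stand-ins, not the paper's training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ant, n_samples, k = 8, 2000, 16

# Sample channel vectors (rows), normalized to unit norm.
H = rng.normal(size=(n_samples, n_ant)) + 1j * rng.normal(size=(n_samples, n_ant))
H /= np.linalg.norm(H, axis=1, keepdims=True)

codebook = H[rng.choice(n_samples, k, replace=False)]
for _ in range(20):
    gains = np.abs(H @ codebook.conj().T) ** 2            # |w^H h|^2 per (sample, codeword)
    assign = gains.argmax(axis=1)
    for j in range(k):
        cluster = H[assign == j]
        if len(cluster):
            # Best beam for a cluster: principal eigenvector of sum_h h h^H.
            _, vecs = np.linalg.eigh(cluster.T @ cluster.conj())
            codebook[j] = vecs[:, -1]

print(np.mean(np.max(np.abs(H @ codebook.conj().T) ** 2, axis=1)))   # avg. normalized gain
```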
arXiv Detail & Related papers (2020-05-18T01:01:36Z)
- A flexible, extensible software framework for model compression based on the LC algorithm [10.787390511207683]
We propose a software framework that allows a user to compress a neural network or other machine learning model with minimal effort.
The library is written in Python and PyTorch and is available on GitHub.
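To make the entry concrete, here is a toy sketch of the learning-compression (LC) alternation such a framework automates, using a quadratic loss and 1-D k-means quantization as the compression step; the loss, penalty schedule, and helper names are illustrative assumptions, not the library's API.

```python
import numpy as np

rng = np.random.default_rng(0)
w_star = rng.normal(size=50)            # unconstrained optimum of a toy quadratic loss

def c_step(w, k=4):
    """Compression step: quantize the weights with 1-D k-means (a few Lloyd iterations)."""
    centers = np.quantile(w, np.linspace(0.1, 0.9, k))
    for _ in range(25):
        assign = np.abs(w[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = w[assign == j].mean()
    return centers[assign]

theta = c_step(w_star)
for mu in [0.01, 0.1, 1.0, 10.0, 100.0]:            # increasing penalty schedule
    # L step: argmin_w 0.5*||w - w_star||^2 + (mu/2)*||w - theta||^2 (closed form here).
    w = (w_star + mu * theta) / (1 + mu)
    # C step: re-compress the current weights.
    theta = c_step(w)

print(np.unique(theta.round(8)).size, "distinct weight values after compression")
```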
arXiv Detail & Related papers (2020-05-15T21:14:48Z)