HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework
- URL: http://arxiv.org/abs/2112.07221v1
- Date: Tue, 14 Dec 2021 08:18:10 GMT
- Title: HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework
- Authors: Xupeng Miao, Hailin Zhang, Yining Shi, Xiaonan Nie, Zhi Yang, Yangyu Tao, Bin Cui
- Abstract summary: We propose HET, a new system framework that significantly improves the scalability of huge embedding model training.
HET achieves up to 88% embedding communication reductions and up to 20.68x performance speedup over the state-of-the-art baselines.
- Score: 17.114812060566766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embedding models have been an effective learning paradigm for
high-dimensional data. However, one open issue of embedding models is that
their representations (latent factors) often result in large parameter space.
We observe that existing distributed training frameworks face a scalability
issue of embedding models since updating and retrieving the shared embedding
parameters from servers usually dominates the training cycle. In this paper, we
propose HET, a new system framework that significantly improves the scalability
of huge embedding model training. We embrace skewed popularity distributions of
embeddings as a performance opportunity and leverage it to address the
communication bottleneck with an embedding cache. To ensure consistency across
the caches, we incorporate a new consistency model into HET design, which
provides fine-grained consistency guarantees on a per-embedding basis. Compared
to previous work that only allows staleness for read operations, HET also
utilizes staleness for write operations. Evaluations on six representative
tasks show that HET achieves up to 88% embedding communication reductions and
up to 20.68x performance speedup over the state-of-the-art baselines.
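The core idea above — a worker-side embedding cache whose per-embedding staleness bound covers both reads and writes — can be sketched as follows. This is a minimal illustration under assumed names (`StaleEmbeddingCache`, `ToyServer`, the version-lag rule), not HET's actual API or protocol.

```python
import numpy as np

class StaleEmbeddingCache:
    """Sketch of a per-embedding staleness-bounded cache: reads are served
    locally while the cached entry lags the server by at most `bound`
    versions, and gradient writes are applied locally (stale writes) and
    only flushed to the server when the bound is exceeded."""

    def __init__(self, server, staleness_bound=4):
        self.server = server          # authoritative parameter store
        self.bound = staleness_bound  # max versions a cached entry may lag
        self.cache = {}               # key -> (vector, cached_version, pending_grad)

    def read(self, key):
        entry = self.cache.get(key)
        if entry is not None and self.server.version(key) - entry[1] <= self.bound:
            return entry[0]           # fresh enough: serve from cache
        vec = self.server.pull(key)   # otherwise fetch from the server
        self.cache[key] = (vec, self.server.version(key), np.zeros_like(vec))
        return vec

    def write(self, key, grad, lr=0.1):
        vec, ver, pending = self.cache[key]
        vec = vec - lr * grad         # apply the update locally (stale write)
        pending = pending + grad
        if self.server.version(key) - ver > self.bound:
            self.server.push(key, pending, lr)  # flush accumulated gradients
            vec = self.server.pull(key)
            ver = self.server.version(key)
            pending = np.zeros_like(vec)
        self.cache[key] = (vec, ver, pending)

class ToyServer:
    """In-memory stand-in for a parameter server (illustration only)."""
    def __init__(self, dim=4):
        self.table, self.dim, self.versions = {}, dim, {}
    def version(self, key):
        return self.versions.get(key, 0)
    def pull(self, key):
        return self.table.setdefault(key, np.zeros(self.dim)).copy()
    def push(self, key, grad, lr):
        self.table[key] = self.pull(key) - lr * grad
        self.versions[key] = self.version(key) + 1
```

Because popular ("hot") embeddings are read and written far more often than the bound forces synchronization, most traffic stays local — which is the skew the paper exploits.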
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
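The transferability claim rests on the combination happening purely in logit space, which a tiny sketch makes concrete. The names and numbers here are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def post_trained_logits(base_logits, value_logits):
    # The value network's output shifts the base model's distribution.
    # Since nothing here depends on the base model's internals, the same
    # value network can be paired with different pre-trained models.
    return base_logits + value_logits

base = np.array([2.0, 1.0, 0.0])    # logits from some pre-trained model
delta = np.array([-1.0, 0.5, 0.5])  # correction predicted by the value network
probs = softmax(post_trained_logits(base, delta))
```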
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
- Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation [12.653336728447654]
We propose a class-shared memory (CSM) module consisting of a set of learnable memory vectors.
These memory vectors learn elemental object patterns from base classes during training whilst re-encoding query features during both training and inference.
We integrate CSM and uncertainty-based feature augmentation (UFA) into representative FSS works, with experimental results on the widely used PASCAL-5$^i$ and COCO-20$^i$ datasets.
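The re-encoding step described above — query features expressed against a set of shared memory vectors — can be sketched as a simple attention read. All shapes and names here are made up for illustration; the CSM module's actual design is more elaborate.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_reencode(query_feats, memory):
    # query_feats: (N, d) per-location features; memory: (M, d) shared
    # vectors learned on base classes. Each query feature becomes a convex
    # combination of memory vectors, so novel-class features are expressed
    # in terms of the elemental patterns stored during base training.
    attn = softmax(query_feats @ memory.T / np.sqrt(memory.shape[1]))
    return attn @ memory

rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 16))   # M=8 memory vectors, d=16
feats = rng.normal(size=(5, 16))
reencoded = memory_reencode(feats, memory)
```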
arXiv Detail & Related papers (2024-06-01T19:53:25Z)
- Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements [20.96380700548786]
Learning compatible representations enables the interchangeable use of semantic features as models are updated over time.
This is particularly relevant in search and retrieval systems where it is crucial to avoid reprocessing of the gallery images with the updated model.
We show that the stationary representations learned by the $d$-Simplex fixed classifier optimally approximate compatibility representation according to the two inequality constraints of its formal definition.
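A $d$-Simplex fixed classifier is concrete enough to sketch: the (non-trainable) class weights are the vertices of a regular simplex, so all pairwise prototype angles are equal and the feature geometry stays stationary as models are replaced. This is a minimal construction sketch; the paper's exact formulation may differ.

```python
import numpy as np

def d_simplex_classifier(num_classes):
    """Fixed (non-trainable) classifier whose rows are the vertices of a
    regular (num_classes - 1)-simplex, unit-normalized."""
    k = num_classes
    # Standard basis vectors minus their centroid give k equidistant points
    # spanning a (k-1)-dimensional subspace.
    verts = np.eye(k) - np.ones((k, k)) / k
    verts /= np.linalg.norm(verts, axis=1, keepdims=True)
    return verts

W = d_simplex_classifier(4)
# Gram matrix: ones on the diagonal, and every off-diagonal entry equals
# -1/(k-1), i.e. all pairwise angles between class prototypes are equal.
G = W @ W.T
```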
arXiv Detail & Related papers (2024-05-04T06:31:38Z)
- Fine-Grained Embedding Dimension Optimization During Training for Recommender Systems [17.602059421895856]
FIITED is a system to automatically reduce the memory footprint via FIne-grained In-Training Embedding Dimension pruning.
We show that FIITED can reduce DLRM embedding size by more than 65% while preserving model quality.
On public datasets, FIITED can reduce the size of embedding tables by 2.1x to 800x with negligible accuracy drop.
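The general shape of fine-grained dimension pruning can be illustrated as follows — each embedding row keeps only its highest-magnitude dimensions under a per-row budget. This is a deliberately simplified magnitude heuristic for illustration, not FIITED's actual importance criterion.

```python
import numpy as np

def prune_embedding_rows(table, keep_fraction=0.5):
    """Zero out each row's least-important dimensions (by magnitude),
    keeping a fixed fraction per row. Returns the pruned table and mask."""
    rows, dim = table.shape
    keep = max(1, int(dim * keep_fraction))
    mask = np.zeros_like(table, dtype=bool)
    for r in range(rows):
        top = np.argsort(np.abs(table[r]))[-keep:]  # largest-magnitude dims
        mask[r, top] = True
    return table * mask, mask

table = np.array([[0.9, 0.01, -0.8, 0.02],
                  [0.05, 1.2, 0.03, -1.1]])
pruned, mask = prune_embedding_rows(table, keep_fraction=0.5)
```

In a real system the surviving dimensions would be stored compactly per row, which is where the memory savings come from.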
arXiv Detail & Related papers (2024-01-09T08:04:11Z)
- Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training [2.4862527485819186]
Multi-layer embeddings training (MLET) trains embeddings using factorization of the embedding layer, with an inner dimension higher than the target embedding dimension.
MLET consistently produces better models, especially for rare items.
At constant model quality, MLET allows reducing the embedding dimension and model size by up to 16x (5.8x on average).
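The factorization idea is simple to sketch: train the embedding table as a product of two factors with an inner dimension larger than the target dimension, then fold them into a single table for serving. The shapes below are illustrative.

```python
import numpy as np

vocab, target_dim, inner_dim = 1000, 16, 64  # inner_dim > target_dim
rng = np.random.default_rng(0)
E1 = rng.normal(size=(vocab, inner_dim)) * 0.01    # both factors are trained
E2 = rng.normal(size=(inner_dim, target_dim)) * 0.01

def lookup(ids):
    # During training, the lookup goes through both factors.
    return E1[ids] @ E2

# At deployment, the product collapses into one table of the target size,
# so inference cost and serving memory are unchanged.
E_serving = E1 @ E2
```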
arXiv Detail & Related papers (2023-09-27T09:32:10Z)
- Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation [52.11279360934703]
Current prevailing Video Object Segmentation (VOS) methods usually perform dense matching between the current and reference frames after extracting features.
We propose a unified VOS framework, coined as JointFormer, for joint modeling of the three elements of feature, correspondence, and a compressed memory.
arXiv Detail & Related papers (2023-08-25T17:30:08Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
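The random-projection flavor of this approach can be sketched as follows: frozen pre-trained features are expanded through a fixed random projection with a nonlinearity, and a simple class-mean classifier is updated as new tasks arrive, with no backpropagation to forget anything. All names and shapes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, proj_dim = 32, 256
P = rng.normal(size=(feat_dim, proj_dim))  # fixed projection, never trained

def project(x):
    return np.maximum(x @ P, 0.0)          # ReLU random features

class PrototypeClassifier:
    """Streaming class-mean classifier over projected features."""
    def __init__(self):
        self.sums, self.counts = {}, {}
    def update(self, feats, label):        # called per task, no backprop
        z = project(feats)
        self.sums[label] = self.sums.get(label, 0) + z.sum(axis=0)
        self.counts[label] = self.counts.get(label, 0) + len(z)
    def predict(self, feats):
        z = project(feats)
        protos = {c: s / self.counts[c] for c, s in self.sums.items()}
        labels = list(protos)
        sims = np.stack([z @ protos[c] for c in labels], axis=1)
        return [labels[i] for i in sims.argmax(axis=1)]
```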
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
- Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
- BagPipe: Accelerating Deep Recommendation Model Training [9.911467752221863]
Bagpipe is a system for training deep recommendation models that uses caching and prefetching to overlap remote embedding accesses with the computation.
We design an Oracle Cacher, a new component that uses a lookahead algorithm to generate optimal cache update decisions.
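Because training batches are known ahead of time, a lookahead cacher can make provably good eviction choices. The sketch below is a simplified Belady-style policy in the spirit described above (Bagpipe's Oracle Cacher is more involved): evict the cached embedding whose next use is farthest in the future.

```python
def lookahead_evict(cache, future_batches):
    """Return the cached key to evict, given the upcoming batches.
    cache: set of cached embedding keys.
    future_batches: list of upcoming batches, each a list of keys."""
    next_use = {}
    for t, batch in enumerate(future_batches):
        for key in batch:
            next_use.setdefault(key, t)  # first (soonest) reuse wins
    # Keys never reused get distance infinity: ideal eviction victims.
    return max(cache, key=lambda k: next_use.get(k, float("inf")))

cache = {"u1", "u2", "u3"}
future = [["u2", "u9"], ["u1"], ["u3"]]
victim = lookahead_evict(cache, future)  # "u3" is reused farthest away
```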
arXiv Detail & Related papers (2022-02-24T23:54:12Z)
- Top-KAST: Top-K Always Sparse Training [50.05611544535801]
We propose Top-KAST, a method that preserves constant sparsity throughout training.
We show that it performs comparably to or better than previous works when training models on the established ImageNet benchmark.
In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling.
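The "always sparse" mechanism can be sketched with a top-K magnitude mask: at every step only the largest-magnitude fraction of each weight tensor is active in the forward pass, so sparsity stays constant throughout training. (Top-KAST additionally maintains a larger backward set; that detail is omitted in this sketch.)

```python
import numpy as np

def topk_mask(w, density=0.25):
    """Binary mask keeping the top `density` fraction of entries of w
    by absolute magnitude."""
    k = max(1, int(w.size * density))
    thresh = np.sort(np.abs(w), axis=None)[-k]  # k-th largest magnitude
    return (np.abs(w) >= thresh).astype(w.dtype)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
mask = topk_mask(w, density=0.25)
sparse_w = w * mask   # the forward pass only ever sees the sparse weights
```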
arXiv Detail & Related papers (2021-06-07T11:13:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.