HET: Scaling out Huge Embedding Model Training via Cache-enabled
Distributed Framework
- URL: http://arxiv.org/abs/2112.07221v1
- Date: Tue, 14 Dec 2021 08:18:10 GMT
- Title: HET: Scaling out Huge Embedding Model Training via Cache-enabled
Distributed Framework
- Authors: Xupeng Miao, Hailin Zhang, Yining Shi, Xiaonan Nie, Zhi Yang, Yangyu
Tao, Bin Cui
- Abstract summary: We propose HET, a new system framework that significantly improves the scalability of huge embedding model training.
HET achieves up to 88% embedding communication reductions and up to 20.68x performance speedup over the state-of-the-art baselines.
- Score: 17.114812060566766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embedding models have been an effective learning paradigm for
high-dimensional data. However, one open issue of embedding models is that
their representations (latent factors) often result in a large parameter space.
We observe that existing distributed training frameworks face a scalability
issue with embedding models, since updating and retrieving the shared embedding
parameters from servers usually dominates the training cycle. In this paper, we
propose HET, a new system framework that significantly improves the scalability
of huge embedding model training. We embrace skewed popularity distributions of
embeddings as a performance opportunity and leverage it to address the
communication bottleneck with an embedding cache. To ensure consistency across
the caches, we incorporate a new consistency model into HET's design, which
provides fine-grained consistency guarantees on a per-embedding basis. Compared
to previous work that only allows staleness for read operations, HET also
utilizes staleness for write operations. Evaluations on six representative
tasks show that HET achieves up to 88% embedding communication reductions and
up to 20.68x performance speedup over the state-of-the-art baselines.
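The cache idea in the abstract can be illustrated with a minimal, single-worker Python sketch. It is not HET's actual protocol or API: the class names `ToyServer` and `ToyCache`, the `staleness` parameter, and the rule "sync an embedding once it has more than `staleness` unsynced local updates" are all assumptions made for illustration. The sketch only shows how tolerating bounded staleness per embedding, for writes as well as reads, can cut server round trips for a frequently accessed ("hot") embedding.

```python
from collections import defaultdict


class ToyServer:
    """Hypothetical parameter server: one vector and one version counter per embedding id."""

    def __init__(self, dim):
        self.table = defaultdict(lambda: [0.0] * dim)
        self.version = defaultdict(int)
        self.pulls = 0   # number of pull messages received
        self.pushes = 0  # number of push messages received

    def pull(self, key):
        self.pulls += 1
        return list(self.table[key]), self.version[key]

    def push(self, key, delta, n_updates):
        self.pushes += 1
        vec = self.table[key]
        for i, d in enumerate(delta):
            vec[i] += d
        self.version[key] += n_updates


class ToyCache:
    """Hypothetical worker-side cache with a per-embedding staleness bound (assumed rule)."""

    def __init__(self, server, staleness):
        self.server = server
        self.staleness = staleness
        self.entries = {}

    def _entry(self, key):
        # Cache miss: fetch the embedding from the server once.
        if key not in self.entries:
            vec, _ = self.server.pull(key)
            self.entries[key] = {"vec": vec, "delta": [0.0] * len(vec), "pending": 0}
        return self.entries[key]

    def read(self, key):
        # Reads are served locally, so they may see a stale copy.
        return list(self._entry(key)["vec"])

    def update(self, key, grad, lr=0.1):
        # Writes are applied locally and buffered, so they are stale too.
        e = self._entry(key)
        for i, g in enumerate(grad):
            step = -lr * g
            e["vec"][i] += step
            e["delta"][i] += step
        e["pending"] += 1
        if e["pending"] > self.staleness:
            self.sync(key)

    def sync(self, key):
        # Push the accumulated delta, then refresh the cached copy.
        e = self.entries[key]
        if e["pending"]:
            self.server.push(key, e["delta"], e["pending"])
        vec, _ = self.server.pull(key)
        self.entries[key] = {"vec": vec, "delta": [0.0] * len(vec), "pending": 0}


server = ToyServer(dim=2)
cache = ToyCache(server, staleness=3)
for _ in range(8):
    cache.read("hot_item")
    cache.update("hot_item", grad=[1.0, 1.0], lr=0.1)

print(server.pulls, server.pushes)  # 3 pulls and 2 pushes
```

In this toy run, eight read-and-update steps on one hot embedding cost five server messages, whereas a worker that pulled and pushed on every step would send sixteen. The staleness bound trades freshness for communication; how HET chooses and enforces that bound is described in the paper, not here.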
Related papers
- Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation [12.653336728447654]
We propose a class-shared memory (CSM) module consisting of a set of learnable memory vectors.
These memory vectors learn elemental object patterns from base classes during training whilst re-encoding query features during both training and inference.
We integrate CSM and UFA into representative FSS works, with experimental results on the widely-used PASCAL-5$^i$ and COCO-20$^i$ datasets.
arXiv Detail & Related papers (2024-06-01T19:53:25Z)
- Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements [20.96380700548786]
Learning compatible representations enables the interchangeable use of semantic features as models are updated over time.
This is particularly relevant in search and retrieval systems where it is crucial to avoid reprocessing of the gallery images with the updated model.
We show that the stationary representations learned by the $d$-Simplex fixed classifier optimally approximate compatibility representation according to the two inequality constraints of its formal definition.
arXiv Detail & Related papers (2024-05-04T06:31:38Z)
- A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first explore the computational redundancy part of the network.
We then prune the redundancy blocks of the model and maintain the network performance.
Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
- Federated Topic Model and Model Pruning Based on Variational Autoencoder [14.737942599204064]
Federated topic modeling allows multiple parties to jointly train models while protecting data privacy.
This paper proposes a method to establish a federated topic model while ensuring the privacy of each node, and use neural network model pruning to accelerate the model.
Experimental results show that the federated topic model pruning can greatly accelerate the model training speed while ensuring the model's performance.
arXiv Detail & Related papers (2023-11-01T06:00:14Z)
- Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training [2.4862527485819186]
Multi-layer embeddings training (MLET) trains embeddings using factorization of the embedding layer, with an inner dimension higher than the target embedding dimension.
MLET consistently produces better models, especially for rare items.
At constant model quality, MLET allows embedding dimension, and model size, reduction by up to 16x, and 5.8x on average.
arXiv Detail & Related papers (2023-09-27T09:32:10Z)
- Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation [52.11279360934703]
Current prevailing Video Object Segmentation (VOS) methods usually perform dense matching between the current and reference frames after extracting features.
We propose a unified VOS framework, coined as JointFormer, for joint modeling of the three elements of feature, correspondence, and a compressed memory.
arXiv Detail & Related papers (2023-08-25T17:30:08Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
- Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
- Adaptive Cross Batch Normalization for Metric Learning [75.91093210956116]
Metric learning is a fundamental problem in computer vision.
We show that it is equally important to ensure that the accumulated embeddings are up to date.
In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration.
arXiv Detail & Related papers (2023-03-30T03:22:52Z)
- BagPipe: Accelerating Deep Recommendation Model Training [9.911467752221863]
Bagpipe is a system for training deep recommendation models that uses caching and prefetching to overlap remote embedding accesses with the computation.
We design an Oracle Cacher, a new component that uses a lookahead algorithm to generate optimal cache update decisions.
arXiv Detail & Related papers (2022-02-24T23:54:12Z)
- Top-KAST: Top-K Always Sparse Training [50.05611544535801]
We propose Top-KAST, a method that preserves constant sparsity throughout training.
We show that it performs comparably to or better than previous works when training models on the established ImageNet benchmark.
In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling.
arXiv Detail & Related papers (2021-06-07T11:13:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.