Learning Effective and Efficient Embedding via an Adaptively-Masked
Twins-based Layer
- URL: http://arxiv.org/abs/2108.11513v1
- Date: Tue, 24 Aug 2021 11:50:49 GMT
- Title: Learning Effective and Efficient Embedding via an Adaptively-Masked
Twins-based Layer
- Authors: Bencheng Yan, Pengjie Wang, Kai Zhang, Wei Lin, Kuang-Chih Lee, Jian
Xu and Bo Zheng
- Abstract summary: We propose an Adaptively-Masked Twins-based Layer (AMTL) behind the standard embedding layer.
AMTL generates a mask vector to mask the undesired dimensions for each embedding vector.
The mask vector brings flexibility in selecting the dimensions and the proposed layer can be easily added to either untrained or trained DLRMs.
- Score: 15.403616481651383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embedding learning for categorical features is crucial for deep
learning-based recommendation models (DLRMs). Each feature value is mapped to
an embedding vector via an embedding learning process. Conventional methods
configure a fixed and uniform embedding size to all feature values from the
same feature field. However, such a configuration is not only sub-optimal for
embedding learning but also memory costly. Existing methods that attempt to
resolve these problems, either rule-based or neural architecture search
(NAS)-based, need extensive efforts on the human design or network training.
They are also not flexible in embedding size selection or in warm-start-based
applications. In this paper, we propose a novel and effective embedding size
selection scheme. Specifically, we design an Adaptively-Masked Twins-based
Layer (AMTL) behind the standard embedding layer. AMTL generates a mask vector
to mask the undesired dimensions for each embedding vector. The mask vector
brings flexibility in selecting the dimensions and the proposed layer can be
easily added to either untrained or trained DLRMs. Extensive experimental
evaluations show that the proposed scheme outperforms competitive baselines on
all the benchmark tasks, and is also memory-efficient, reducing memory usage by
60% without compromising any performance metrics.
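To make the masking idea concrete, below is a minimal PyTorch sketch of an adaptively-masked embedding layer: a small network placed behind the standard embedding table predicts, per feature value, how many leading dimensions to keep, and the resulting mask zeroes out the rest. This is a sketch under assumptions, not the authors' implementation: the frequency input, the single mask network (the paper's twin high-/low-frequency structure is not reproduced here), and names such as `mask_net` and `temperature` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptivelyMaskedEmbedding(nn.Module):
    """Sketch of a masked embedding layer: a mask generator behind a standard
    embedding table predicts how many leading dimensions to keep for each
    feature value and masks out the remaining ones."""

    def __init__(self, num_embeddings: int, embedding_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        # Hypothetical mask generator conditioned on a scalar feature frequency;
        # the paper's twins-based (high-/low-frequency) design is simplified
        # to a single branch for illustration.
        self.mask_net = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embedding_dim),  # logits over "keep k dims"
        )

    def forward(self, ids: torch.Tensor, freq: torch.Tensor, temperature: float = 1.0):
        emb = self.embedding(ids)                        # [B, D]
        logits = self.mask_net(freq.unsqueeze(-1))       # [B, D]
        # Soft choice of how many dimensions to keep; a reversed cumulative sum
        # turns a one-hot "keep k dims" choice into a [1,...,1,0,...,0] mask
        # while remaining differentiable when the choice is soft.
        probs = F.softmax(logits / temperature, dim=-1)  # [B, D]
        mask = torch.flip(torch.cumsum(torch.flip(probs, [-1]), dim=-1), [-1])
        return emb * mask                                # masked embedding
```

Whether the mask stays soft during training and is hardened only at serving time is a design choice; the sketch keeps it soft for differentiability.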
Related papers
- Enhancing Cross-Category Learning in Recommendation Systems with
Multi-Layer Embedding Training [2.4862527485819186]
Multi-layer embeddings training (MLET) trains embeddings using factorization of the embedding layer, with an inner dimension higher than the target embedding dimension.
MLET consistently produces better models, especially for rare items.
At constant model quality, MLET allows the embedding dimension (and hence the model size) to be reduced by up to 16x, and by 5.8x on average.
arXiv Detail & Related papers (2023-09-27T09:32:10Z) - Towards A Unified View of Sparse Feed-Forward Network in Pretraining
Large Language Model [58.9100867327305]
Large and sparse feed-forward layers (S-FFN) have proven effective in scaling up Transformer model size for pretraining large language models.
We analyzed two major design choices of S-FFN: the memory block (a.k.a. expert) size and the memory block selection method.
We found that a simpler selection method, Avg-K, which selects blocks through their mean aggregated hidden states, achieves lower perplexity in language model pretraining.
arXiv Detail & Related papers (2023-05-23T12:28:37Z) - Learning to Learn Better for Video Object Segmentation [94.5753973590207]
We propose a novel framework that emphasizes Learning to Learn Better (LLB) target features for SVOS.
We design the discriminative label generation module (DLGM) and the adaptive fusion module to address these issues.
Our proposed LLB method achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-12-05T09:10:34Z) - A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental
Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model under a limited memory budget while continually incorporating new classes.
We show that, when the model size is counted into the total budget and methods are compared with aligned memory size, saving models does not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z) - Hierarchical Variational Memory for Few-shot Learning Across Domains [120.87679627651153]
We introduce a hierarchical prototype model, where each level of the prototype fetches corresponding information from the hierarchical memory.
The model is endowed with the ability to flexibly rely on features at different semantic levels if the domain shift circumstances so demand.
We conduct thorough ablation studies to demonstrate the effectiveness of each component in our model.
arXiv Detail & Related papers (2021-12-15T15:01:29Z) - Binary Code based Hash Embedding for Web-scale Applications [12.851057275052506]
Deep learning models are widely adopted in web-scale applications such as recommender systems and online advertising.
In these applications, embedding learning of categorical features is crucial to the success of deep learning models.
We propose a binary code based hash embedding method which allows the size of the embedding table to be reduced in arbitrary scale without compromising too much performance.
arXiv Detail & Related papers (2021-08-24T11:51:15Z) - Semantically Constrained Memory Allocation (SCMA) for Embedding in
Efficient Recommendation Systems [27.419109620575313]
A key challenge for deep learning models is to work with millions of categorical classes or tokens.
We propose a novel formulation of memory shared embedding, where memory is shared in proportion to the overlap in semantic information.
We demonstrate a significant reduction in the memory footprint while maintaining performance.
arXiv Detail & Related papers (2021-02-24T19:55:49Z) - Learning to Embed Categorical Features without Embedding Tables for
Recommendation [22.561967284428707]
We propose an alternative embedding framework, replacing embedding tables by a deep embedding network to compute embeddings on the fly.
The encoding module is deterministic, non-learnable, and free of storage, while the embedding network is updated during the training time to learn embedding generation.
arXiv Detail & Related papers (2020-10-21T06:37:28Z) - Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
These frameworks still face the challenge of reduced generalization ability on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z) - Shape Adaptor: A Learnable Resizing Module [59.940372879848624]
We present a novel resizing module for neural networks: shape adaptor, a drop-in enhancement built on top of traditional resizing layers.
Our implementation enables shape adaptors to be trained end-to-end without any additional supervision.
We show the effectiveness of shape adaptors on two other applications: network compression and transfer learning.
arXiv Detail & Related papers (2020-08-03T14:15:52Z) - Training with Multi-Layer Embeddings for Model Reduction [0.9046327456472286]
We introduce a multi-layer embedding training architecture that trains embeddings via a sequence of linear layers.
We show that it allows the embedding dimension d to be reduced by 4-8x, with a corresponding improvement in memory footprint, at a given model accuracy.
arXiv Detail & Related papers (2020-06-10T02:47:40Z)
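The two multi-layer embedding entries above (MLET and this one) describe the same factorized-training idea: the embedding table is trained through a sequence of linear layers with an inner dimension wider than the target dimension. A minimal PyTorch sketch of that idea follows, assuming a single linear projection after the wide table; the `collapse` helper and all names are illustrative assumptions, not taken from either paper.

```python
import torch
import torch.nn as nn

class MultiLayerEmbedding(nn.Module):
    """Sketch of factorized (multi-layer) embedding training: a wide inner
    table is projected down to the target dimension during training, and the
    two factors can be collapsed into a single small table for serving."""

    def __init__(self, num_embeddings: int, target_dim: int, inner_dim: int):
        super().__init__()
        assert inner_dim >= target_dim, "inner dimension is chosen wider than the target"
        self.inner = nn.Embedding(num_embeddings, inner_dim)
        self.proj = nn.Linear(inner_dim, target_dim, bias=False)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.inner(ids))  # [B, target_dim]

    def collapse(self) -> nn.Embedding:
        # Fold the projection into the table so serving-time memory scales
        # with target_dim rather than inner_dim.
        table = nn.Embedding(self.inner.num_embeddings, self.proj.out_features)
        with torch.no_grad():
            table.weight.copy_(self.inner.weight @ self.proj.weight.T)
        return table
```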