Learning Effective and Efficient Embedding via an Adaptively-Masked
Twins-based Layer
- URL: http://arxiv.org/abs/2108.11513v1
- Date: Tue, 24 Aug 2021 11:50:49 GMT
- Title: Learning Effective and Efficient Embedding via an Adaptively-Masked
Twins-based Layer
- Authors: Bencheng Yan, Pengjie Wang, Kai Zhang, Wei Lin, Kuang-Chih Lee, Jian
Xu and Bo Zheng
- Abstract summary: We propose an Adaptively-Masked Twins-based Layer (AMTL) behind the standard embedding layer.
AMTL generates a mask vector to mask the undesired dimensions for each embedding vector.
The mask vector brings flexibility in selecting the dimensions and the proposed layer can be easily added to either untrained or trained DLRMs.
- Score: 15.403616481651383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embedding learning for categorical features is crucial for deep
learning-based recommendation models (DLRMs). Each feature value is mapped to
an embedding vector via an embedding learning process. Conventional methods
configure a fixed and uniform embedding size to all feature values from the
same feature field. However, such a configuration is not only sub-optimal for
embedding learning but also memory costly. Existing methods that attempt to
resolve these problems, either rule-based or neural architecture search
(NAS)-based, need extensive efforts on the human design or network training.
They are also not flexible in embedding size selection or in warm-start-based
applications. In this paper, we propose a novel and effective embedding size
selection scheme. Specifically, we design an Adaptively-Masked Twins-based
Layer (AMTL) behind the standard embedding layer. AMTL generates a mask vector
to mask the undesired dimensions for each embedding vector. The mask vector
brings flexibility in selecting the dimensions and the proposed layer can be
easily added to either untrained or trained DLRMs. Extensive experimental
evaluations show that the proposed scheme outperforms competitive baselines on
all the benchmark tasks, and is also memory-efficient, reducing memory usage by
60% without compromising any performance metrics.
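To make the masking idea concrete, below is a minimal PyTorch sketch of an adaptively-masked embedding layer: a small network placed behind the standard embedding table predicts, per feature value, how many leading dimensions to keep, and the resulting mask zeroes out the rest. This is a sketch under assumptions, not the authors' implementation: the frequency input, the single mask network (the paper's twin high-/low-frequency structure is not reproduced here), and names such as `mask_net` and `temperature` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptivelyMaskedEmbedding(nn.Module):
    """Sketch of a masked embedding layer: a mask generator behind a standard
    embedding table predicts how many leading dimensions to keep for each
    feature value and masks out the remaining ones."""

    def __init__(self, num_embeddings: int, embedding_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        # Hypothetical mask generator conditioned on a scalar feature frequency;
        # the paper's twins-based (high-/low-frequency) design is simplified
        # to a single branch for illustration.
        self.mask_net = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embedding_dim),  # logits over "keep k dims"
        )

    def forward(self, ids: torch.Tensor, freq: torch.Tensor, temperature: float = 1.0):
        emb = self.embedding(ids)                        # [B, D]
        logits = self.mask_net(freq.unsqueeze(-1))       # [B, D]
        # Soft choice of how many dimensions to keep; a reversed cumulative sum
        # turns a one-hot "keep k dims" choice into a [1,...,1,0,...,0] mask
        # while remaining differentiable when the choice is soft.
        probs = F.softmax(logits / temperature, dim=-1)  # [B, D]
        mask = torch.flip(torch.cumsum(torch.flip(probs, [-1]), dim=-1), [-1])
        return emb * mask                                # masked embedding
```

Whether the mask stays soft during training and is hardened only at serving time is a design choice; the sketch keeps it soft for differentiability.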
Related papers
- Enhancing Cross-Category Learning in Recommendation Systems with
Multi-Layer Embedding Training [2.4862527485819186]
Multi-layer embeddings training (MLET) trains embeddings using factorization of the embedding layer, with an inner dimension higher than the target embedding dimension.
MLET consistently produces better models, especially for rare items.
At constant model quality, MLET allows the embedding dimension (and hence the model size) to be reduced by up to 16x, and by 5.8x on average.
arXiv Detail & Related papers (2023-09-27T09:32:10Z) - Towards A Unified View of Sparse Feed-Forward Network in Pretraining
Large Language Model [58.9100867327305]
Large and sparse feed-forward layers (S-FFN) have proven effective in scaling up Transformer model size for pretraining large language models.
We analyzed two major design choices of S-FFN: the memory block (a.k.a. expert) size and the memory block selection method.
We found that a simpler selection method, Avg-K, which selects blocks through their mean aggregated hidden states, achieves lower perplexity in language model pretraining.
arXiv Detail & Related papers (2023-05-23T12:28:37Z) - Learning to Learn Better for Video Object Segmentation [94.5753973590207]
We propose a novel framework that emphasizes Learning to Learn Better (LLB) target features for SVOS.
We design the discriminative label generation module (DLGM) and the adaptive fusion module to address these issues.
Our proposed LLB method achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-12-05T09:10:34Z) - A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental
Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model under a limited memory budget while continually incorporating new classes.
We show that, when the model size is counted into the total budget and methods are compared with aligned memory size, saving models does not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z) - Hierarchical Variational Memory for Few-shot Learning Across Domains [120.87679627651153]
We introduce a hierarchical prototype model, where each level of the prototype fetches corresponding information from the hierarchical memory.
The model is endowed with the ability to flexibly rely on features at different semantic levels if the domain shift circumstances so demand.
We conduct thorough ablation studies to demonstrate the effectiveness of each component in our model.
arXiv Detail & Related papers (2021-12-15T15:01:29Z) - Binary Code based Hash Embedding for Web-scale Applications [12.851057275052506]
Deep learning models are widely adopted in web-scale applications such as recommender systems and online advertising.
In these applications, embedding learning of categorical features is crucial to the success of deep learning models.
We propose a binary code based hash embedding method which allows the size of the embedding table to be reduced in arbitrary scale without compromising too much performance.
arXiv Detail & Related papers (2021-08-24T11:51:15Z) - Semantically Constrained Memory Allocation (SCMA) for Embedding in
Efficient Recommendation Systems [27.419109620575313]
A key challenge for deep learning models is to work with millions of categorical classes or tokens.
We propose a novel formulation of memory shared embedding, where memory is shared in proportion to the overlap in semantic information.
We demonstrate a significant reduction in the memory footprint while maintaining performance.
arXiv Detail & Related papers (2021-02-24T19:55:49Z) - Learning to Embed Categorical Features without Embedding Tables for
Recommendation [22.561967284428707]
We propose an alternative embedding framework, replacing embedding tables by a deep embedding network to compute embeddings on the fly.
The encoding module is deterministic, non-learnable, and free of storage, while the embedding network is updated during the training time to learn embedding generation.
arXiv Detail & Related papers (2020-10-21T06:37:28Z) - Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
These frameworks still face the challenge of reduced generalization ability on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z) - Shape Adaptor: A Learnable Resizing Module [59.940372879848624]
We present a novel resizing module for neural networks: shape adaptor, a drop-in enhancement built on top of traditional resizing layers.
Our implementation enables shape adaptors to be trained end-to-end without any additional supervision.
We show the effectiveness of shape adaptors on two other applications: network compression and transfer learning.
arXiv Detail & Related papers (2020-08-03T14:15:52Z) - Training with Multi-Layer Embeddings for Model Reduction [0.9046327456472286]
We introduce a multi-layer embedding training architecture that trains embeddings via a sequence of linear layers.
We show that it allows the embedding dimension d to be reduced by 4-8x, with a corresponding improvement in memory footprint, at a given model accuracy.
arXiv Detail & Related papers (2020-06-10T02:47:40Z)
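The two multi-layer embedding entries above (MLET and this one) describe the same factorized-training idea: the embedding table is trained through a sequence of linear layers with an inner dimension wider than the target dimension. A minimal PyTorch sketch of that idea follows, assuming a single linear projection after the wide table; the `collapse` helper and all names are illustrative assumptions, not taken from either paper.

```python
import torch
import torch.nn as nn

class MultiLayerEmbedding(nn.Module):
    """Sketch of factorized (multi-layer) embedding training: a wide inner
    table is projected down to the target dimension during training, and the
    two factors can be collapsed into a single small table for serving."""

    def __init__(self, num_embeddings: int, target_dim: int, inner_dim: int):
        super().__init__()
        assert inner_dim >= target_dim, "inner dimension is chosen wider than the target"
        self.inner = nn.Embedding(num_embeddings, inner_dim)
        self.proj = nn.Linear(inner_dim, target_dim, bias=False)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.inner(ids))  # [B, target_dim]

    def collapse(self) -> nn.Embedding:
        # Fold the projection into the table so serving-time memory scales
        # with target_dim rather than inner_dim.
        table = nn.Embedding(self.inner.num_embeddings, self.proj.out_features)
        with torch.no_grad():
            table.weight.copy_(self.inner.weight @ self.proj.weight.T)
        return table
```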