Related papers: Learning to Embed Categorical Features without Embedding Tables for Recommendation

URL: http://arxiv.org/abs/2010.10784v2
Date: Mon, 7 Jun 2021 06:31:19 GMT
Title: Learning to Embed Categorical Features without Embedding Tables for Recommendation
Authors: Wang-Cheng Kang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Ting Chen, Lichan Hong, Ed H. Chi
Abstract summary: We propose an alternative embedding framework, replacing embedding tables by a deep embedding network to compute embeddings on the fly. The encoding module is deterministic, non-learnable, and free of storage, while the embedding network is updated during the training time to learn embedding generation.
Score: 22.561967284428707
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Embedding learning of categorical features (e.g. user/item IDs) is at the core of various recommendation models including matrix factorization and neural collaborative filtering. The standard approach creates an embedding table where each row represents a dedicated embedding vector for every unique feature value. However, this method fails to efficiently handle high-cardinality features and unseen feature values (e.g. new video ID) that are prevalent in real-world recommendation systems. In this paper, we propose an alternative embedding framework Deep Hash Embedding (DHE), replacing embedding tables by a deep embedding network to compute embeddings on the fly. DHE first encodes the feature value to a unique identifier vector with multiple hashing functions and transformations, and then applies a DNN to convert the identifier vector to an embedding. The encoding module is deterministic, non-learnable, and free of storage, while the embedding network is updated during the training time to learn embedding generation. Empirical results show that DHE achieves comparable AUC against the standard one-hot full embedding, with smaller model sizes. Our work sheds light on the design of DNN-based alternative embedding schemes for categorical features without using embedding table lookup.
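The two-stage pipeline described in the abstract (a fixed hashing-based encoder followed by a trainable DNN) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the hash family, the transformation, or the network shape, so the universal-hash form `((a*x + b) mod p) mod m`, the uniform rescaling to [-1, 1], the constants `K`, `M`, `P`, and the two-layer network are all assumptions made here for the example. The network weights are randomly initialized; in practice they would be learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# All constants below are illustrative assumptions, not values from the paper.
K = 64                 # number of hash functions = length of the identifier vector
M = 1_000_000          # number of buckets per hash function
P = 2_147_483_647      # large prime (2**31 - 1) used by the hash family
D_HIDDEN, D_EMB = 32, 8

# Encoder parameters: drawn once, then frozen (deterministic, non-learnable).
A = rng.integers(1, P, size=K, dtype=np.int64)
B = rng.integers(0, P, size=K, dtype=np.int64)

# Embedding network parameters: these are what training would update.
W1 = rng.normal(0.0, 0.1, size=(K, D_HIDDEN))
b1 = np.zeros(D_HIDDEN)
W2 = rng.normal(0.0, 0.1, size=(D_HIDDEN, D_EMB))
b2 = np.zeros(D_EMB)


def encode(feature_id: int) -> np.ndarray:
    """Map a feature value to a K-dimensional identifier vector in [-1, 1]."""
    x = feature_id % P                 # keep the product below 2**63
    h = ((A * x + B) % P) % M          # K hash values in [0, M - 1]
    return 2.0 * h / (M - 1) - 1.0     # uniform rescaling (an assumed transformation)


def embed(feature_id: int) -> np.ndarray:
    """Compute the embedding on the fly: identifier vector -> small MLP."""
    z = encode(feature_id)
    hidden = np.maximum(z @ W1 + b1, 0.0)  # ReLU layer
    return hidden @ W2 + b2


vec = embed(42)        # works for any integer ID, including ones never seen before
```

In this sketch nothing scales with the number of distinct feature values: the parameter count is fixed by `K`, `D_HIDDEN`, and `D_EMB`, which is the property the abstract contrasts with a one-row-per-value embedding table.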

Related papers

A Pipeline of Augmentation and Sequence Embedding for Classification of Imbalanced Network Traffic [0.0]
We propose a pipeline to balance the dataset and classify it using a robust and accurate embedding technique. We demonstrate that the proposed augmentation pipeline, combined with FS-Embedding, increases convergence speed and leads to a significant reduction in the number of model parameters.
arXiv Detail & Related papers (2025-02-26T07:55:24Z)
Deep Feature Embedding for Tabular Data [2.1301560294088318]
This paper proposes a novel deep embedding framework that leverages lightweight deep neural networks. For numerical features, a two-step feature expansion and deep transformation technique is used to capture copious semantic information. Experiments are conducted on real-world datasets for performance evaluation.
arXiv Detail & Related papers (2024-08-30T10:05:24Z)
Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later [76.66498833720411]
We introduce a differentiable version of $K$-nearest neighbors (KNN), Neighbourhood Components Analysis (NCA), originally designed to learn a linear projection to capture semantic similarities between instances. Surprisingly, our implementation of NCA using SGD and without dimensionality reduction already achieves decent performance on tabular data. We conclude our paper by analyzing the factors behind these improvements, including loss functions, prediction strategies, and deep architectures.
arXiv Detail & Related papers (2024-07-03T16:38:57Z)
Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
The transferability of deep neural networks (DNNs) has made significant progress in image and language processing. We present TP-BERTa, a specifically pre-trained LM for tabular data prediction. A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
arXiv Detail & Related papers (2024-03-04T08:38:56Z)
Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task. A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks. Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
Model-based feature selection for neural networks: A mixed-integer programming approach [0.9281671380673306]
We develop a novel input feature selection framework for ReLU-based deep neural networks (DNNs). We focus on finding input features for image classification for clarity of presentation. We show that the proposed input feature selection allows us to drastically reduce the size of the input to $\sim$15% while maintaining a good classification accuracy.
arXiv Detail & Related papers (2023-02-20T22:19:50Z)
Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address occlusion by employing body clues provided by an extra network to distinguish the visible part. We propose a novel Dynamic Prototype Mask (DPM) based on two pieces of self-evident prior knowledge. Under this condition, the occluded representation can be well aligned in a selected subspace spontaneously.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
Deep ensembles in bioimage segmentation [74.01883650587321]
In this work, we propose an ensemble of convolutional neural networks (CNNs). In ensemble methods, many different models are trained and then used for classification; the ensemble aggregates the outputs of the single classifiers. The proposed ensemble is implemented by combining different backbone networks using the DeepLabV3+ and HarDNet environment.
arXiv Detail & Related papers (2021-12-24T05:54:21Z)
Learning Effective and Efficient Embedding via an Adaptively-Masked Twins-based Layer [15.403616481651383]
We propose an Adaptively-Masked Twins-based Layer (AMTL) behind the standard embedding layer. AMTL generates a mask vector to mask the undesired dimensions for each embedding vector. The mask vector brings flexibility in selecting the dimensions and the proposed layer can be easily added to either untrained or trained DLRMs.
arXiv Detail & Related papers (2021-08-24T11:50:49Z)
AutoLL: Automatic Linear Layout of Graphs based on Deep Neural Network [41.60125423028092]
We propose a new one-mode linear layout method referred to as AutoLL. We developed two types of neural network models, AutoLL-D and AutoLL-U, for reordering directed and undirected networks.
arXiv Detail & Related papers (2021-08-05T08:04:15Z)
Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters. First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension. We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
Sparse-Interest Network for Sequential Recommendation [78.83064567614656]
We propose a novel Sparse Interest NEtwork (SINE) for sequential recommendation. Our sparse-interest module can adaptively infer a sparse set of concepts for each user from the large concept pool. SINE can achieve substantial improvement over state-of-the-art methods.
arXiv Detail & Related papers (2021-02-18T11:03:48Z)
Embedded methods for feature selection in neural networks [0.0]
Black box models like neural networks negatively affect the interpretability, generalizability, and the training time of these models. I propose two integrated approaches for feature selection that can be incorporated directly into the parameter learning. I benchmarked both methods against Permutation Feature Importance (PFI), a general-purpose feature ranking method, and a random baseline.
arXiv Detail & Related papers (2020-10-12T16:33:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.