Hiformer: Heterogeneous Feature Interactions Learning with Transformers
for Recommender Systems
- URL: http://arxiv.org/abs/2311.05884v1
- Date: Fri, 10 Nov 2023 05:57:57 GMT
- Title: Hiformer: Heterogeneous Feature Interactions Learning with Transformers
for Recommender Systems
- Authors: Huan Gui, Ruoxi Wang, Ke Yin, Long Jin, Maciej Kula, Taibai Xu, Lichan
Hong, Ed H. Chi
- Abstract summary: We propose to leverage a Transformer-based architecture with attention layers to automatically capture feature interactions.
We identify two key challenges for applying the vanilla Transformer architecture to web-scale recommender systems.
- Score: 27.781785405875084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning feature interaction is the critical backbone to building recommender
systems. In web-scale applications, learning feature interaction is extremely
challenging due to the sparse and large input feature space; meanwhile,
manually crafting effective feature interactions is infeasible because of the
exponential solution space. We propose to leverage a Transformer-based
architecture with attention layers to automatically capture feature
interactions. Transformer architectures have witnessed great success in many
domains, such as natural language processing and computer vision. However,
there has not been much adoption of the Transformer architecture for feature
interaction modeling in industry. We aim to close this gap. We identify two
key challenges for applying the vanilla Transformer architecture to web-scale
recommender systems: (1) Transformer architecture fails to capture the
heterogeneous feature interactions in the self-attention layer; (2) The serving
latency of Transformer architecture might be too high to be deployed in
web-scale recommender systems. We first propose a heterogeneous self-attention
layer, which is a simple yet effective modification to the self-attention layer
in Transformer, to take into account the heterogeneity of feature interactions.
We then introduce Hiformer (Heterogeneous Interaction Transformer) to further
improve model expressiveness. With low-rank approximation and model pruning,
Hiformer enjoys fast inference for online deployment. Extensive offline
experiment results corroborate the effectiveness and efficiency of the Hiformer
model. We have successfully deployed the Hiformer model to a real-world,
large-scale app ranking model at Google Play, with significant improvement in
key engagement metrics (up to +2.66%).
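As a rough illustration of the heterogeneous self-attention idea described above (an assumed reading of the abstract, not the authors' implementation): a vanilla self-attention layer shares one query/key/value projection across all features, whereas a heterogeneous variant can learn a separate projection per feature field, so that every field pair is scored by its own learned transformation. The class name, `num_fields`, and the per-field parameterization below are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): self-attention over feature-field
# embeddings with per-field Q/K/V projections, so each field pair (i, j)
# interacts through its own learned transformation instead of a shared one.
import torch
import torch.nn as nn

class HeterogeneousSelfAttention(nn.Module):
    def __init__(self, num_fields: int, dim: int):
        super().__init__()
        # One projection matrix per field, instead of a single shared W_q/W_k/W_v.
        self.w_q = nn.Parameter(torch.randn(num_fields, dim, dim) * dim ** -0.5)
        self.w_k = nn.Parameter(torch.randn(num_fields, dim, dim) * dim ** -0.5)
        self.w_v = nn.Parameter(torch.randn(num_fields, dim, dim) * dim ** -0.5)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_fields, dim), one embedding per feature field.
        q = torch.einsum("bfd,fde->bfe", x, self.w_q)
        k = torch.einsum("bfd,fde->bfe", x, self.w_k)
        v = torch.einsum("bfd,fde->bfe", x, self.w_v)
        scores = q @ k.transpose(-2, -1) * self.scale  # (batch, fields, fields)
        return torch.softmax(scores, dim=-1) @ v
```

For serving, each per-field projection could be factorized into two low-rank matrices (e.g. dim x r and r x dim with r much smaller than dim), in the spirit of the low-rank approximation the abstract mentions for reducing inference latency.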
Related papers
- CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new decoupled dual-interactive linear attention (CARE) mechanism.
We first propose an asymmetrical feature decoupling strategy that asymmetrically decouples the learning process for local inductive bias and long-range dependencies.
By adopting a decoupled learning way and fully exploiting complementarity across features, our method can achieve both high efficiency and accuracy.
arXiv Detail & Related papers (2024-11-25T07:56:13Z) - Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers [56.264673865476986]
This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models.
SLA improves the model's ability to capture dependencies between high-level abstract features and low-level details.
Our implementation extends the Transformer's functionality by enabling queries in a given layer to interact with keys and values from both the current layer and one preceding layer.
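Read literally, this lets queries from the current layer attend over keys and values drawn from both the current layer and the one preceding it. Below is a minimal sketch of that idea, with hypothetical names and shapes, not the SLA authors' code.

```python
# Minimal sketch: queries from layer l attend over keys/values concatenated
# from layer l and layer l-1.
import torch
import torch.nn as nn

class SkipLayerAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x_curr: torch.Tensor, x_prev: torch.Tensor) -> torch.Tensor:
        # x_curr: hidden states of the current layer, (batch, seq, dim)
        # x_prev: hidden states of the preceding layer, (batch, seq, dim)
        kv = torch.cat([x_curr, x_prev], dim=1)  # keys/values from both layers
        out, _ = self.attn(x_curr, kv, kv)
        return out
```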
arXiv Detail & Related papers (2024-06-17T07:24:38Z) - Transformers and Slot Encoding for Sample Efficient Physical World Modelling [1.5498250598583487]
We propose an architecture combining Transformers for world modelling with the slot-attention paradigm, an approach for learning representations of objects appearing in a scene.
We describe the resulting neural architecture and report experimental results showing an improvement over existing solutions in sample efficiency and a reduction in performance variance across training examples.
arXiv Detail & Related papers (2024-05-30T15:48:04Z) - AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning [7.886461196772644]
We introduce alternatives to the transformer self-attention mechanism that offer context-independent inference cost.
Compared with a state-of-the-art architecture, GTrXL, inference in our approach is at least 40% cheaper while reducing memory use by more than 50%.
arXiv Detail & Related papers (2023-10-24T10:51:50Z) - Transformer variational wave functions for frustrated quantum spin
systems [0.0]
We propose an adaptation of the ViT architecture with complex parameters to define a new class of variational neural-network states.
The success of the ViT wave function relies on mixing both local and global operations.
arXiv Detail & Related papers (2022-11-10T11:56:44Z) - Exploring Structure-aware Transformer over Interaction Proposals for
Human-Object Interaction Detection [119.93025368028083]
We design a novel Transformer-style Human-Object Interaction (HOI) detector, i.e., Structure-aware Transformer over Interaction Proposals (STIP)
STIP decomposes HOI set prediction into two sequential phases: non-parametric interaction proposals are first generated, and then transformed into HOI predictions via a structure-aware Transformer.
The structure-aware Transformer upgrades the vanilla Transformer by additionally encoding the holistic semantic structure among interaction proposals as well as the local spatial structure of the human/object within each interaction proposal, so as to strengthen HOI predictions.
arXiv Detail & Related papers (2022-06-13T16:21:08Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z) - GroupBERT: Enhanced Transformer Architecture with Efficient Grouped
Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions.
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z) - THG: Transformer with Hyperbolic Geometry [8.895324519034057]
"X-former" models make changes only around the quadratic time and memory complexity of self-attention.
We propose a novel Transformer with Hyperbolic Geometry (THG) model, which takes advantage of both Euclidean space and hyperbolic space.
arXiv Detail & Related papers (2021-06-01T14:09:33Z) - Transformers Solve the Limited Receptive Field for Monocular Depth
Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.