ConvFormer: Revisiting Transformer for Sequential User Modeling
- URL: http://arxiv.org/abs/2308.02925v1
- Date: Sat, 5 Aug 2023 17:33:17 GMT
- Title: ConvFormer: Revisiting Transformer for Sequential User Modeling
- Authors: Hao Wang, Jianxun Lian, Mingqi Wu, Haoxuan Li, Jiajun Fan, Wanyue Xu,
Chaozhuo Li, Xing Xie
- Abstract summary: We re-examine Transformer-like architectures aiming to advance state-of-the-art performance.
We identify three essential criteria for devising efficient sequential user models.
We introduce ConvFormer, a simple but powerful modification to the Transformer architecture that meets these criteria.
- Score: 23.650635274170828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential user modeling, a critical task in personalized recommender
systems, focuses on predicting the next item a user would prefer, requiring a
deep understanding of user behavior sequences. Despite the remarkable success
of Transformer-based models across various domains, their full potential in
comprehending user behavior remains untapped. In this paper, we re-examine
Transformer-like architectures aiming to advance state-of-the-art performance.
We start by revisiting the core building blocks of Transformer-based methods,
analyzing the effectiveness of the item-to-item mechanism within the context of
sequential user modeling. After conducting a thorough experimental analysis, we
identify three essential criteria for devising efficient sequential user
models, which we hope will serve as practical guidelines to inspire and shape
future designs. Following this, we introduce ConvFormer, a simple but powerful
modification to the Transformer architecture that meets these criteria,
yielding state-of-the-art results. Additionally, we present an acceleration
technique to minimize the complexity associated with processing extremely long
sequences. Experiments on four public datasets showcase ConvFormer's
superiority and confirm the validity of our proposed criteria.
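The abstract does not spell out the modification itself; as a hedged illustration only, the sketch below shows one plausible reading: a pre-norm Transformer block whose item-to-item self-attention is swapped for a per-dimension (depth-wise) convolution over the behavior sequence. The class names ConvTokenMixer and ConvFormerBlock, the kernel size, and all hyperparameters are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn

class ConvTokenMixer(nn.Module):
    """Depth-wise convolution over the sequence axis, standing in for
    item-to-item self-attention (illustrative assumption)."""
    def __init__(self, d_model: int, kernel_size: int = 21):
        super().__init__()
        self.conv = nn.Conv1d(
            d_model, d_model, kernel_size,
            padding=kernel_size // 2, groups=d_model,  # one filter per dimension
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> Conv1d expects (batch, d_model, seq_len)
        return self.conv(x.transpose(1, 2)).transpose(1, 2)

class ConvFormerBlock(nn.Module):
    """Pre-norm Transformer block with the attention sub-layer replaced
    by the convolutional token mixer."""
    def __init__(self, d_model: int = 64, kernel_size: int = 21):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.mixer = ConvTokenMixer(d_model, kernel_size)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.mixer(self.norm1(x))
        return x + self.ffn(self.norm2(x))

if __name__ == "__main__":
    block = ConvFormerBlock()
    out = block(torch.randn(8, 50, 64))  # 8 users, 50-item histories, 64-dim embeddings
    print(out.shape)                     # torch.Size([8, 50, 64])
```

For extremely long sequences, a long-kernel convolution of this form can also be evaluated with FFTs in O(L log L) time, which is one plausible reading of the acceleration technique the abstract mentions, though the paper's actual method may differ.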
Related papers
- Practical token pruning for foundation models in few-shot conversational virtual assistant systems [6.986560111427867]
We pretrain a transformer-based sentence embedding model with a contrastive learning objective and use the model's embeddings as features when training intent classification models.
Our approach achieves state-of-the-art results in few-shot scenarios and performs better than other commercial solutions on popular intent classification benchmarks.
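The summary above only names the ingredients (contrastive pretraining of a sentence-embedding model, whose embeddings then feed an intent classifier). The snippet below is a minimal sketch of a generic in-batch contrastive (InfoNCE-style) objective under that assumption; the function name, temperature, and pairing scheme are illustrative, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """emb_a[i] and emb_b[i] embed two views of the same utterance;
    every other row in the batch acts as a negative (generic InfoNCE-style loss)."""
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.t() / temperature                      # pairwise cosine similarities
    targets = torch.arange(a.size(0), device=a.device)    # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Usage sketch: embeddings from the pretrained encoder would later be reused
# as features for a lightweight intent classifier.
# loss = in_batch_contrastive_loss(encoder(batch_view_1), encoder(batch_view_2))
```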
arXiv Detail & Related papers (2024-08-21T17:42:17Z)
- Large Sequence Models for Sequential Decision-Making: A Survey [33.35835438923926]
The Transformer has attracted increasing interest in the RL community, spawning numerous approaches with notable effectiveness and generalizability.
This paper puts forth various potential avenues for future research intending to improve the effectiveness of large sequence models for sequential decision-making.
arXiv Detail & Related papers (2023-06-24T12:06:26Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches, the first time a model of this kind has done so.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
- Federated Privacy-preserving Collaborative Filtering for On-Device Next App Prediction [52.16923290335873]
We propose a novel SeqMF model to solve the problem of predicting the next app launch during mobile device usage.
We modify the structure of the classical matrix factorization model and adapt the training procedure to sequential learning.
Another ingredient of the proposed approach is a new privacy mechanism that guarantees the protection of the data sent from users to the remote server.
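The summary gives only the high-level ingredients: matrix factorization adapted to sequential learning, plus a privacy mechanism for what is sent to the server. The sketch below is not SeqMF itself but a generic federated-style pattern under those assumptions: the user vector is updated on-device from the ordered app sequence, and only a noised item-matrix gradient leaves the device. All names and the Gaussian noise scheme are hypothetical.

```python
import numpy as np

def local_sequential_update(user_vec, item_mat, app_sequence,
                            lr=0.05, noise_scale=0.1, rng=None):
    """On-device pass over one user's ordered app launches (illustrative only,
    not the SeqMF algorithm): fit the user vector locally and return a noised
    gradient for the shared item matrix."""
    rng = rng or np.random.default_rng()
    item_grad = np.zeros_like(item_mat)
    for next_app in app_sequence:
        pred = user_vec @ item_mat[next_app]            # predicted affinity for the next app
        err = 1.0 - pred                                # implicit "it was launched" target
        user_vec = user_vec + lr * err * item_mat[next_app]   # user vector stays on the device
        item_grad[next_app] += lr * err * user_vec            # accumulated for the server
    item_grad += rng.normal(0.0, noise_scale, item_grad.shape)  # hypothetical privacy noise
    return user_vec, item_grad
```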
arXiv Detail & Related papers (2023-02-05T10:29:57Z)
- Exploring and Evaluating Personalized Models for Code Generation [9.25440316608194]
We evaluate transformer model fine-tuning for personalization.
We consider three key approaches, among them (i) custom fine-tuning, which allows all the model parameters to be tuned.
We compare these fine-tuning strategies for code generation and discuss the potential generalization and cost benefits of each in various deployment scenarios.
arXiv Detail & Related papers (2022-08-29T23:28:46Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
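As a rough picture of what column-wise iterative imputation looks like, the sketch below repeatedly refits a per-column regressor on the currently filled data; the automatic per-column model selection that the framework adds is omitted here, and a plain linear model stands in for every column.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def iterative_impute(X: np.ndarray, n_rounds: int = 5) -> np.ndarray:
    """Column-wise iterative imputation sketch: initialize missing cells with
    column means, then repeatedly predict each column's missing entries from
    the other columns (a simplified stand-in, not HyperImpute's model search)."""
    X = X.copy()
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])   # initial mean fill
    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            other = np.delete(X, j, axis=1)                  # all columns except j
            model = LinearRegression().fit(other[~rows], X[~rows, j])
            X[rows, j] = model.predict(other[rows])          # refill column j
    return X
```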
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Topic-Controllable Summarization: Topic-Aware Evaluation and Transformer Methods [4.211128681972148]
Topic-controllable summarization is an emerging research area with a wide range of potential applications.
This work proposes a new topic-oriented evaluation measure to automatically evaluate the generated summaries.
In addition, we adapt topic embeddings to work with powerful Transformer architectures and propose a novel and efficient approach for guiding the summary generation through control tokens.
arXiv Detail & Related papers (2022-06-09T07:28:16Z)
- On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets [101.28658250723804]
This paper experiments with augmenting a transformer model with modules that effectively utilize a wider field of view and learn to choose whether the next step requires a navigation or manipulation action.
We observe that the proposed modules result in improved, and in fact state-of-the-art, performance on an unseen validation set of ALFRED, a popular benchmark dataset.
We highlight this result because we believe it may reflect a wider phenomenon in machine learning tasks that is primarily noticeable in benchmarks which limit evaluations on test splits.
arXiv Detail & Related papers (2022-05-18T23:52:21Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
- Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence [29.442579683405913]
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
A variant of the TK model -- called TKL -- has been developed that incorporates local self-attention to efficiently process longer input sequences.
In this work, we propose a novel Conformer layer as an alternative approach to scale TK to longer input sequences.
arXiv Detail & Related papers (2021-04-19T15:32:34Z)
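The TKL variant mentioned in the last entry relies on local self-attention to handle longer inputs. As a generic illustration of that idea (not the TKL implementation, and not the proposed Conformer layer), the snippet below masks standard dot-product attention to a fixed window around each position.

```python
import torch

def local_self_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                         window: int = 8) -> torch.Tensor:
    """Scaled dot-product attention restricted to a +/- `window` neighbourhood,
    a generic stand-in for local attention over long input sequences."""
    seq_len, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # (..., seq_len, seq_len)
    idx = torch.arange(seq_len)
    mask = (idx[None, :] - idx[:, None]).abs() > window   # True outside the local window
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# e.g. out = local_self_attention(torch.randn(2, 128, 64),
#                                 torch.randn(2, 128, 64),
#                                 torch.randn(2, 128, 64))
```

A real long-sequence implementation would compute only the banded scores to realize the efficiency gain; this dense-masked version just shows the restriction itself.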
This list is automatically generated from the titles and abstracts of the papers on this site.