ConvFormer: Revisiting Transformer for Sequential User Modeling
- URL: http://arxiv.org/abs/2308.02925v1
- Date: Sat, 5 Aug 2023 17:33:17 GMT
- Title: ConvFormer: Revisiting Transformer for Sequential User Modeling
- Authors: Hao Wang, Jianxun Lian, Mingqi Wu, Haoxuan Li, Jiajun Fan, Wanyue Xu,
Chaozhuo Li, Xing Xie
- Abstract summary: We re-examine Transformer-like architectures aiming to advance state-of-the-art performance.
We identify three essential criteria for devising efficient sequential user models.
We introduce ConvFormer, a simple but powerful modification to the Transformer architecture that meets these criteria.
- Score: 23.650635274170828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential user modeling, a critical task in personalized recommender
systems, focuses on predicting the next item a user would prefer, requiring a
deep understanding of user behavior sequences. Despite the remarkable success
of Transformer-based models across various domains, their full potential in
comprehending user behavior remains untapped. In this paper, we re-examine
Transformer-like architectures aiming to advance state-of-the-art performance.
We start by revisiting the core building blocks of Transformer-based methods,
analyzing the effectiveness of the item-to-item mechanism within the context of
sequential user modeling. After conducting a thorough experimental analysis, we
identify three essential criteria for devising efficient sequential user
models, which we hope will serve as practical guidelines to inspire and shape
future designs. Following this, we introduce ConvFormer, a simple but powerful
modification to the Transformer architecture that meets these criteria,
yielding state-of-the-art results. Additionally, we present an acceleration
technique to minimize the complexity associated with processing extremely long
sequences. Experiments on four public datasets showcase ConvFormer's
superiority and confirm the validity of our proposed criteria.
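The abstract does not spell out the modification itself; as a hedged illustration only, the sketch below shows one plausible reading: a pre-norm Transformer block whose item-to-item self-attention is swapped for a per-dimension (depth-wise) convolution over the behavior sequence. The class names ConvTokenMixer and ConvFormerBlock, the kernel size, and all hyperparameters are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn

class ConvTokenMixer(nn.Module):
    """Depth-wise convolution over the sequence axis, standing in for
    item-to-item self-attention (illustrative assumption)."""
    def __init__(self, d_model: int, kernel_size: int = 21):
        super().__init__()
        self.conv = nn.Conv1d(
            d_model, d_model, kernel_size,
            padding=kernel_size // 2, groups=d_model,  # one filter per dimension
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> Conv1d expects (batch, d_model, seq_len)
        return self.conv(x.transpose(1, 2)).transpose(1, 2)

class ConvFormerBlock(nn.Module):
    """Pre-norm Transformer block with the attention sub-layer replaced
    by the convolutional token mixer."""
    def __init__(self, d_model: int = 64, kernel_size: int = 21):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.mixer = ConvTokenMixer(d_model, kernel_size)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.mixer(self.norm1(x))
        return x + self.ffn(self.norm2(x))

if __name__ == "__main__":
    block = ConvFormerBlock()
    out = block(torch.randn(8, 50, 64))  # 8 users, 50-item histories, 64-dim embeddings
    print(out.shape)                     # torch.Size([8, 50, 64])
```

For extremely long sequences, a long-kernel convolution of this form can also be evaluated with FFTs in O(L log L) time, which is one plausible reading of the acceleration technique the abstract mentions, though the paper's actual method may differ.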
Related papers
- Practical token pruning for foundation models in few-shot conversational virtual assistant systems [6.986560111427867]
We pretrain a transformer-based sentence embedding model with a contrastive learning objective and use the model's embeddings as features when training intent classification models.
Our approach achieves state-of-the-art results in few-shot scenarios and performs better than other commercial solutions on popular intent classification benchmarks.
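The summary above only names the ingredients (contrastive pretraining of a sentence-embedding model, whose embeddings then feed an intent classifier). The snippet below is a minimal sketch of a generic in-batch contrastive (InfoNCE-style) objective under that assumption; the function name, temperature, and pairing scheme are illustrative, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """emb_a[i] and emb_b[i] embed two views of the same utterance;
    every other row in the batch acts as a negative (generic InfoNCE-style loss)."""
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.t() / temperature                      # pairwise cosine similarities
    targets = torch.arange(a.size(0), device=a.device)    # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Usage sketch: embeddings from the pretrained encoder would later be reused
# as features for a lightweight intent classifier.
# loss = in_batch_contrastive_loss(encoder(batch_view_1), encoder(batch_view_2))
```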
arXiv Detail & Related papers (2024-08-21T17:42:17Z)
- Large Sequence Models for Sequential Decision-Making: A Survey [33.35835438923926]
The Transformer has attracted increasing interest in the RL community, spawning numerous approaches with notable effectiveness and generalizability.
This paper puts forth various potential avenues for future research intending to improve the effectiveness of large sequence models for sequential decision-making.
arXiv Detail & Related papers (2023-06-24T12:06:26Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches, the first time a model of this kind has done so.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
- Federated Privacy-preserving Collaborative Filtering for On-Device Next App Prediction [52.16923290335873]
We propose a novel SeqMF model to solve the problem of predicting the next app launch during mobile device usage.
We modify the structure of the classical matrix factorization model and adapt the training procedure to sequential learning.
Another ingredient of the proposed approach is a new privacy mechanism that guarantees the protection of the data sent from users to the remote server.
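The summary gives only the high-level ingredients: matrix factorization adapted to sequential learning, plus a privacy mechanism for what is sent to the server. The sketch below is not SeqMF itself but a generic federated-style pattern under those assumptions: the user vector is updated on-device from the ordered app sequence, and only a noised item-matrix gradient leaves the device. All names and the Gaussian noise scheme are hypothetical.

```python
import numpy as np

def local_sequential_update(user_vec, item_mat, app_sequence,
                            lr=0.05, noise_scale=0.1, rng=None):
    """On-device pass over one user's ordered app launches (illustrative only,
    not the SeqMF algorithm): fit the user vector locally and return a noised
    gradient for the shared item matrix."""
    rng = rng or np.random.default_rng()
    item_grad = np.zeros_like(item_mat)
    for next_app in app_sequence:
        pred = user_vec @ item_mat[next_app]            # predicted affinity for the next app
        err = 1.0 - pred                                # implicit "it was launched" target
        user_vec = user_vec + lr * err * item_mat[next_app]   # user vector stays on the device
        item_grad[next_app] += lr * err * user_vec            # accumulated for the server
    item_grad += rng.normal(0.0, noise_scale, item_grad.shape)  # hypothetical privacy noise
    return user_vec, item_grad
```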
arXiv Detail & Related papers (2023-02-05T10:29:57Z)
- Exploring and Evaluating Personalized Models for Code Generation [9.25440316608194]
We evaluate transformer model fine-tuning for personalization.
We consider three key approaches, among them (i) custom fine-tuning, which allows all the model parameters to be tuned.
We compare these fine-tuning strategies for code generation and discuss the potential generalization and cost benefits of each in various deployment scenarios.
arXiv Detail & Related papers (2022-08-29T23:28:46Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
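As a rough picture of what column-wise iterative imputation looks like, the sketch below repeatedly refits a per-column regressor on the currently filled data; the automatic per-column model selection that the framework adds is omitted here, and a plain linear model stands in for every column.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def iterative_impute(X: np.ndarray, n_rounds: int = 5) -> np.ndarray:
    """Column-wise iterative imputation sketch: initialize missing cells with
    column means, then repeatedly predict each column's missing entries from
    the other columns (a simplified stand-in, not HyperImpute's model search)."""
    X = X.copy()
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])   # initial mean fill
    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            other = np.delete(X, j, axis=1)                  # all columns except j
            model = LinearRegression().fit(other[~rows], X[~rows, j])
            X[rows, j] = model.predict(other[rows])          # refill column j
    return X
```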
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Topic-Controllable Summarization: Topic-Aware Evaluation and Transformer Methods [4.211128681972148]
Topic-controllable summarization is an emerging research area with a wide range of potential applications.
This work proposes a new topic-oriented evaluation measure to automatically evaluate the generated summaries.
In addition, we adapt topic embeddings to work with powerful Transformer architectures and propose a novel and efficient approach for guiding the summary generation through control tokens.
arXiv Detail & Related papers (2022-06-09T07:28:16Z)
- On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets [101.28658250723804]
This paper experiments with augmenting a transformer model with modules that effectively utilize a wider field of view and learn to choose whether the next step requires a navigation or manipulation action.
We observe that the proposed modules result in improved, and in fact state-of-the-art, performance on an unseen validation set of ALFRED, a popular benchmark dataset.
We highlight this result because we believe it may reflect a wider phenomenon in machine learning tasks that is primarily noticeable in benchmarks which limit evaluations on test splits.
arXiv Detail & Related papers (2022-05-18T23:52:21Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
- Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence [29.442579683405913]
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
A variant of the TK model -- called TKL -- has been developed that incorporates local self-attention to efficiently process longer input sequences.
In this work, we propose a novel Conformer layer as an alternative approach to scale TK to longer input sequences.
arXiv Detail & Related papers (2021-04-19T15:32:34Z)
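The TKL variant mentioned in the last entry relies on local self-attention to handle longer inputs. As a generic illustration of that idea (not the TKL implementation, and not the proposed Conformer layer), the snippet below masks standard dot-product attention to a fixed window around each position.

```python
import torch

def local_self_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                         window: int = 8) -> torch.Tensor:
    """Scaled dot-product attention restricted to a +/- `window` neighbourhood,
    a generic stand-in for local attention over long input sequences."""
    seq_len, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # (..., seq_len, seq_len)
    idx = torch.arange(seq_len)
    mask = (idx[None, :] - idx[:, None]).abs() > window   # True outside the local window
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# e.g. out = local_self_attention(torch.randn(2, 128, 64),
#                                 torch.randn(2, 128, 64),
#                                 torch.randn(2, 128, 64))
```

A real long-sequence implementation would compute only the banded scores to realize the efficiency gain; this dense-masked version just shows the restriction itself.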
This list is automatically generated from the titles and abstracts of the papers on this site.