MOI-Mixer: Improving MLP-Mixer with Multi Order Interactions in
Sequential Recommendation
- URL: http://arxiv.org/abs/2108.07505v1
- Date: Tue, 17 Aug 2021 08:38:49 GMT
- Title: MOI-Mixer: Improving MLP-Mixer with Multi Order Interactions in
Sequential Recommendation
- Authors: Hojoon Lee, Dongyoon Hwang, Sunghwan Hong, Changyeon Kim, Seungryong
Kim, Jaegul Choo
- Abstract summary: Transformer-based models require memory and time that scale quadratically with the sequence length, making it difficult to capture users' long-term interests.
MLP-based models, renowned for their linear memory and time complexity, have recently shown results competitive with Transformers across various tasks.
We propose the Multi-Order Interaction layer, which is capable of expressing an arbitrary order of interactions while maintaining the memory and time complexity of an MLP layer.
- Score: 40.20599070308035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Successful sequential recommendation systems rely on accurately capturing the
user's short-term and long-term interest. Although Transformer-based models
achieved state-of-the-art performance in the sequential recommendation task,
they generally require memory and time that scale quadratically with the sequence length, making it difficult to capture users' long-term interests. On the
other hand, Multi-Layer Perceptrons (MLP)-based models, renowned for their
linear memory and time complexity, have recently shown competitive results
compared to Transformers in various tasks. Given the availability of a massive amount of user behavior history, the linear memory and time complexity of
MLP-based models make them a promising alternative to explore in the sequential
recommendation task. To this end, we adopted MLP-based models in sequential
recommendation but consistently observed that MLP-based methods underperform their Transformer-based counterparts despite their computational benefits.
From experiments, we observed that introducing explicit high-order interactions into the MLP layers mitigates this performance gap. In response, we propose the
Multi-Order Interaction (MOI) layer, which is capable of expressing an
arbitrary order of interactions within the inputs while maintaining the memory
and time complexity of the MLP layer. By replacing the MLP layer with the MOI
layer, our model achieves performance comparable to that of
Transformer-based models while retaining the MLP-based models' computational
benefits.
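The abstract does not give the exact formulation of the MOI layer, so the following is only a minimal sketch of the general idea, assuming order-k interactions are built as element-wise products of k linear projections of the input; the class name, the multiplicative construction, and all dimensions are illustrative assumptions rather than the authors' implementation. Because each additional order only adds one more linear projection, memory and time stay linear in the sequence length, unlike the quadratic cost of self-attention.

```python
import torch
import torch.nn as nn


class MultiOrderInteractionSketch(nn.Module):
    """Hypothetical higher-order MLP block (not the paper's exact MOI layer).

    Order-k interactions are formed by element-wise multiplication of k
    linear projections of the input, so the cost of one block grows
    linearly with the sequence length.
    """

    def __init__(self, dim: int, hidden_dim: int, order: int = 3):
        super().__init__()
        # One projection per interaction order.
        self.projections = nn.ModuleList(
            [nn.Linear(dim, hidden_dim) for _ in range(order)]
        )
        self.act = nn.GELU()
        self.out = nn.Linear(hidden_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        h = self.projections[0](x)
        for proj in self.projections[1:]:
            # Each element-wise product raises the interaction order by one.
            h = h * proj(x)
        return self.out(self.act(h))


if __name__ == "__main__":
    layer = MultiOrderInteractionSketch(dim=64, hidden_dim=128, order=3)
    x = torch.randn(2, 50, 64)   # (batch, sequence length, embedding dim)
    print(layer(x).shape)        # torch.Size([2, 50, 64])
```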
Related papers
- Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting [26.141054975797868]
We propose a novel Adaptive Multi-Scale Decomposition (AMD) framework for time series forecasting (TSF).
Our framework decomposes time series into distinct temporal patterns at multiple scales, leveraging the Multi-Scale Decomposable Mixing (MDM) block.
Our approach effectively models both temporal and channel dependencies and utilizes autocorrelation to refine multi-scale data integration.
arXiv Detail & Related papers (2024-06-06T05:27:33Z)
- BMLP: Behavior-aware MLP for Heterogeneous Sequential Recommendation [16.6816199104481]
We propose a novel multilayer perceptron (MLP)-based heterogeneous sequential recommendation method, namely the behavior-aware multilayer perceptron (BMLP).
BMLP achieves significant improvement over state-of-the-art algorithms on four public datasets.
arXiv Detail & Related papers (2024-02-20T05:57:01Z)
- Attentive Multi-Layer Perceptron for Non-autoregressive Generation [46.14195464583495]
Non-autoregressive (NAR) generation is gaining popularity for its efficiency and growing efficacy.
In this paper, we propose a novel variant, Attentive Multi-Layer Perceptron (AMLP), to produce a generation model with linear time and space complexity.
arXiv Detail & Related papers (2023-10-14T06:44:24Z)
- Tuning Pre-trained Model via Moment Probing [62.445281364055795]
We propose a novel Moment Probing (MP) method to explore the potential of linear probing (LP).
MP trains a linear classification head on the mean of the final features; a minimal sketch of this idea appears after this list.
Our MP significantly outperforms LP and is competitive with counterparts at a lower training cost.
arXiv Detail & Related papers (2023-07-21T04:15:02Z)
- AutoMLP: Automated MLP for Sequential Recommendations [20.73096302505791]
Sequential recommender systems aim to predict the next item a user will be interested in, given their historical interactions.
Existing approaches usually set a pre-defined short-term interest length via exhaustive search or empirical experience.
This paper proposes a novel sequential recommender system, AutoMLP, aiming for better modeling users' long/short-term interests.
arXiv Detail & Related papers (2023-03-11T07:50:49Z)
- Efficient Language Modeling with Sparse all-MLP [53.81435968051093]
All-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks.
We propose sparse all-MLPs with mixture-of-experts (MoEs) in both the feature and input (token) dimensions.
We evaluate its zero-shot in-context learning performance on six downstream tasks, and find that it surpasses Transformer-based MoEs and dense Transformers.
arXiv Detail & Related papers (2022-03-14T04:32:19Z)
- Bayesian Inference in High-Dimensional Time-Series with the Orthogonal Stochastic Linear Mixing Model [2.7909426811685893]
Many modern time-series datasets contain large numbers of output response variables sampled for prolonged periods of time.
In this paper, we propose a new Markov chain Monte Carlo framework for the analysis of diverse, large-scale time-series datasets.
arXiv Detail & Related papers (2021-06-25T01:12:54Z)
- Learning representations with end-to-end models for improved remaining useful life prognostics [64.80885001058572]
The Remaining Useful Life (RUL) of equipment is defined as the duration between the current time and its failure.
We propose an end-to-end deep learning model based on multi-layer perceptron (MLP) and long short-term memory (LSTM) layers to predict the RUL.
We will discuss how the proposed end-to-end model is able to achieve such good results and compare it to other deep learning and state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T16:45:18Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
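For the Moment Probing entry above, the summary only states that a linear classification head is trained on the mean of the final features. The snippet below is a minimal sketch of that single idea under assumed names and shapes (a frozen stand-in encoder, a 10-class head, random data); it is not the authors' code.

```python
import torch
import torch.nn as nn

# A frozen stand-in backbone producing token features of shape
# (batch, num_tokens, feature_dim); any pre-trained encoder could be used here.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
backbone.eval()
for p in backbone.parameters():
    p.requires_grad_(False)          # probing keeps the backbone frozen

head = nn.Linear(64, 10)             # linear classification head (10 classes assumed)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random data.
x = torch.randn(8, 16, 64)           # (batch, tokens, feature_dim)
y = torch.randint(0, 10, (8,))
with torch.no_grad():
    features = backbone(x)            # final features from the frozen encoder
logits = head(features.mean(dim=1))   # classify the mean of the final features
loss = criterion(logits, y)
loss.backward()
optimizer.step()
```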
This list is automatically generated from the titles and abstracts of the papers on this site.