Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation
- URL: http://arxiv.org/abs/2409.16627v2
- Date: Wed, 02 Oct 2024 15:57:50 GMT
- Title: Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation
- Authors: Yueqi Wang, Zhenrui Yue, Huimin Zeng, Dong Wang, Julian McAuley
- Abstract summary: We introduce a lightweight framework called full-scale Matryoshka representation learning for multimodal recommendation (fMRLRec).
Our fMRLRec captures item features at different granularities, learning informative representations for efficient recommendation across multiple dimensions.
We demonstrate the effectiveness and efficiency of fMRLRec on multiple benchmark datasets.
- Score: 27.243116376164906
- License:
- Abstract: Despite recent advancements in language and vision modeling, integrating rich multimodal knowledge into recommender systems continues to pose significant challenges. This is primarily due to the need for efficient recommendation, which requires adaptive and interactive responses. In this study, we focus on sequential recommendation and introduce a lightweight framework called full-scale Matryoshka representation learning for multimodal recommendation (fMRLRec). Our fMRLRec captures item features at different granularities, learning informative representations for efficient recommendation across multiple dimensions. To integrate item features from diverse modalities, fMRLRec employs a simple mapping to project multimodal item features into an aligned feature space. Additionally, we design an efficient linear transformation that embeds smaller features into larger ones, substantially reducing memory requirements for large-scale training on recommendation data. Combined with improved state space modeling techniques, fMRLRec scales to different dimensions and only requires one-time training to produce multiple models tailored to various granularities. We demonstrate the effectiveness and efficiency of fMRLRec on multiple benchmark datasets, where it consistently achieves superior performance over state-of-the-art baseline methods. We make our code and data publicly available at https://github.com/yueqirex/fMRLRec.
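The nesting idea in the abstract (one training run yielding models at several sizes, with smaller features embedded in larger ones) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function `nested_linear`, the sizes `[2, 4, 8]`, and the random data are all hypothetical, and the sketch only shows the general Matryoshka pattern of reusing the leading block of a single weight matrix for each smaller model.

```python
import numpy as np

def nested_linear(x, weight, dim):
    # Hypothetical helper: use only the first `dim` input features and the
    # leading dim x dim block of the shared weight matrix.
    return x[..., :dim] @ weight[:dim, :dim]

rng = np.random.default_rng(0)
full_dim = 8
sizes = [2, 4, 8]                                # assumed nested model sizes
weight = rng.normal(size=(full_dim, full_dim))   # one shared weight matrix
items = rng.normal(size=(5, full_dim))           # 5 toy item feature vectors

# Every model size reads from the same parameters; nothing is copied per size.
outputs = {d: nested_linear(items, weight, d) for d in sizes}

# Parameter count: one shared matrix versus one separate matrix per size.
shared_params = weight.size
separate_params = sum(d * d for d in sizes)

for d in sizes:
    print(d, outputs[d].shape)
print(shared_params, separate_params)
```

In this toy setup the shared matrix holds 64 parameters, while three independent matrices of sizes 2, 4, and 8 would hold 84, which hints at why nesting reduces memory when many sizes are needed.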
Related papers
- EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency.
arXiv Detail & Related papers (2024-10-03T05:43:24Z)
- An Enhanced-State Reinforcement Learning Algorithm for Multi-Task Fusion in Large-Scale Recommender Systems [12.277443583840963]
We propose a novel method called Enhanced-State RL for Multi-Task Fusion (MTF) in Recommender Systems (RSs).
Our method first defines user features, item features, and other valuable features collectively as the enhanced state; it then proposes a novel actor and critic learning process that uses the enhanced state to take a much better action for each user-item pair.
arXiv Detail & Related papers (2024-09-18T03:34:31Z)
- Personalized Multi-task Training for Recommender System [80.23030752707916]
PMTRec is the first personalized multi-task learning algorithm to obtain comprehensive user/item embeddings from various information sources.
Our contributions open new avenues for advancing personalized multi-task training in recommender systems.
arXiv Detail & Related papers (2024-07-31T06:27:06Z)
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z)
- AlignRec: Aligning and Training in Multimodal Recommendations [29.995007279325947]
Multimodal recommendations can leverage rich contexts beyond interactions.
Existing methods mainly regard multimodal information as auxiliary, using it to help learn ID features.
Semantic gaps exist between multimodal content features and ID-based features, so directly using multimodal information as an auxiliary signal would lead to misaligned representations of users and items.
arXiv Detail & Related papers (2024-03-19T02:49:32Z)
- Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
arXiv Detail & Related papers (2024-02-27T14:21:56Z)
- Parameter Efficient Multi-task Model Fusion with Partial Linearization [97.23530944186078]
We propose a novel method to improve multi-task fusion for parameter-efficient fine-tuning techniques.
Our approach partially linearizes only the adapter modules and applies task arithmetic over the linearized adapters.
We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model.
arXiv Detail & Related papers (2023-10-07T08:55:54Z)
- Efficient Multimodal Fusion via Interactive Prompting [62.08292938484994]
Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era.
We propose an efficient and flexible multimodal fusion method, namely PMF, tailored for fusing unimodally pre-trained transformers.
arXiv Detail & Related papers (2023-04-13T07:31:51Z)
- M2TRec: Metadata-aware Multi-task Transformer for Large-scale and Cold-start free Session-based Recommendations [9.327321259021236]
Session-based recommender systems (SBRSs) have shown superior performance over conventional methods.
We propose M2TRec, a Metadata-aware Multi-task Transformer model for session-based recommendations.
arXiv Detail & Related papers (2022-09-23T19:34:29Z)
- Towards Universal Sequence Representation Learning for Recommender Systems [98.02154164251846]
We present a novel universal sequence representation learning approach, named UniSRec.
The proposed approach utilizes the associated description text of items to learn transferable representations across different recommendation scenarios.
Our approach can be effectively transferred to new recommendation domains or platforms in a parameter-efficient way.
arXiv Detail & Related papers (2022-06-13T07:21:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.