HistLLM: A Unified Framework for LLM-Based Multimodal Recommendation with User History Encoding and Compression
- URL: http://arxiv.org/abs/2504.10150v2
- Date: Tue, 22 Apr 2025 02:41:47 GMT
- Title: HistLLM: A Unified Framework for LLM-Based Multimodal Recommendation with User History Encoding and Compression
- Authors: Chen Zhang, Bo Hu, Weidong Chen, Zhendong Mao,
- Abstract summary: HistLLM is an innovative framework that integrates textual and visual features through a User History.<n> Module (UHEM), compressing user history interactions into a single token representation.<n>Extensive experiments demonstrate the effectiveness and efficiency of our proposed mechanism.
- Score: 33.34435467588446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large language models (LLMs) have proven effective in leveraging textual data for recommendations, their application to multimodal recommendation tasks remains relatively underexplored. Although LLMs can process multimodal information through projection functions that map visual features into their semantic space, recommendation tasks often require representing users' history interactions through lengthy prompts combining text and visual elements, which not only hampers training and inference efficiency but also makes it difficult for the model to accurately capture user preferences from complex and extended prompts, leading to reduced recommendation performance. To address this challenge, we introduce HistLLM, an innovative multimodal recommendation framework that integrates textual and visual features through a User History Encoding Module (UHEM), compressing multimodal user history interactions into a single token representation, effectively facilitating LLMs in processing user preferences. Extensive experiments demonstrate the effectiveness and efficiency of our proposed mechanism.
Related papers
- Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation [4.518104756199573]
Molar is a sequential recommendation framework that integrates multiple content modalities with ID information to capture collaborative signals effectively.
By seamlessly combining multimodal content with collaborative filtering insights, Molar captures both user interests and contextual semantics, leading to superior recommendation accuracy.
arXiv Detail & Related papers (2024-12-24T05:23:13Z) - Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information [76.62949982303532]
We propose a parameter-efficient Large Language Model Bi-Tuning framework for sequential recommendation with collaborative information (Laser)
In our Laser, the prefix is utilized to incorporate user-item collaborative information and adapt the LLM to the recommendation task, while the suffix converts the output embeddings of the LLM from the language space to the recommendation space for the follow-up item recommendation.
M-Former is a lightweight MoE-based querying transformer that uses a set of query experts to integrate diverse user-specific collaborative information encoded by frozen ID-based sequential recommender systems.
arXiv Detail & Related papers (2024-09-03T04:55:03Z) - QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tune a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z) - Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation [21.281471662696372]
We propose the Multimodal Large Language Model-enhanced Multimodaln Sequential Recommendation (MLLM-MSR) model.<n>To capture the dynamic user preference, we design a two-stage user preference summarization method.<n>We then employ a recurrent user preference summarization generation paradigm to capture the dynamic changes in user preferences.
arXiv Detail & Related papers (2024-08-19T04:44:32Z) - MMREC: LLM Based Multi-Modal Recommender System [2.3113916776957635]
This paper presents a novel approach to enhancing recommender systems by leveraging Large Language Models (LLMs) and deep learning techniques.
The proposed framework aims to improve the accuracy and relevance of recommendations by incorporating multi-modal information processing and by the use of unified latent space representation.
arXiv Detail & Related papers (2024-08-08T04:31:29Z) - Fine-tuning Multimodal Large Language Models for Product Bundling [53.01642741096356]
We introduce Bundle-MLLM, a novel framework that fine-tunes large language models (LLMs) through a hybrid item tokenization approach.<n>Specifically, we integrate textual, media, and relational data into a unified tokenization, introducing a soft separation token to distinguish between textual and non-textual tokens.<n>We propose a progressive optimization strategy that fine-tunes LLMs for disentangled objectives: 1) learning bundle patterns and 2) enhancing multimodal semantic understanding specific to product bundling.
arXiv Detail & Related papers (2024-07-16T13:30:14Z) - NoteLLM-2: Multimodal Large Representation Models for Recommendation [71.87790090964734]
Large Language Models (LLMs) have demonstrated exceptional proficiency in text understanding and embedding tasks.<n>Their potential in multimodal representation, particularly for item-to-item (I2I) recommendations, remains underexplored.<n>We propose an end-to-end fine-tuning method that customizes the integration of any existing LLMs and vision encoders for efficient multimodal representation.
arXiv Detail & Related papers (2024-05-27T03:24:01Z) - InteraRec: Screenshot Based Recommendations Using Multimodal Large Language Models [0.6926105253992517]
We introduce a sophisticated and interactive recommendation framework denoted as InteraRec.
InteraRec captures high-frequency screenshots of web pages as users navigate through a website.
We demonstrate the effectiveness of InteraRec in providing users with valuable and personalized offerings.
arXiv Detail & Related papers (2024-02-26T17:47:57Z) - DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever [83.33209603041013]
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval.
Our approach introduces a multi-modal context generator to learn context features which are distilled into prompts within the pre-trained vision-language model CLIP.
To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space.
arXiv Detail & Related papers (2024-01-02T07:40:12Z) - CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation [60.2700801392527]
We introduce CoLLM, an innovative LLMRec methodology that seamlessly incorporates collaborative information into LLMs for recommendation.
CoLLM captures collaborative information through an external traditional model and maps it to the input token embedding space of LLM.
Extensive experiments validate that CoLLM adeptly integrates collaborative information into LLMs, resulting in enhanced recommendation performance.
arXiv Detail & Related papers (2023-10-30T12:25:00Z) - Recommender AI Agent: Integrating Large Language Models for Interactive
Recommendations [53.76682562935373]
We introduce an efficient framework called textbfInteRecAgent, which employs LLMs as the brain and recommender models as tools.
InteRecAgent achieves satisfying performance as a conversational recommender system, outperforming general-purpose LLMs.
arXiv Detail & Related papers (2023-08-31T07:36:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.