LMMRec: LLM-driven Motivation-aware Multimodal Recommendation
- URL: http://arxiv.org/abs/2602.05474v2
- Date: Tue, 10 Feb 2026 03:12:44 GMT
- Title: LMMRec: LLM-driven Motivation-aware Multimodal Recommendation
- Authors: Yicheng Di, Zhanjie Zhang, Yun Wang, Jinren Liu, Jiaqi Yan, Jiyu Wei, Xiangyu Chen, Yuan Liu,
- Abstract summary: Motivation-based recommendation systems uncover user behavior drivers.<n> Motivation modeling, crucial for decision-making and content preference, explains recommendation generation.<n>Existing methods often treat motivation as latent variables from interaction data, neglecting heterogeneous information like review text.<n>We propose Motivation-aware Multimodal Recommendation (LMMRec), a model-agnostic framework leveraging large language models for deep semantic priors and motivation understanding.
- Score: 16.86296664489242
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivation-based recommendation systems uncover user behavior drivers. Motivation modeling, crucial for decision-making and content preference, explains recommendation generation. Existing methods often treat motivation as latent variables from interaction data, neglecting heterogeneous information like review text. In multimodal motivation fusion, two challenges arise: 1) achieving stable cross-modal alignment amid noise, and 2) identifying features reflecting the same underlying motivation across modalities. To address these, we propose LLM-driven Motivation-aware Multimodal Recommendation (LMMRec), a model-agnostic framework leveraging large language models for deep semantic priors and motivation understanding. LMMRec uses chain-of-thought prompting to extract fine-grained user and item motivations from text. A dual-encoder architecture models textual and interaction-based motivations for cross-modal alignment, while Motivation Coordination Strategy and Interaction-Text Correspondence Method mitigate noise and semantic drift through contrastive learning and momentum updates. Experiments on three datasets show LMMRec achieves up to a 4.98\% performance improvement.
Related papers
- DMESR: Dual-view MLLM-based Enhancing Framework for Multimodal Sequential Recommendation [13.114773060703891]
We propose a Dual-view MLLM-based Enhancing framework for multimodal Sequential Recommendation (DMESR)<n>For the misalignment issue, we employ a contrastive learning mechanism to align the cross-modal semantic representations generated by MLLMs.<n>For the loss of fine-grained semantics, we introduce a cross-attention fusion module that integrates the coarse-grained semantic knowledge obtained from MLLMs with the fine-grained original textual semantics.
arXiv Detail & Related papers (2026-02-14T10:42:56Z) - Beyond Unimodal Shortcuts: MLLMs as Cross-Modal Reasoners for Grounded Named Entity Recognition [51.68340973140949]
Multimodal Named Entity Recognition (GMNER) aims to extract text-based entities, assign them semantic categories, and ground them to corresponding visual regions.<n> MLLMs exhibit $textbfmodality bias$, including visual bias and textual bias, which stems from their tendency to take unimodal shortcuts.<n>We propose Modality-aware Consistency Reasoning ($bfMCR$), which enforces structured cross-modal reasoning.
arXiv Detail & Related papers (2026-02-04T12:12:49Z) - From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation [59.27094165576015]
We propose a novel learning paradigm (UniMod) that transitions from sparse decision-making to dense reasoning traces.<n>By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, we reformulate monolithic decision tasks into a multi-dimensional boundary learning process.<n>We introduce specialized optimization strategies to decouple task-specific parameters and rebalance training dynamics, effectively resolving interference between diverse objectives in multi-task learning.
arXiv Detail & Related papers (2026-01-28T09:29:40Z) - Progressive Semantic Residual Quantization for Multimodal-Joint Interest Modeling in Music Recommendation [6.790539226766362]
We propose a novel multimodal recommendation framework with two stages.<n>In the first stage, our method generates modal-specific and modal-joint semantic IDs.<n>In the second stage, to model multimodal interest of users, a Multi-Codebook Cross-Attention network is designed.
arXiv Detail & Related papers (2025-08-28T02:16:57Z) - M-$LLM^3$REC: A Motivation-Aware User-Item Interaction Framework for Enhancing Recommendation Accuracy with LLMs [0.5735035463793009]
This paper proposes a novel recommendation framework, termed M-$LLM3$REC.<n>By emphasizing motivation-driven semantic modeling, M-$LLM3$REC demonstrates robust, personalized, and generalizable recommendations.
arXiv Detail & Related papers (2025-08-21T05:50:13Z) - MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings [75.0617088717528]
MoCa is a framework for transforming pre-trained VLM backbones into effective bidirectional embedding models.<n>MoCa consistently improves performance across MMEB and ViDoRe-v2 benchmarks, achieving new state-of-the-art results.
arXiv Detail & Related papers (2025-06-29T06:41:00Z) - OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging [124.91183814854126]
Model merging seeks to combine multiple expert models into a single model.<n>We introduce a benchmark for model merging research that clearly divides the tasks for MLLM training and evaluation.<n>We find that model merging offers a promising way for building improved MLLMs without requiring training data.
arXiv Detail & Related papers (2025-05-26T12:23:14Z) - BBQRec: Behavior-Bind Quantization for Multi-Modal Sequential Recommendation [15.818669767036592]
We propose a Behavior-Bind multi-modal Quantization for Sequential Recommendation (BBQRec) featuring dual-aligned quantization and semantics-aware sequence modeling.<n>BBQRec disentangles modality-agnostic behavioral patterns from noisy modality-specific features through contrastive codebook learning.<n>We design a discretized similarity reweighting mechanism that dynamically adjusts self-attention scores using quantized semantic relationships.
arXiv Detail & Related papers (2025-04-09T07:19:48Z) - SDRT: Enhance Vision-Language Models by Self-Distillation with Diverse Reasoning Traces [11.462550020102935]
We propose a novel self-distillation framework for Vision-Language Models.<n>We employ a prompt library tailored to visual reasoning tasks to generate diverse in-context questions.<n>We then utilize a two-step reasoning procedure to derive reasoning-guided responses.<n>These responses are then used for self-distillation, enabling the model to internalize the reasoning process.
arXiv Detail & Related papers (2025-03-03T17:24:42Z) - LLM-based Bi-level Multi-interest Learning Framework for Sequential Recommendation [54.396000434574454]
We propose a novel multi-interest SR framework combining implicit behavioral and explicit semantic perspectives.<n>It includes two modules: the Implicit Behavioral Interest Module and the Explicit Semantic Interest Module.<n>Experiments on four real-world datasets validate the framework's effectiveness and practicality.
arXiv Detail & Related papers (2024-11-14T13:00:23Z) - Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z) - Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning [79.38140606606126]
We propose an algorithmic framework that fine-tunes vision-language models (VLMs) with reinforcement learning (RL)
Our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning.
We demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks.
arXiv Detail & Related papers (2024-05-16T17:50:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.