LLM4Rec: Large Language Models for Multimodal Generative Recommendation with Causal Debiasing
- URL: http://arxiv.org/abs/2510.01622v1
- Date: Thu, 02 Oct 2025 02:53:05 GMT
- Title: LLM4Rec: Large Language Models for Multimodal Generative Recommendation with Causal Debiasing
- Authors: Bo Ma, Hang Li, ZeHua Hu, XiaoFan Gui, LuYao Liu, Simon Lau,
- Abstract summary: This paper introduces an enhanced generative recommendation framework with five key innovations: multimodal fusion architecture, retrieval-augmented generation mechanisms, causal inference-based debiasing, explainable recommendation generation, and real-time adaptive learning capabilities.
- Score: 4.638507244153875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contemporary generative recommendation systems face significant challenges in handling multimodal data, eliminating algorithmic biases, and providing transparent decision-making processes. This paper introduces an enhanced generative recommendation framework that addresses these limitations through five key innovations: multimodal fusion architecture, retrieval-augmented generation mechanisms, causal inference-based debiasing, explainable recommendation generation, and real-time adaptive learning capabilities. Our framework leverages advanced large language models as the backbone while incorporating specialized modules for cross-modal understanding, contextual knowledge integration, bias mitigation, explanation synthesis, and continuous model adaptation. Extensive experiments on three benchmark datasets (MovieLens-25M, Amazon-Electronics, Yelp-2023) demonstrate consistent improvements in recommendation accuracy, fairness, and diversity compared to existing approaches. The proposed framework achieves up to 2.3% improvement in NDCG@10 and 1.4% enhancement in diversity metrics while maintaining computational efficiency through optimized inference strategies.
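For readers unfamiliar with the NDCG@10 metric quoted in the abstract, the following is a minimal sketch of how it is commonly computed. It is not taken from the paper: the function names `dcg_at_k` and `ndcg_at_k` are hypothetical, the gain is assumed to be linear in the relevance label, and the ideal ranking is assumed to be a re-sorting of the same list. The authors' evaluation protocol may differ on any of these points.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain, assumed)."""
    # Position 1 is discounted by log2(2) = 1, position 2 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal ranking."""
    # Assumption: the ideal ranking is the same items sorted by relevance, descending.
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items at all, so the score is defined as 0 here
    return dcg_at_k(relevances, k) / ideal_dcg


# Relevance labels of the recommended items, in the order they were ranked.
perfect = ndcg_at_k([3, 2, 1, 0])   # already in ideal order
swapped = ndcg_at_k([0, 1])         # the only relevant item is ranked second
```

A ranking that is already in ideal order scores 1.0, and pushing relevant items down the list lowers the score, which is why a gain in NDCG@10 is read as better placement of relevant items within the top ten.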
Related papers
- ERNIE 5.0 Technical Report [244.36480708815316]
ERNIE 5.0 is a unified autoregressive foundation model for unified multimodal understanding and generation across text, image, video, and audio. To address practical challenges in large-scale deployment under diverse resource constraints, ERNIE 5.0 adopts a novel elastic training paradigm. We show that ERNIE 5.0 achieves strong and balanced performance across multiple modalities.
arXiv Detail & Related papers (2026-02-04T16:18:15Z)
- Multi-Aspect Cross-modal Quantization for Generative Recommendation [27.92632297542123]
We propose Multi-Aspect Cross-modal quantization for generative Recommendation (MACRec). We first introduce cross-modal quantization during the ID learning process, which effectively reduces conflict rates. We also incorporate multi-aspect cross-modal alignments, including the implicit and explicit alignments.
arXiv Detail & Related papers (2025-11-19T04:55:14Z)
- UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings [70.60608084375691]
We pioneer the exploration of generative embeddings, unifying embedding tasks within a generative paradigm. We propose UME-R1, a universal multimodal embedding framework consisting of a two-stage training strategy. It is evaluated on the MMEB-V2 benchmark across 78 tasks spanning video, image, and visual documents.
arXiv Detail & Related papers (2025-11-01T05:04:23Z)
- RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1 [20.92548890511589]
This paper introduces RecLLM-R1, a novel recommendation framework leveraging Large Language Models (LLMs). RecLLM-R1 significantly surpasses existing baseline methods across a spectrum of evaluation metrics, including accuracy, diversity, and novelty.
arXiv Detail & Related papers (2025-06-24T01:39:34Z)
- REMoH: A Reflective Evolution of Multi-objective Heuristics approach via Large Language Models [39.85828629779943]
Multi-objective optimization is fundamental in complex decision-making tasks. Recent advances in Large Language Models (LLMs) offer enhanced explainability, adaptability, and reasoning. This work proposes Reflective Evolution of Multi-objective Heuristics (REMoH), a novel framework integrating NSGA-II with LLM-based generation.
arXiv Detail & Related papers (2025-06-09T13:38:28Z)
- Bidirectional Knowledge Distillation for Enhancing Sequential Recommendation with Large Language Models [28.559223475725137]
Large language models (LLMs) have demonstrated exceptional performance in understanding and generating semantic patterns. LLMs often face challenges related to high inference costs and static knowledge transfer methods. We propose a novel mutual distillation framework, LLMD4Rec, that fosters dynamic and bidirectional knowledge exchange.
arXiv Detail & Related papers (2025-05-23T17:21:14Z)
- Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models [83.8639566087953]
We propose a direct retrieval-augmented optimization framework, named DRO, that enables end-to-end training of two key components. DRO alternates between two phases: (i) document permutation estimation and (ii) re-weighted, progressively improving RAG components. Our theoretical analysis reveals that DRO is analogous to policy-gradient methods in reinforcement learning.
arXiv Detail & Related papers (2025-05-05T23:54:53Z)
- MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct [148.39859547619156]
We propose MMEvol, a novel multimodal instruction data evolution framework. MMEvol iteratively improves data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution. Our approach reaches state-of-the-art (SOTA) performance in nine tasks using significantly less data compared to state-of-the-art models.
arXiv Detail & Related papers (2024-09-09T17:44:00Z)
- MMREC: LLM Based Multi-Modal Recommender System [2.3113916776957635]
This paper presents a novel approach to enhancing recommender systems by leveraging Large Language Models (LLMs) and deep learning techniques. The proposed framework aims to improve the accuracy and relevance of recommendations by incorporating multi-modal information processing and by the use of unified latent space representation.
arXiv Detail & Related papers (2024-08-08T04:31:29Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles.
We construct three propagative modules to effectively solve the optimization models with flexible combinations.
Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.