DMGIN: How Multimodal LLMs Enhance Large Recommendation Models for Lifelong User Post-click Behaviors
- URL: http://arxiv.org/abs/2508.21801v1
- Date: Fri, 29 Aug 2025 17:28:07 GMT
- Title: DMGIN: How Multimodal LLMs Enhance Large Recommendation Models for Lifelong User Post-click Behaviors
- Authors: Zhuoxing Wei, Qingchen Xie, Qi Liu
- Abstract summary: Long post-click behavior sequences pose severe performance issues. Deep Multimodal Group Interest Network (DMGIN) improves Click-Through Rate (CTR) prediction.
- Score: 5.465812199325145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling user interest from lifelong user behavior sequences is crucial for enhancing Click-Through Rate (CTR) prediction. However, long post-click behavior sequences themselves pose severe performance issues: the sheer volume of data leads to high computational costs and inefficiencies in model training and inference. Traditional methods address this with two-stage approaches, but this compromises model effectiveness because the full sequence context is used only partially. More importantly, integrating multimodal embeddings into existing large recommendation models (LRM) presents significant challenges: these embeddings often exacerbate computational burdens and are mismatched with LRM architectures. To address these issues and enhance the model's efficiency and accuracy, we introduce the Deep Multimodal Group Interest Network (DMGIN). Given the observation that user post-click behavior sequences contain a large number of repeated items with varying behaviors and timestamps, DMGIN employs Multimodal LLMs (MLLMs) for grouping, reorganizing complete lifelong post-click behavior sequences more effectively with almost no additional computational overhead, as opposed to directly introducing multimodal embeddings. To mitigate the potential information loss from grouping, we implement two key strategies. First, we analyze behaviors within each group using both interest statistics and intra-group transformers to capture group traits. Second, we apply inter-group transformers to temporally ordered groups to capture the evolution of user group interests. Our extensive experiments on both industrial and public datasets confirm the effectiveness and efficiency of DMGIN. An A/B test in our LBS advertising system shows that DMGIN improves CTR by 4.7% and Revenue per Mille by 2.3%.
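For orientation, the group-then-attend design described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration under our own assumptions, not the authors' implementation: group IDs are assumed to be precomputed offline by the MLLM (and numbered in temporal order), the interest-statistics branch is omitted, and all module and parameter names (`DMGINSketch`, `intra_encoder`, `inter_encoder`, `d_model`, `n_groups`) are ours.

```python
import torch
import torch.nn as nn

class DMGINSketch(nn.Module):
    """Toy group-interest model: intra-group encoding with per-group pooling,
    then an inter-group transformer over the group representations."""

    def __init__(self, d_model: int = 64, n_groups: int = 16, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # nn.TransformerEncoder deep-copies the layer, so the two encoders
        # below do not share weights.
        self.intra_encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.inter_encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.n_groups = n_groups

    def forward(self, behavior_emb: torch.Tensor, group_ids: torch.Tensor) -> torch.Tensor:
        # behavior_emb: (batch, seq_len, d_model); group_ids: (batch, seq_len),
        # assumed to be assigned offline by an MLLM.
        # Simplification: encode the whole sequence once instead of running
        # separate per-group attention, then pool each group's positions.
        encoded = self.intra_encoder(behavior_emb)
        group_reprs = []
        for g in range(self.n_groups):
            mask = (group_ids == g).unsqueeze(-1).float()          # (batch, seq_len, 1)
            pooled = (encoded * mask).sum(1) / mask.sum(1).clamp(min=1.0)
            group_reprs.append(pooled)
        groups = torch.stack(group_reprs, dim=1)                   # (batch, n_groups, d_model)
        # The inter-group transformer models how group interests evolve over time.
        return self.inter_encoder(groups).mean(dim=1)              # (batch, d_model)
```

In a real system the resulting user-interest vector would feed a CTR head together with target-item features; the point of the sketch is that attention runs over `n_groups` group summaries rather than the full lifelong sequence.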
Related papers
- GEMs: Breaking the Long-Sequence Barrier in Generative Recommendation with a Multi-Stream Decoder [54.64137490632567]
We propose a novel and unified framework designed to capture users' sequences from long-term history. Generative Multi-streamers (GEMs) breaks user sequences into three streams. Extensive experiments on large-scale industrial datasets demonstrate that GEMs significantly outperforms state-of-the-art methods in recommendation accuracy.
arXiv Detail & Related papers (2026-02-14T06:42:56Z)
- PRISM: Performer RS-IMLE for Single-pass Multisensory Imitation Learning [51.24484551729328]
We introduce PRISM, a single-pass policy based on a batch-global rejection-sampling variant of IMLE. PRISM couples a temporal multisensory encoder with a linear-attention generator using a Performer architecture. We demonstrate the efficacy of PRISM on a diverse real-world hardware suite, including loco-manipulation with a Unitree Go2 carrying a 7-DoF D1 arm and tabletop manipulation with a UR5 manipulator.
arXiv Detail & Related papers (2026-02-02T17:57:37Z)
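For context on the IMLE family PRISM builds on: in vanilla IMLE, each observed data point is matched to its nearest generated sample, and that distance is minimized, so every data point stays covered by the generator. The sketch below (names ours) shows only this plain IMLE matching; PRISM's rejection-sampling and batch-global refinements are not reproduced here.

```python
import torch

def imle_matching_loss(generated: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Vanilla IMLE-style objective: pull the nearest generated sample
    toward each observed target.

    generated: (m, d) candidate samples from the generator;
    targets:   (n, d) observed data points.
    """
    dists = torch.cdist(targets, generated)   # (n, m) pairwise distances
    nearest = dists.min(dim=1).values         # nearest candidate per target
    return nearest.mean()
```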
- No One Left Behind: How to Exploit the Incomplete and Skewed Multi-Label Data for Conversion Rate Prediction [48.578518946398354]
In most real-world online advertising systems, advertisers typically have diverse customer acquisition goals. A common solution is to use multi-task learning to train a unified model on post-click data to estimate the conversion rate (CVR) for diverse targets. In practice, CVR prediction often encounters missing conversion data, as many advertisers submit only a subset of user conversion actions due to privacy or other constraints.
arXiv Detail & Related papers (2025-12-15T13:14:20Z)
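The standard baseline for training a unified multi-task CVR model when some advertisers report only a subset of conversion actions is to mask the loss on unobserved labels. The sketch below shows that common baseline, not necessarily this paper's method; the function name and tensor layout are our own.

```python
import torch
import torch.nn.functional as F

def masked_multitask_cvr_loss(logits: torch.Tensor,
                              labels: torch.Tensor,
                              observed: torch.Tensor) -> torch.Tensor:
    """Multi-task CVR loss that skips unobserved conversion labels.

    logits, labels, observed: (batch, n_tasks) floats; `observed` is 1.0
    where the advertiser actually reported that conversion action, else 0.0.
    """
    per_task = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
    masked = per_task * observed
    # Normalize by the number of observed labels so sparsely reported tasks
    # are not drowned out by densely reported ones.
    return masked.sum() / observed.sum().clamp(min=1.0)
```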
- Structurally Refined Graph Transformer for Multimodal Recommendation [13.296555757708298]
We present SRGFormer, a structurally optimized multimodal recommendation model. By modifying the transformer for better integration into our model, we capture the overall behavior patterns of users. Then, we enhance structural information by embedding multimodal information into a hypergraph structure to aid in learning the local structures between users and items.
arXiv Detail & Related papers (2025-11-01T15:18:00Z)
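A hyperedge can connect many users and items at once, which is what lets a hypergraph capture local multi-node structure. The sketch below is one generic, simplified hypergraph-convolution step, not SRGFormer's exact propagation rule; the names and the plain-averaging normalization are our own simplifications.

```python
import torch

def hypergraph_conv(X: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
    """One simplified hypergraph-convolution step: node features are averaged
    into hyperedge features and then scattered back to nodes.

    X: (n_nodes, d) node features; H: (n_nodes, n_edges) 0/1 incidence matrix.
    """
    edge_deg = H.sum(dim=0).clamp(min=1.0)            # (n_edges,)
    node_deg = H.sum(dim=1).clamp(min=1.0)            # (n_nodes,)
    edge_feat = (H.t() @ X) / edge_deg.unsqueeze(1)   # average nodes per hyperedge
    return (H @ edge_feat) / node_deg.unsqueeze(1)    # average hyperedges per node
```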
- Empowering Large Language Model for Sequential Recommendation via Multimodal Embeddings and Semantic IDs [28.752042722391934]
Sequential recommendation (SR) aims to capture users' dynamic interests and sequential patterns based on their historical interactions. MME-SID integrates multimodal embeddings and quantized embeddings to mitigate embedding collapse. Extensive experiments on three public datasets validate the superior performance of MME-SID.
arXiv Detail & Related papers (2025-09-02T07:02:29Z)
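Semantic IDs are commonly produced by residual quantization of an item embedding: each level picks the nearest codebook entry and quantizes what remains. The sketch below shows that generic scheme; MME-SID's actual quantizer and its collapse-mitigation details may differ, and all names here are ours.

```python
import torch

def residual_quantize(x: torch.Tensor, codebooks: list[torch.Tensor]) -> list[int]:
    """Generic residual quantization producing a tuple of code indices
    (a "semantic ID") for an embedding x.

    x: (d,); codebooks: list of (codebook_size, d) tensors, one per level.
    """
    ids, residual = [], x.clone()
    for cb in codebooks:
        # Nearest codebook entry to the current residual.
        dists = torch.cdist(residual.unsqueeze(0), cb).squeeze(0)  # (codebook_size,)
        idx = int(dists.argmin())
        ids.append(idx)
        residual = residual - cb[idx]   # quantize what the coarser levels missed
    return ids
```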
- MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping [28.653290360671175]
We introduce MINT, a simple yet surprisingly effective task-grouping strategy based on the type of multimodal interaction. We demonstrate that the proposed method greatly outperforms existing task-grouping baselines for multimodal instruction tuning.
arXiv Detail & Related papers (2025-06-02T22:55:23Z)
- More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives [51.497338578427915]
Large language models (LLMs) excel at few-shot in-context learning (ICL) without requiring parameter updates. DrICL is a novel optimization method that enhances model performance through Differentiated and Reweighting objectives. We develop the Many-Shot ICL Benchmark (ICL-50), a large-scale benchmark of 50 tasks that cover shot numbers from 1 to 350 within sequences of up to 8,000 tokens.
arXiv Detail & Related papers (2025-01-07T14:57:08Z)
- Multi-granularity Interest Retrieval and Refinement Network for Long-Term User Behavior Modeling in CTR Prediction [68.90783662117936]
Click-through Rate (CTR) prediction is crucial for online personalization platforms. Recent advancements have shown that modeling rich user behaviors can significantly improve the performance of CTR prediction. We propose the Multi-granularity Interest Retrieval and Refinement Network (MIRRN).
arXiv Detail & Related papers (2024-11-22T15:29:05Z)
- LLM-based Bi-level Multi-interest Learning Framework for Sequential Recommendation [54.396000434574454]
We propose a novel multi-interest SR framework combining implicit behavioral and explicit semantic perspectives. It includes two modules: the Implicit Behavioral Interest Module and the Explicit Semantic Interest Module. Experiments on four real-world datasets validate the framework's effectiveness and practicality.
arXiv Detail & Related papers (2024-11-14T13:00:23Z)
- Improved Diversity-Promoting Collaborative Metric Learning for Recommendation [127.08043409083687]
Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems. This paper focuses on a challenging scenario where a user has multiple categories of interests. We propose a novel method called Diversity-Promoting Collaborative Metric Learning (DPCML).
arXiv Detail & Related papers (2024-09-02T07:44:48Z)
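In collaborative metric learning, a user and an item are embedded in a shared metric space and scored by negative distance; with multiple categories of interest, a natural extension is to keep several embeddings per user and score an item against the closest one. Below is a minimal sketch of that multi-interest scoring idea; the names are ours, and DPCML itself additionally promotes diversity among the user's embeddings, which is not shown.

```python
import torch

def multi_interest_cml_score(user_vecs: torch.Tensor, item_vec: torch.Tensor) -> torch.Tensor:
    """Score an item for a user who has several interest embeddings.

    user_vecs: (n_interests, d); item_vec: (d,).
    The item is matched against the closest interest vector, so distinct
    interest categories can each claim their own region of the metric space.
    """
    dists = torch.norm(user_vecs - item_vec, dim=-1)  # (n_interests,)
    return -dists.min()                               # higher score = closer match
```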
- TWIN V2: Scaling Ultra-Long User Behavior Sequence Modeling for Enhanced CTR Prediction at Kuaishou [28.809014888174932]
We introduce TWIN-V2, an enhancement of SIM, where a divide-and-conquer approach is applied to compress life-cycle behaviors and uncover more accurate and diverse user interests.
Under an efficient deployment framework, TWIN-V2 has been successfully deployed to the primary traffic that serves hundreds of millions of daily active users at Kuaishou.
arXiv Detail & Related papers (2024-07-23T10:00:45Z)
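The divide-and-conquer compression exploits the fact that life-cycle sequences contain many repeats of similar items: near-duplicate behaviors can be collapsed into one weighted entry, shrinking the sequence that downstream attention must process. A toy illustration of that idea follows; it is ours, not the TWIN-V2 pipeline, which clusters by item similarity rather than exact item IDs.

```python
from collections import Counter

def compress_behaviors(item_ids: list[int]) -> list[tuple[int, int]]:
    """Collapse a long behavior sequence into (item_id, count) pairs,
    keeping first-occurrence order so temporal structure is roughly kept."""
    counts = Counter(item_ids)
    first_order = list(dict.fromkeys(item_ids))  # de-duplicate, order-preserving
    return [(i, counts[i]) for i in first_order]
```

For example, `compress_behaviors([3, 3, 7, 3, 9, 7])` returns `[(3, 3), (7, 2), (9, 1)]`: six events become three weighted entries.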