MARS: Modality-Aligned Retrieval for Sequence Augmented CTR Prediction
- URL: http://arxiv.org/abs/2509.01184v1
- Date: Mon, 01 Sep 2025 07:08:44 GMT
- Title: MARS: Modality-Aligned Retrieval for Sequence Augmented CTR Prediction
- Authors: Yutian Xiao, Shukuan Wang, Binhao Wang, Zhao Zhang, Yanze Zhang, Shanqi Liu, Chao Feng, Xiang Li, Fuzhen Zhuang,
- Abstract summary: We propose a novel framework MARS (Modality-Aligned Retrieval for Sequence Augmented CTR Prediction). MARS utilizes a Stein kernel-based approach to align text and image features into a unified and unbiased semantic space to construct multimodal user embeddings. It consistently outperforms state-of-the-art baselines and achieves substantial growth on core business metrics.
- Score: 23.789479369353675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Click-through rate (CTR) prediction serves as a cornerstone of recommender systems. Despite the strong performance of current CTR models based on user behavior modeling, they are still severely limited by interaction sparsity, especially in low-active user scenarios. To address this issue, data augmentation of user behavior is a promising research direction. However, existing data augmentation methods rely heavily on collaborative signals while overlooking the rich multimodal features of items, leading to insufficient modeling of low-active users. To alleviate this problem, we propose a novel framework, MARS (Modality-Aligned Retrieval for Sequence Augmented CTR Prediction). MARS utilizes a Stein kernel-based approach to align text and image features into a unified and unbiased semantic space to construct multimodal user embeddings. Subsequently, each low-active user's behavior sequence is augmented by retrieving, filtering, and concatenating the most similar behavior sequences of high-active users via multimodal user embeddings. Validated by extensive offline experiments and online A/B tests, our framework MARS consistently outperforms state-of-the-art baselines and achieves substantial growth on core business metrics within Kuaishou (https://www.kuaishou.com/). Consequently, MARS has been successfully deployed, serving the main traffic for hundreds of millions of users. To ensure reproducibility, we provide anonymous access to the implementation code (https://github.com/wangshukuan/MARS).
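The augmentation step described in the abstract (find the high-active users whose multimodal user embeddings are closest to a low-active user's, filter them, then merge their behaviors into the low-active user's sequence) can be illustrated with a minimal sketch. It rests on assumptions that are not taken from MARS: item embeddings are treated as already aligned (the Stein kernel alignment itself is not modeled), a user embedding is the mean of that user's item embeddings, similarity is cosine, "filtering" is a top-k cut plus a similarity threshold plus de-duplication, and borrowed items are placed before the user's own. The names `mean_pool`, `augment_sequence`, `top_k`, `min_sim`, and `max_len` are hypothetical; the authors' actual implementation is in the linked repository.

```python
import math
from typing import Dict, List, Sequence

Embedding = List[float]


def mean_pool(items: Sequence[str], item_emb: Dict[str, Embedding]) -> Embedding:
    """Hypothetical user embedding: the average of the user's (pre-aligned) item embeddings."""
    dim = len(next(iter(item_emb.values())))
    total = [0.0] * dim
    for item in items:
        for d, value in enumerate(item_emb[item]):
            total[d] += value
    count = max(len(items), 1)  # avoid division by zero for an empty history
    return [value / count for value in total]


def cosine(u: Embedding, v: Embedding) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def augment_sequence(
    low_seq: Sequence[str],
    high_seqs: Dict[str, Sequence[str]],
    item_emb: Dict[str, Embedding],
    top_k: int = 2,
    min_sim: float = 0.5,
    max_len: int = 20,
) -> List[str]:
    """Sketch of retrieve -> filter -> concatenate for one low-active user (assumed design)."""
    query = mean_pool(low_seq, item_emb)
    # Retrieve: score every high-active user by similarity of user embeddings.
    scores = {uid: cosine(query, mean_pool(seq, item_emb)) for uid, seq in high_seqs.items()}
    ranked = sorted(high_seqs, key=scores.get, reverse=True)
    # Filter (assumption): keep the top-k neighbors that clear a similarity threshold.
    kept = [uid for uid in ranked[:top_k] if scores[uid] >= min_sim]
    # Filter (assumption): drop items the user, or an earlier neighbor, already contributed.
    seen = set(low_seq)
    retrieved: List[str] = []
    for uid in kept:
        for item in high_seqs[uid]:
            if item not in seen:
                seen.add(item)
                retrieved.append(item)
    # Concatenate: borrowed behaviors first, the user's own behaviors last.
    augmented = retrieved + list(low_seq)
    # Truncate from the front so the user's own (most recent) behaviors survive.
    return augmented[-max_len:] if max_len > 0 else []


if __name__ == "__main__":
    emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9], "e": [1.0, 0.1]}
    high = {"u1": ["b", "e"], "u2": ["c", "d"]}
    print(augment_sequence(["a"], high, emb))  # ['b', 'e', 'a']
```

In the toy run, user `u1` (items close to `a`) passes the 0.5 threshold while `u2` does not, so only `u1`'s items are borrowed. Placing retrieved behaviors before the user's own means truncation by `max_len` discards borrowed items first; other orderings are equally plausible and this choice is only illustrative.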
Related papers
- GenCI: Generative Modeling of User Interest Shift via Cohort-based Intent Learning for CTR Prediction [84.0125708499372]
We propose a generative user intent framework to model user preferences for click-through rate (CTR) prediction.
The framework first employs a generative model, trained with a next-item prediction objective, to proactively produce candidate interest cohorts.
A hierarchical candidate-aware network then injects this rich contextual signal into the ranking stage, refining it with cross-attention to align with both user history and the target item.
arXiv Detail & Related papers (2026-01-26T08:15:04Z)
- Unleashing the Potential of Sparse Attention on Long-term Behaviors for CTR Prediction [17.78352301235849]
We propose SparseCTR, an efficient and effective model specifically designed for users' long-term behaviors.
Based on these chunks, we propose a three-branch sparse self-attention mechanism to jointly identify users' global interests.
We show that SparseCTR not only improves efficiency but also outperforms state-of-the-art methods.
arXiv Detail & Related papers (2026-01-25T13:39:26Z)
- From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models [81.43473418572567]
Click-Through Rate (CTR) prediction is a core task in recommendation systems.
We propose a novel generative framework to address embedding dimensional collapse and information redundancy.
We show that SFG consistently mitigates embedding collapse and reduces information redundancy, while yielding substantial performance gains.
arXiv Detail & Related papers (2025-12-16T03:17:18Z)
- Multi-granularity Interest Retrieval and Refinement Network for Long-Term User Behavior Modeling in CTR Prediction [68.90783662117936]
Click-through Rate (CTR) prediction is crucial for online personalization platforms.
Recent advancements have shown that modeling rich user behaviors can significantly improve the performance of CTR prediction.
We propose the Multi-granularity Interest Retrieval and Refinement Network (MIRRN).
arXiv Detail & Related papers (2024-11-22T15:29:05Z)
- Multi-Level Sequence Denoising with Cross-Signal Contrastive Learning for Sequential Recommendation [13.355017204983973]
Sequential recommender systems (SRSs) aim to suggest the next item for a user based on her historical interaction sequences.
We propose a novel model named Multi-level Sequence Denoising with Cross-signal Contrastive Learning (MSDCCL) for sequential recommendation.
arXiv Detail & Related papers (2024-04-22T04:57:33Z)
- Impression-Informed Multi-Behavior Recommender System: A Hierarchical Graph Attention Approach [4.03161352925235]
We introduce the Hierarchical Multi-behavior Graph Attention Network (HMGN).
This pioneering framework leverages attention mechanisms to discern information both across behaviors and within each behavior.
We report a performance boost of up to 64% in NDCG@100 over conventional graph neural network methods.
arXiv Detail & Related papers (2023-09-06T17:09:43Z)
- TBIN: Modeling Long Textual Behavior Data for CTR Prediction [15.056265935931377]
Click-through rate (CTR) prediction plays a pivotal role in the success of recommendations.
Inspired by the recent rise of language models (LMs), a surge of works improve prediction by organizing user behavior data in a textual format.
While promising, these works have to truncate the textual data to reduce the quadratic computational overhead of self-attention in LMs.
In this paper, we propose a Textual Behavior-based Interest Chunking Network (TBIN).
arXiv Detail & Related papers (2023-08-09T03:48:41Z)
- Sampling Is All You Need on Modeling Long-Term User Behaviors for CTR Prediction [15.97120392599086]
We propose SDIM (Sampling-based Deep Interest Modeling), a simple yet effective sampling-based end-to-end approach for modeling long-term user behaviors.
We show theoretically and experimentally that the proposed method performs on par with standard attention-based models on modeling long-term user behaviors.
arXiv Detail & Related papers (2022-05-20T15:20:52Z)
- Hyper Meta-Path Contrastive Learning for Multi-Behavior Recommendation [61.114580368455236]
User purchasing prediction with multi-behavior information remains a challenging problem for current recommendation systems.
We propose the concept of hyper meta-path to construct hyper meta-paths or hyper meta-graphs to explicitly illustrate the dependencies among different behaviors of a user.
Thanks to the recent success of graph contrastive learning, we leverage it to learn embeddings of user behavior patterns adaptively instead of assigning a fixed scheme to understand the dependencies among different behaviors.
arXiv Detail & Related papers (2021-09-07T04:28:09Z)
- Contrastive Self-supervised Sequential Recommendation with Robust Augmentation [101.25762166231904]
Sequential recommendation describes a set of techniques to model dynamic user behavior in order to predict future interactions in sequential user data.
Old and new issues remain, including data sparsity and noisy data.
We propose Contrastive Self-Supervised Learning for Sequential Recommendation (CoSeRec).
arXiv Detail & Related papers (2021-08-14T07:15:25Z)
- Multi-Interactive Attention Network for Fine-grained Feature Learning in CTR Prediction [48.267995749975476]
In the Click-Through Rate (CTR) prediction scenario, users' sequential behaviors are well utilized to capture user interest.
Existing methods mostly apply attention to user behaviors, which is not always suitable for CTR prediction.
We propose a Multi-Interactive Attention Network (MIAN) to comprehensively extract the latent relationships among all kinds of fine-grained features.
arXiv Detail & Related papers (2020-12-13T05:46:19Z)