Quadratic Interest Network for Multimodal Click-Through Rate Prediction
- URL: http://arxiv.org/abs/2504.17699v2
- Date: Fri, 25 Apr 2025 05:02:28 GMT
- Title: Quadratic Interest Network for Multimodal Click-Through Rate Prediction
- Authors: Honghao Li, Hanwei Li, Jing Zhang, Yi Zhang, Ziniu Yu, Lei Sang, Yiwen Zhang
- Abstract summary: Multimodal click-through rate (CTR) prediction is a key technique in industrial recommender systems. We propose a novel model for Task 2, named Quadratic Interest Network (QIN), for Multimodal CTR Prediction.
- Score: 12.989347150912685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal click-through rate (CTR) prediction is a key technique in industrial recommender systems. It leverages heterogeneous modalities such as text, images, and behavioral logs to capture high-order feature interactions between users and items, thereby enhancing the system's understanding of user interests and its ability to predict click behavior. The primary challenge in this field lies in effectively utilizing the rich semantic information from multiple modalities while satisfying the low-latency requirements of online inference in real-world applications. To foster progress in this area, the Multimodal CTR Prediction Challenge Track of the WWW 2025 EReL@MIR Workshop formulates the problem into two tasks: (1) Task 1 of Multimodal Item Embedding: this task aims to explore multimodal information extraction and item representation learning methods that enhance recommendation tasks; and (2) Task 2 of Multimodal CTR Prediction: this task aims to explore what multimodal recommendation model can effectively leverage multimodal embedding features and achieve better performance. In this paper, we propose a novel model for Task 2, named Quadratic Interest Network (QIN) for Multimodal CTR Prediction. Specifically, QIN employs adaptive sparse target attention to extract multimodal user behavior features, and leverages Quadratic Neural Networks to capture high-order feature interactions. As a result, QIN achieved an AUC of 0.9798 on the leaderboard and ranked second in the competition. The model code, training logs, hyperparameter configurations, and checkpoints are available at https://github.com/salmon1802/QIN.
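The abstract describes QIN as using Quadratic Neural Networks to capture high-order feature interactions, but does not give the exact layer design. As a rough illustration only (not the authors' implementation), a single quadratic neuron augments a linear unit with a full pairwise interaction term, y = x^T W x + w^T x + b; the function and variable names below are hypothetical:

```python
import numpy as np

def quadratic_interaction(x, W, w, b):
    """One quadratic neuron: y = x^T W x + w^T x + b.
    The x^T W x term explicitly models every pairwise feature
    interaction, which a plain linear layer cannot express."""
    return x @ W @ x + w @ x + b

# Toy usage with random parameters (illustrative only).
rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=d)        # input feature vector
W = rng.normal(size=(d, d))   # pairwise interaction weights
w = rng.normal(size=d)        # linear weights
b = 0.5                       # bias
y = quadratic_interaction(x, W, w, b)
```

Stacking such units (with learned W, w, b) yields a network whose each layer is quadratic in its input, so depth compounds the interaction order; the actual QIN architecture should be taken from the released code rather than this sketch.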
Related papers
- Parameter Aware Mamba Model for Multi-task Dense Prediction [69.94454603308196]
We introduce a novel decoder-based framework, the Parameter Aware Mamba Model (PAMM), specifically designed for dense prediction in multi-task learning settings. It features dual state space parameter experts that integrate and set task-specific parameter priors, capturing the intrinsic properties of each task. We employ the Multi-Directional Hilbert Scanning method to construct multi-angle feature sequences, thereby enhancing the sequence model's perceptual capabilities for 2D data.
arXiv Detail & Related papers (2025-11-18T13:48:00Z) - Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning [57.082554323521464]
We propose a Sensitivity-aware Task Vector insertion framework (STV) to determine where and what to insert. Our key insight is that activation deltas across query-context pairs exhibit consistent structural patterns, providing a reliable cue for insertion. Based on the identified sensitivity-aware locations, we construct a pre-clustered activation bank for each location by clustering the activation values, and then apply reinforcement learning to choose the most suitable one to insert.
arXiv Detail & Related papers (2025-11-11T13:42:13Z) - DMGIN: How Multimodal LLMs Enhance Large Recommendation Models for Lifelong User Post-click Behaviors [5.465812199325145]
Long post-click behavior sequences pose severe performance issues. The Deep Multimodal Group Interest Network (DMGIN) improves Click-Through Rate (CTR) prediction.
arXiv Detail & Related papers (2025-08-29T17:28:07Z) - INFNet: A Task-aware Information Flow Network for Large-Scale Recommendation Systems [8.283354901677692]
Information Flow Network (INFNet) is a task-aware architecture designed for large-scale recommendation scenarios. INFNet distinguishes features into three token types, categorical tokens, sequence tokens, and task tokens, and introduces a novel dual-flow design. INFNet has been successfully deployed in a commercial online advertising system, yielding significant gains of +1.587% in Revenue (REV) and +1.155% in Click-Through Rate (CTR).
arXiv Detail & Related papers (2025-08-15T16:18:32Z) - 1$^{st}$ Place Solution of WWW 2025 EReL@MIR Workshop Multimodal CTR Prediction Challenge [1.509961504986039]
This report presents our 1$^{st}$ place winning solution for Task 2 of the Multimodal CTR Prediction Challenge. For multimodal information integration, we simply append the frozen multimodal embeddings to each item embedding. Experiments on the challenge dataset demonstrate the effectiveness of our method, achieving superior performance with a 0.9839 AUC on the leaderboard.
arXiv Detail & Related papers (2025-05-06T13:55:22Z) - On the Practice of Deep Hierarchical Ensemble Network for Ad Conversion Rate Prediction [14.649184507551436]
We propose a multitask learning framework with DHEN as the single backbone model architecture to predict all CVR tasks. We build both on-site real-time user behavior sequences and off-site conversion event sequences for CVR prediction purposes. Our method achieves state-of-the-art performance compared to previous single feature crossing modules with pre-trained user personalization features.
arXiv Detail & Related papers (2025-04-10T23:41:34Z) - M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving [48.17490295484055]
M3Net is a novel network that simultaneously tackles detection, segmentation, and 3D occupancy prediction for autonomous driving. M3Net achieves state-of-the-art multi-task learning performance on the nuScenes benchmarks.
arXiv Detail & Related papers (2025-03-23T15:08:09Z) - One Framework to Rule Them All: Unifying Multimodal Tasks with LLM Neural-Tuning [16.96824902454355]
We propose a unified framework that concurrently handles multiple tasks and modalities. In this framework, all modalities and tasks are represented as unified tokens and trained using a single, consistent approach. We present a new benchmark, MMUD, which includes samples annotated with multiple task labels. We demonstrate the ability to handle multiple tasks simultaneously in a streamlined and efficient manner.
arXiv Detail & Related papers (2024-08-06T07:19:51Z) - SEMINAR: Search Enhanced Multi-modal Interest Network and Approximate Retrieval for Lifelong Sequential Recommendation [16.370075234443245]
We propose a unified lifelong multi-modal sequence model called SEMINAR (Search Enhanced Multi-Modal Interest Network and Approximate Retrieval).
Specifically, a network called Pretraining Search Unit learns the lifelong sequences of multi-modal query-item pairs in a pretraining-finetuning manner.
To accelerate the online retrieval speed of multi-modal embedding, we propose a multi-modal codebook-based product quantization strategy.
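The SEMINAR abstract mentions a codebook-based product quantization strategy for fast multi-modal embedding retrieval, without spelling out the details. As a generic illustration of product quantization (not that paper's specific strategy; the function names are hypothetical), an embedding is split into M subvectors and each subvector is replaced by the index of its nearest codebook centroid:

```python
import numpy as np

def pq_encode(vec, codebooks):
    """Product quantization: split `vec` into len(codebooks) subvectors
    and map each to the index of its nearest centroid.
    codebooks: list of (K, d_sub) arrays, one per subspace."""
    subs = np.split(vec, len(codebooks))
    return [int(np.argmin(np.linalg.norm(cb - s, axis=1)))
            for s, cb in zip(subs, codebooks)]

def pq_decode(codes, codebooks):
    """Approximate reconstruction: concatenate the chosen centroids."""
    return np.concatenate([cb[c] for c, cb in zip(codes, codebooks)])

# Toy example: 4-dim embedding, 2 subspaces, 2 centroids each.
codebooks = [np.array([[0.0, 0.0], [1.0, 1.0]]),
             np.array([[0.0, 0.0], [1.0, 1.0]])]
codes = pq_encode(np.array([0.9, 1.1, 0.1, -0.1]), codebooks)
approx = pq_decode(codes, codebooks)
```

Storing only the small integer codes (instead of full float vectors) is what makes large-scale online retrieval cheap; distances can then be approximated from per-subspace lookup tables.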
arXiv Detail & Related papers (2024-07-15T13:33:30Z) - Generative Multimodal Models are In-Context Learners [60.50927925426832]
We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences.
Emu2 exhibits strong multimodal in-context learning abilities, even emerging to solve tasks that require on-the-fly reasoning.
arXiv Detail & Related papers (2023-12-20T18:59:58Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Controllable Dynamic Multi-Task Architectures [92.74372912009127]
We propose a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints.
We propose a disentangled training of two hypernetworks, by exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree-structured models with adapted weights.
arXiv Detail & Related papers (2022-03-28T17:56:40Z) - Routing with Self-Attention for Multimodal Capsule Networks [108.85007719132618]
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework.
To adapt the capsules to large-scale input data, we propose a novel routing by self-attention mechanism that selects relevant capsules.
This allows not only for robust training with noisy video data, but also to scale up the size of the capsule network compared to traditional routing methods.
arXiv Detail & Related papers (2021-12-01T19:01:26Z) - An Analysis Of Entire Space Multi-Task Models For Post-Click Conversion Prediction [3.2979460528864926]
We consider approximating the probability of post-click conversion events (installs) for mobile app advertising on a large-scale advertising platform.
We show that several different approaches result in similar levels of positive transfer from the data-abundant CTR task to the CVR task.
Our findings add to the growing body of evidence suggesting that standard multi-task learning is a sensible approach to modelling related events in real-world large-scale applications.
arXiv Detail & Related papers (2021-08-18T13:39:50Z) - Joint predictions of multi-modal ride-hailing demands: a deep multi-task multigraph learning-based approach [64.18639899347822]
We propose a deep multi-task multi-graph learning approach, which combines multiple multi-graph convolutional (MGC) networks for predicting demands for different service modes.
We show that our proposed approach outperforms the benchmark algorithms in prediction accuracy for different ride-hailing modes.
arXiv Detail & Related papers (2020-11-11T07:10:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.