Efficient Long Sequential User Data Modeling for Click-Through Rate
Prediction
- URL: http://arxiv.org/abs/2209.12212v1
- Date: Sun, 25 Sep 2022 12:59:57 GMT
- Title: Efficient Long Sequential User Data Modeling for Click-Through Rate
Prediction
- Authors: Qiwei Chen, Yue Xu, Changhua Pei, Shanshan Lv, Tao Zhuang, Junfeng Ge
- Abstract summary: We propose an end-to-end paradigm to model long behavior sequences, which is able to achieve superior performance and remarkable cost-efficiency.
Our contribution is three-fold: First, we propose a hashing-based efficient target attention (TA) network named ETA-Net to enable end-to-end user behavior retrieval.
ETA-Net has been deployed on the recommender system of Taobao, and brought 1.8% lift on CTR and 3.1% lift on Gross Merchandise Value (GMV) compared to the SOTA two-stage methods.
- Score: 9.455385139394494
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies on Click-Through Rate (CTR) prediction have reached new levels
by modeling longer user behavior sequences. Among others, the two-stage methods
stand out as the state-of-the-art (SOTA) solution for industrial applications.
The two-stage methods first train a retrieval model to truncate the long
behavior sequence beforehand and then use the truncated sequences to train a
CTR model. However, the retrieval model and the CTR model are trained
separately, so the subsequences retrieved for the CTR model are inaccurate, which
degrades the final performance. In this paper, we propose an end-to-end
paradigm to model long behavior sequences, which is able to achieve superior
performance along with remarkable cost-efficiency compared to existing models.
Our contribution is three-fold: First, we propose a hashing-based efficient
target attention (TA) network named ETA-Net to enable end-to-end user behavior
retrieval based on low-cost bit-wise operations. The proposed ETA-Net can
reduce the complexity of standard TA by orders of magnitude for sequential data
modeling. Second, we propose a general system architecture as one viable
solution to deploy ETA-Net on industrial systems. Particularly, ETA-Net has
been deployed on the recommender system of Taobao, and brought 1.8% lift on CTR
and 3.1% lift on Gross Merchandise Value (GMV) compared to the SOTA two-stage
methods. Third, we conduct extensive experiments on both offline datasets and
online A/B test. The results verify that the proposed model outperforms
existing CTR models considerably, in terms of both CTR prediction performance
and online cost-efficiency. ETA-Net now serves the main traffic of Taobao,
delivering services to hundreds of millions of users over billions of items
every day.
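To make the retrieval mechanism concrete, the sketch below illustrates the kind of hashing-based target attention the abstract describes: behavior and target embeddings are mapped to binary signatures, candidate behaviors are ranked by bit-wise Hamming distance, and standard target attention is applied only to the retrieved top-k subsequence. This is a minimal illustration under stated assumptions (SimHash-style random projections, 64-bit signatures, k=48); the function names and parameters are illustrative and not taken from the paper's implementation.
```python
# Hedged sketch of hashing-based target-attention retrieval (ETA-style).
# All names and hyperparameters here are illustrative assumptions.
import numpy as np

def simhash_signatures(embeddings, projections):
    """Map float embeddings (n, d) to binary signatures (n, m) via random projections."""
    return (embeddings @ projections > 0).astype(np.uint8)

def hamming_topk(target_sig, behavior_sigs, k):
    """Indices of the k behaviors whose signatures are closest to the target in Hamming distance."""
    dists = np.count_nonzero(behavior_sigs != target_sig, axis=1)  # bit-wise comparison
    return np.argsort(dists)[:k]

def eta_style_attention(target, behaviors, k=48, num_bits=64, rng=None):
    """Retrieve top-k behaviors with hashing, then run standard target attention on them."""
    rng = rng or np.random.default_rng(0)
    d = target.shape[-1]
    projections = rng.standard_normal((d, num_bits))
    target_sig = simhash_signatures(target[None, :], projections)[0]
    behavior_sigs = simhash_signatures(behaviors, projections)
    topk_idx = hamming_topk(target_sig, behavior_sigs, k)
    selected = behaviors[topk_idx]                    # (k, d) retrieved subsequence
    scores = selected @ target / np.sqrt(d)           # scaled dot-product attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ selected                         # attention-pooled user interest vector

# Example: 10,000 historical behaviors with 64-dim embeddings.
behaviors = np.random.default_rng(1).standard_normal((10_000, 64))
target_item = np.random.default_rng(2).standard_normal(64)
user_interest = eta_style_attention(target_item, behaviors, k=48)
print(user_interest.shape)  # (64,)
```
Because the retrieval step replaces float dot-products over the full sequence with Hamming-distance comparisons on short binary codes, the expensive attention computation only touches the retrieved top-k items, which is the source of the complexity reduction the abstract claims.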
Related papers
- A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z) - Deep Learning-Based Cyber-Attack Detection Model for Smart Grids [6.642400003243118]
A novel artificial intelligence-based cyber-attack detection model is developed to stop data integrity cyber-attacks (DIAs) on load data received by supervisory control and data acquisition (SCADA) systems.
In the proposed model, the load data is first forecasted using a regression model; after a processing stage, the processed data is clustered using an unsupervised learning method.
The proposed EE-BiLSTM method performs more robustly and accurately than the other two methods.
arXiv Detail & Related papers (2023-12-14T10:54:04Z) - MAP: A Model-agnostic Pretraining Framework for Click-through Rate
Prediction [39.48740397029264]
We propose a Model-agnostic pretraining (MAP) framework that applies feature corruption and recovery on multi-field categorical data.
We derive two practical algorithms: masked feature prediction (MFP) and replaced feature detection (RFD).
arXiv Detail & Related papers (2023-08-03T12:55:55Z) - Directed Acyclic Graph Factorization Machines for CTR Prediction via
Knowledge Distillation [65.62538699160085]
We propose a Knowledge Distillation-based Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z) - Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z) - Dynamic Parameterized Network for CTR Prediction [6.749659219776502]
We propose a novel plug-in operation, Dynamic Parameterized Operation (DPO), to learn both explicit and implicit interactions instance-wisely.
We show that introducing DPO into DNN modules and Attention modules can respectively benefit two main tasks in click-through rate (CTR) prediction.
Our Dynamic Parameterized Networks significantly outperform state-of-the-art methods in offline experiments on a public dataset and a real-world production dataset.
arXiv Detail & Related papers (2021-11-09T08:15:03Z) - DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language
Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z) - End-to-End User Behavior Retrieval in Click-Through Rate Prediction Model [15.52581453176164]
We propose a locality-sensitive hashing (LSH) method called ETA which can greatly reduce the training and inference cost.
We deploy ETA into a large-scale real world E-commerce system and achieve extra 3.1% improvements on GMV (Gross Merchandise Value) compared to a two-stage long user sequence CTR model.
arXiv Detail & Related papers (2021-08-10T06:28:29Z) - When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z) - DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator
Search [55.164053971213576]
Convolutional neural networks have achieved great success in computer vision tasks, despite large computation overhead.
Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure.
Existing structured pruning methods require hand-crafted rules, which may lead to a tremendous pruning space.
arXiv Detail & Related papers (2020-11-04T07:43:01Z) - Iterative Boosting Deep Neural Networks for Predicting Click-Through
Rate [15.90144113403866]
The click-through rate (CTR) reflects the ratio of clicks on a specific item to its total number of views.
XdBoost is an iterative three-stage neural network model influenced by the traditional machine learning boosting mechanism.
arXiv Detail & Related papers (2020-07-26T09:41:16Z)