Related papers: Hierarchically Modeling Micro and Macro Behaviors via Multi-Task Learning for Conversion Rate Prediction

URL: http://arxiv.org/abs/2104.09713v1
Date: Tue, 20 Apr 2021 01:45:06 GMT
Title: Hierarchically Modeling Micro and Macro Behaviors via Multi-Task Learning for Conversion Rate Prediction
Authors: Hong Wen and Jing Zhang and Fuyu Lv and Wentian Bao and Tianyi Wang and Zulong Chen
Abstract summary: Conversion Rate (CVR) prediction in modern industrial e-commerce platforms is becoming increasingly important. We propose a novel CVR prediction method by Hierarchically Modeling both Micro and Macro behaviors ($HM^3$). $HM^3$ can be trained end-to-end and addresses the SSB and DS issues.
Score: 14.494225676311448
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Conversion Rate (\emph{CVR}) prediction in modern industrial e-commerce platforms is becoming increasingly important, which directly contributes to the final revenue. In order to address the well-known sample selection bias (\emph{SSB}) and data sparsity (\emph{DS}) issues encountered during CVR modeling, the abundant labeled macro behaviors ($i.e.$, user's interactions with items) are used. Nonetheless, we observe that several purchase-related micro behaviors ($i.e.$, user's interactions with specific components on the item detail page) can supplement fine-grained cues for \emph{CVR} prediction. Motivated by this observation, we propose a novel \emph{CVR} prediction method by Hierarchically Modeling both Micro and Macro behaviors ($HM^3$). Specifically, we first construct a complete user sequential behavior graph to hierarchically represent micro behaviors and macro behaviors as one-hop and two-hop post-click nodes. Then, we embody $HM^3$ as a multi-head deep neural network, which predicts six probability variables corresponding to explicit sub-paths in the graph. They are further combined into the prediction targets of four auxiliary tasks as well as the final $CVR$ according to the conditional probability rule defined on the graph. By employing multi-task learning and leveraging the abundant supervisory labels from micro and macro behaviors, $HM^3$ can be trained end-to-end and address the \emph{SSB} and \emph{DS} issues. Extensive experiments on both offline and online settings demonstrate the superiority of the proposed $HM^3$ over representative state-of-the-art methods.
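
The abstract says that six predicted probabilities are combined into four auxiliary targets and the final CVR through the conditional probability rule on the behavior graph, but it does not spell out the graph. The following is a minimal sketch under assumed definitions, not the authors' implementation: it assumes a chain click -> micro behavior -> macro behavior -> purchase, assumes purchase depends on the micro behavior only through the macro behavior, and uses hypothetical names (`HeadOutputs`, `combine_heads`) for the six heads and their combination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadOutputs:
    """Six head probabilities (assumed definitions, not taken from the paper)."""
    p_click: float                 # P(click | impression)
    p_micro: float                 # P(micro | click)
    p_macro_given_micro: float     # P(macro | click, micro)
    p_macro_given_no_micro: float  # P(macro | click, no micro)
    p_buy_given_macro: float       # P(buy | click, macro)
    p_buy_given_no_macro: float    # P(buy | click, no macro)


def combine_heads(h: HeadOutputs) -> dict:
    """Combine the heads with the law of total probability along the assumed chain."""
    # Marginalize over the one-hop (micro) node to get P(macro | click).
    p_macro = (h.p_micro * h.p_macro_given_micro
               + (1.0 - h.p_micro) * h.p_macro_given_no_micro)
    # Marginalize over the two-hop (macro) node to get CVR = P(buy | click).
    cvr = (p_macro * h.p_buy_given_macro
           + (1.0 - p_macro) * h.p_buy_given_no_macro)
    return {
        # Four assumed auxiliary targets, all defined over the impression
        # space, where labels are available for every sample.
        "ctr": h.p_click,
        "click_and_micro": h.p_click * h.p_micro,
        "click_and_macro": h.p_click * p_macro,
        "ctcvr": h.p_click * cvr,
        # Final target of interest.
        "cvr": cvr,
    }


if __name__ == "__main__":
    example = HeadOutputs(0.1, 0.5, 0.4, 0.2, 0.3, 0.1)
    for name, value in combine_heads(example).items():
        print(f"{name}: {value:.4f}")
```

In this sketch, supervising only the impression-level targets is what would sidestep sample selection bias, in the same spirit as the abstract's claim, while the denser micro and macro labels give the purchase head extra training signal.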

Related papers

Process-Tensor Tomography of SGD: Measuring Non-Markovian Memory via Back-Flow of Distinguishability [1.078600700827543]
We build a simple model-agnostic witness of training memory based on back-flow of distinguishability. We observe consistent positive back-flow with tight bootstrap confidence intervals, amplification under higher momentum, and more micro-steps. We position this as a principled diagnostic and empirical evidence that practical SGD deviates from the Markov idealization.
arXiv Detail & Related papers (2026-01-23T09:03:25Z)
From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models [81.43473418572567]
Click-Through Rate (CTR) prediction is a core task in recommendation systems. We propose a novel generative framework to address embedding dimensional collapse and information redundancy. We show that SFG consistently mitigates embedding collapse and reduces information redundancy, while yielding substantial performance gains.
arXiv Detail & Related papers (2025-12-16T03:17:18Z)
Re$^{\text{2}}$MaP: Macro Placement by Recursively Prototyping and Packing Tree-based Relocating [67.49674976434322]
This work introduces the Re$^{\text{2}}$MaP method, which generates expert-quality macro placements. We use DREAMPlace to build a mixed-size placement prototype and obtain reference positions for each macro and cluster. A packing tree-based relocating procedure is then designed to jointly adjust the locations of macro groups and the macros within each group.
arXiv Detail & Related papers (2025-11-11T09:56:10Z)
FLARE: Robot Learning with Implicit World Modeling [87.81846091038676]
FLARE integrates predictive latent world modeling into robot policy learning. FLARE achieves state-of-the-art performance, outperforming prior policy learning baselines by up to 26%. Our results establish FLARE as a general and scalable approach for combining implicit world modeling with high-frequency robotic control.
arXiv Detail & Related papers (2025-05-21T15:33:27Z)
H$^3$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning [25.65324419553667]
We introduce Triply-Hierarchical Diffusion Policy (H$^3$DP), a novel visuomotor learning framework that explicitly incorporates hierarchical structures to strengthen the integration between visual features and action generation. Extensive experiments demonstrate that H$^3$DP yields a +27.5% average relative improvement over baselines across 44 simulation tasks and achieves superior performance in 4 challenging bimanual real-world manipulation tasks.
arXiv Detail & Related papers (2025-05-12T17:59:43Z)
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via $\mathbf{\texttt{D}}$ual-$\mathbf{\texttt{H}}$ead $\mathbf{\texttt{O}}$ptimization [49.2338910653152]
Vision-language models (VLMs) have achieved remarkable success across diverse tasks by leveraging rich textual information with minimal labeled data. Knowledge distillation (KD) offers a well-established solution to this problem; however, recent KD approaches from VLMs often involve multi-stage training or additional tuning. We propose DHO, a simple yet effective KD framework that transfers knowledge from VLMs to compact, task-specific models in semi-supervised settings.
arXiv Detail & Related papers (2025-05-12T15:39:51Z)
On the Practice of Deep Hierarchical Ensemble Network for Ad Conversion Rate Prediction [14.649184507551436]
We propose a multitask learning framework with DHEN as the single backbone model architecture to predict all CVR tasks. We build both on-site real-time user behavior sequences and off-site conversion event sequences for CVR prediction purposes. Our method achieves state-of-the-art performance compared to previous single feature crossing modules with pre-trained user personalization features.
arXiv Detail & Related papers (2025-04-10T23:41:34Z)
Multi-granularity Interest Retrieval and Refinement Network for Long-Term User Behavior Modeling in CTR Prediction [68.90783662117936]
Click-through Rate (CTR) prediction is crucial for online personalization platforms. Recent advancements have shown that modeling rich user behaviors can significantly improve the performance of CTR prediction. We propose the Multi-granularity Interest Retrieval and Refinement Network (MIRRN).
arXiv Detail & Related papers (2024-11-22T15:29:05Z)
Towards the Causal Complete Cause of Multi-Modal Representation Learning [39.171796664005434]
Multi-Modal Learning (MML) aims to learn effective representations across modalities for accurate predictions. From a causal perspective, effective MML representations should be causally sufficient and necessary. We propose $C^3$ Regularization, a plug-and-play method that enforces the causal completeness of the learned representations.
arXiv Detail & Related papers (2024-07-19T06:35:49Z)
TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction [61.295716741720284]
TokenUnify is a novel pretraining method that integrates random token prediction, next-token prediction, and next-all token prediction. Alongside TokenUnify, we have assembled a large-scale electron microscopy (EM) image dataset with ultra-high resolution. This dataset includes over 120 million annotated voxels, making it the largest neuron segmentation dataset to date.
arXiv Detail & Related papers (2024-05-27T05:45:51Z)
Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition [71.33787410075577]
We study reinforcement learning with linear function approximation, unknown transition, and adversarial losses. We propose a new algorithm that attains an $\widetilde{O}(d\sqrt{HS^3K} + \sqrt{HSAK})$ regret with high probability.
arXiv Detail & Related papers (2024-03-07T15:03:50Z)
MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction [39.48740397029264]
We propose a Model-agnostic pretraining (MAP) framework that applies feature corruption and recovery on multi-field categorical data. We derive two practical algorithms: masked feature prediction (MFP) and replaced feature detection (RFD).
arXiv Detail & Related papers (2023-08-03T12:55:55Z)
Micron-BERT: BERT-based Facial Micro-Expression Recognition [15.367299107839418]
Micron-BERT ($\mu$-BERT) is a novel approach to facial micro-expression recognition. The proposed method can automatically capture these movements in an unsupervised manner. $\mu$-BERT consistently outperforms state-of-the-art methods on four micro-expression benchmarks.
arXiv Detail & Related papers (2023-04-06T16:19:09Z)
Multi-Task Imitation Learning for Linear Dynamical Systems [50.124394757116605]
We study representation learning for efficient imitation learning over linear systems. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{H N_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}} \right)$.
arXiv Detail & Related papers (2022-12-01T00:14:35Z)
Graph Neural Networks for Multimodal Single-Cell Data Integration [32.8390339109358]
We present a general Graph Neural Network framework, scMoGNN, to tackle three tasks. scMoGNN demonstrates superior results in all three tasks compared with the state-of-the-art and conventional approaches.
arXiv Detail & Related papers (2022-03-03T17:59:02Z)
Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation [92.99933928528797]
We study model-based reward-free reinforcement learning with linear function approximation for episodic Markov decision processes (MDPs). In the planning phase, the agent is given a specific reward function and uses samples collected from the exploration phase to learn a good policy. We show that to obtain an $\epsilon$-optimal policy for an arbitrary reward function, UCRL-RFE needs to sample at most $\tilde{O}(H^4 d(H + d)\epsilon^{-2})$ episodes.
arXiv Detail & Related papers (2021-10-12T23:03:58Z)
Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation [81.0955457177017]
In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain reward set. We study GAIL in both online and offline settings with linear function approximation, where both the transition and reward function are linear in the feature maps.
arXiv Detail & Related papers (2021-08-19T16:16:00Z)
End-to-End User Behavior Retrieval in Click-Through Rate Prediction Model [15.52581453176164]
We propose a locality-sensitive hashing (LSH) method called ETA which can greatly reduce the training and inference cost. We deploy ETA into a large-scale real-world e-commerce system and achieve an extra 3.1% improvement in GMV (Gross Merchandise Value) compared to a two-stage long user sequence CTR model.
arXiv Detail & Related papers (2021-08-10T06:28:29Z)
Robust Meta-learning for Mixed Linear Regression with Small Batches [34.94138630547603]
We study a fundamental question: can abundant small-data tasks compensate for the lack of big-data tasks? Existing approaches show that such a trade-off is efficiently achievable, with the help of medium-sized tasks with $\Omega(k^{1/2})$ examples each. We introduce a spectral approach that is simultaneously robust under both scenarios.
arXiv Detail & Related papers (2020-06-17T07:59:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.