DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale
Click-Through Rate Prediction
- URL: http://arxiv.org/abs/2203.11014v1
- Date: Fri, 11 Mar 2022 21:19:31 GMT
- Title: DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale
Click-Through Rate Prediction
- Authors: Buyun Zhang, Liang Luo, Xi Liu, Jay Li, Zeliang Chen, Weilin Zhang,
Xiaohan Wei, Yuchen Hao, Michael Tsang, Wenjun Wang, Yang Liu, Huayu Li,
Yasmine Badr, Jongsoo Park, Jiyan Yang, Dheevatsa Mudigere, Ellie Wen
- Abstract summary: We propose DHEN - a deep and hierarchical ensemble architecture that can leverage the strengths of heterogeneous interaction modules and learn a hierarchy of interactions of different orders.
Experiments on a large-scale dataset from CTR prediction tasks attained a 0.27% improvement in the Normalized Entropy of prediction and 1.2x better training throughput than a state-of-the-art baseline.
- Score: 20.51885543358098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning feature interactions is important to the model performance of online
advertising services. As a result, extensive efforts have been devoted to
designing effective architectures to learn feature interactions. However, we
observe that the practical performance of those designs can vary from dataset
to dataset, even when the order of interactions they claim to capture is the
same. This indicates that different designs may have different advantages and
that the interactions they capture carry non-overlapping information. Motivated
by this observation, we propose DHEN - a deep and hierarchical ensemble
architecture that can leverage the strengths of heterogeneous interaction
modules and learn a hierarchy of interactions of different orders. To overcome
the training challenge posed by DHEN's deeper, multi-layer structure, we
propose a novel co-designed training system that further improves the training
efficiency of DHEN. In experiments on a large-scale dataset from CTR prediction
tasks, DHEN attained a 0.27% improvement in the Normalized Entropy (NE) of
prediction and 1.2x better training throughput than a state-of-the-art
baseline, demonstrating its effectiveness in practice.
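The abstract specifies DHEN only at this level of detail. As one illustrative reading, the sketch below stacks layers that each run a small ensemble of heterogeneous interaction modules on the same input, so that layer l interacts the interactions produced by layer l-1. The choice of modules (dot-product and MLP interactions), the sum fusion, and the residual connection are assumptions for illustration, not the paper's exact design; a helper for the standard Normalized Entropy metric (log loss normalized by the entropy of the background CTR) is included for reference.

```python
# Illustrative DHEN-style layer: an assumption-laden sketch, not the paper's design.
import torch
import torch.nn as nn


class DotInteraction(nn.Module):
    """Explicit pairwise (2nd-order) interactions via dot products."""

    def __init__(self, num_features: int, dim: int):
        super().__init__()
        self.num_features, self.dim = num_features, dim
        self.proj = nn.Linear(num_features * num_features, num_features * dim)

    def forward(self, x):  # x: (batch, num_features, dim)
        sims = torch.bmm(x, x.transpose(1, 2))   # (B, F, F) pairwise dot products
        out = self.proj(sims.flatten(1))         # mix and re-embed
        return out.view(-1, self.num_features, self.dim)


class MLPInteraction(nn.Module):
    """Implicit interactions via a feed-forward network."""

    def __init__(self, num_features: int, dim: int):
        super().__init__()
        self.num_features, self.dim = num_features, dim
        width = num_features * dim
        self.net = nn.Sequential(nn.Linear(width, width), nn.ReLU(),
                                 nn.Linear(width, width))

    def forward(self, x):
        return self.net(x.flatten(1)).view(-1, self.num_features, self.dim)


class DHENLayer(nn.Module):
    """One hierarchy level: heterogeneous modules run in parallel on the same
    input and their outputs are fused (here: summed, plus a residual).
    Stacking such layers yields a hierarchy of interaction orders."""

    def __init__(self, num_features: int, dim: int):
        super().__init__()
        self.ensemble = nn.ModuleList([DotInteraction(num_features, dim),
                                       MLPInteraction(num_features, dim)])
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        fused = torch.stack([m(x) for m in self.ensemble]).sum(dim=0)
        return self.norm(fused + x)


def normalized_entropy(p_pred: torch.Tensor, y: torch.Tensor) -> float:
    """NE = average log loss / entropy of the background CTR, so values
    below 1.0 beat always predicting the average click rate."""
    eps = 1e-7
    ctr = y.float().mean().clamp(eps, 1 - eps)
    logloss = nn.functional.binary_cross_entropy(p_pred.clamp(eps, 1 - eps),
                                                 y.float())
    background = -(ctr * torch.log(ctr) + (1 - ctr) * torch.log(1 - ctr))
    return (logloss / background).item()
```

For example, `nn.Sequential(DHENLayer(16, 32), DHENLayer(16, 32))` applied to a `(batch, 16, 32)` tensor of feature embeddings gives two hierarchy levels; a final head would pool the output into a click probability scored by `normalized_entropy`.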
Related papers
- A framework for measuring the training efficiency of a neural architecture [1.5373453926913085]
This paper presents an experimental framework to measure the training efficiency of a neural architecture.
We analyze the training efficiency of Convolutional Neural Networks and Bayesian equivalents on the MNIST and CIFAR-10 tasks.
arXiv Detail & Related papers (2024-09-12T10:45:38Z)
- Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances [49.631908848868505]
Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning.
We investigate the differences in CLIP performance among various neural architectures.
We propose a simple, yet effective approach to combine predictions from multiple backbones, leading to a notable performance boost of up to 6.34%.
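The summary does not say how the backbone predictions are combined; a minimal sketch under the assumption of plain probability averaging (one common choice, possibly different from the paper's rule) might look like this:

```python
import torch


def combine_backbone_predictions(logits_list):
    """Average the class distributions of several backbones.
    logits_list: per-backbone (batch, num_classes) logit tensors.
    Uniform averaging is an assumption; backbones could also be weighted."""
    probs = [torch.softmax(logits, dim=-1) for logits in logits_list]
    return torch.stack(probs).mean(dim=0)  # (batch, num_classes)
```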
arXiv Detail & Related papers (2023-12-22T03:01:41Z)
- DEPHN: Different Expression Parallel Heterogeneous Network using virtual gradient optimization for Multi-task Learning [1.0705399532413615]
Recommendation system algorithms based on multi-task learning (MTL) are the major method for Internet operators to understand users and predict their behaviors.
Traditional models use shared-bottom structures and gating experts to realize shared representation learning and information differentiation.
We propose a Different Expression Parallel Heterogeneous Network (DEPHN) to model multiple tasks simultaneously.
arXiv Detail & Related papers (2023-07-24T04:29:00Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- HCE: Improving Performance and Efficiency with Heterogeneously Compressed Neural Network Ensemble [22.065904428696353]
Recent ensemble training methods explore different training algorithms or settings for multiple sub-models with the same model architecture.
We propose Heterogeneously Compressed Ensemble (HCE), where we build an efficient ensemble with the pruned and quantized variants from a pretrained DNN model.
arXiv Detail & Related papers (2023-01-18T21:47:05Z)
- AdaEnsemble: Learning Adaptively Sparse Structured Ensemble Network for Click-Through Rate Prediction [0.0]
We propose AdaEnsemble: a Sparsely-Gated Mixture-of-Experts architecture that can leverage the strengths of heterogeneous feature interaction experts.
AdaEnsemble can adaptively choose the feature interaction depth and find the corresponding SparseMoE stacking layer at which to exit and compute the prediction.
We implement the proposed AdaEnsemble and evaluate its performance on real-world datasets.
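The summary describes adaptive-depth early exit over stacked layers; the sketch below shows that control flow only, with plain feed-forward layers standing in for the paper's SparseMoE layers and a batch-level confidence heuristic standing in for its learned depth selection (both are assumptions).

```python
import torch
import torch.nn as nn


class EarlyExitStack(nn.Module):
    """Stacked layers, each with an exit head and a halting head; inference
    stops at the first layer whose halting score clears a threshold."""

    def __init__(self, dim: int, depth: int):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth))
        self.exits = nn.ModuleList(nn.Linear(dim, 1) for _ in range(depth))
        self.halts = nn.ModuleList(nn.Linear(dim, 1) for _ in range(depth))

    @torch.no_grad()  # inference-time sketch only
    def forward(self, h, threshold: float = 0.5):
        last = len(self.layers) - 1
        for i, layer in enumerate(self.layers):
            h = layer(h)
            # Batch-level halting for simplicity; per-example exits would
            # mask finished rows instead of stopping the whole batch.
            if i == last or torch.sigmoid(self.halts[i](h)).mean() > threshold:
                return torch.sigmoid(self.exits[i](h))  # click probability
```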
arXiv Detail & Related papers (2023-01-06T12:08:15Z)
- Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning -- simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
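A minimal sketch of the training loop this implies: one model and one optimizer, with batches drawn alternately from several task dataloaders (round-robin sampling is an assumption; the paper may weight datasets differently).

```python
import itertools


def cofinetune(model, loaders, loss_fn, optimizer, num_steps):
    """Train a single model on several datasets simultaneously by cycling
    through their dataloaders instead of finetuning on one at a time."""
    streams = [itertools.cycle(loader) for loader in loaders]
    for step in range(num_steps):
        inputs, targets = next(streams[step % len(streams)])
        loss = loss_fn(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```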
arXiv Detail & Related papers (2022-07-08T10:25:47Z)
- Weak Augmentation Guided Relational Self-Supervised Learning [80.0680103295137]
We introduce a novel relational self-supervised learning (ReSSL) framework that learns representations by modeling the relationship between different instances.
Our proposed method employs a sharpened distribution of pairwise similarities among different instances as the relation metric.
Experimental results show that our proposed ReSSL substantially outperforms the state-of-the-art methods across different network architectures.
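A sketch of this relation metric: similarity distributions of two views over a queue of past embeddings, with the weakly augmented view sharpened by a lower temperature and used as the target (the temperatures here are typical values, not necessarily the paper's).

```python
import torch
import torch.nn.functional as F


def ressl_loss(z_weak, z_strong, queue, t_teacher=0.04, t_student=0.1):
    """Cross-entropy between the similarity distributions of a weak and a
    strong view over a memory queue; the sharper weak-view distribution
    is the (detached) target."""
    z_weak = F.normalize(z_weak, dim=1)       # (B, dim)
    z_strong = F.normalize(z_strong, dim=1)   # (B, dim)
    queue = F.normalize(queue, dim=1)         # (K, dim) past embeddings
    p_target = F.softmax(z_weak @ queue.T / t_teacher, dim=1).detach()
    log_p = F.log_softmax(z_strong @ queue.T / t_student, dim=1)
    return -(p_target * log_p).sum(dim=1).mean()
```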
arXiv Detail & Related papers (2022-03-16T16:14:19Z)
- LENAS: Learning-based Neural Architecture Search and Ensemble for 3D Radiotherapy Dose Prediction [42.38793195337463]
We propose a novel learning-based ensemble approach named LENAS, which integrates neural architecture search with knowledge distillation for 3D radiotherapy dose prediction.
Our approach starts by exhaustively searching each block from an enormous architecture space to identify multiple architectures that exhibit promising performance.
To mitigate the complexity introduced by the model ensemble, we adopt the teacher-student paradigm, leveraging the diverse outputs from multiple learned networks as supervisory signals.
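Under the assumption of mean-squared-error dose losses and simple output averaging, the teacher-student step might look like the following sketch (the weighting `alpha` is hypothetical):

```python
import torch
import torch.nn.functional as F


def ensemble_distill_loss(student_out, teacher_outs, target, alpha=0.5):
    """Distill an ensemble into one student: the mean prediction of several
    searched networks supervises the student alongside the ground-truth dose."""
    teacher_mean = torch.stack(teacher_outs).mean(dim=0).detach()
    return (alpha * F.mse_loss(student_out, teacher_mean)
            + (1 - alpha) * F.mse_loss(student_out, target))
```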
arXiv Detail & Related papers (2021-06-12T10:08:52Z)
- Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions.
Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain.
arXiv Detail & Related papers (2021-05-17T13:42:09Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)