Enhanced Exploration in Neural Feature Selection for Deep Click-Through Rate Prediction Models via Ensemble of Gating Layers
- URL: http://arxiv.org/abs/2112.03487v1
- Date: Tue, 7 Dec 2021 04:37:05 GMT
- Title: Enhanced Exploration in Neural Feature Selection for Deep Click-Through Rate Prediction Models via Ensemble of Gating Layers
- Authors: Lin Guan, Xia Xiao, Ming Chen, Youlong Cheng
- Abstract summary: The goal of neural feature selection (NFS) is to choose a relatively small subset of features with the best explanatory power.
The Gating approach inserts a set of differentiable binary gates to drop less informative features.
To improve the exploration capacity of gradient-based solutions, we propose a simple but effective ensemble learning approach.
- Score: 7.381829794276824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature selection has been an essential step in developing industry-scale
deep Click-Through Rate (CTR) prediction systems. The goal of neural feature
selection (NFS) is to choose a relatively small subset of features with the
best explanatory power, as a means to remove redundant features and reduce
computational cost. Inspired by gradient-based neural architecture search (NAS)
and network pruning methods, researchers have tackled the NFS problem with the
Gating approach, which inserts a set of differentiable binary gates to drop less
informative features. The binary gates are optimized along with the network
parameters in an efficient end-to-end manner. In this paper, we analyze the
gradient-based solution from an exploration-exploitation perspective and use
empirical results to show that the Gating approach may suffer from insufficient
exploration. To improve the exploration capacity of gradient-based solutions,
we propose a simple but effective ensemble learning approach, named Ensemble
Gating. We choose two public datasets, Avazu and Criteo, to evaluate this
approach. Our experiments show that, without adding any computational overhead
or introducing any hyper-parameter (except the ensemble size), our method
consistently improves the Gating approach and finds a better subset of features
on both datasets with three different underlying deep CTR prediction models.
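To make the gating mechanism and the ensemble idea concrete, below is a minimal
PyTorch sketch, assuming field embeddings of shape (batch, num_fields, emb_dim).
The straight-through sigmoid relaxation, the random logit initialization, and
the gate-averaging aggregation are illustrative assumptions, not the paper's
exact formulation; per the abstract, the only hyper-parameter the method adds
is the ensemble size.

import torch
import torch.nn as nn

class GatingLayer(nn.Module):
    """One differentiable binary gate per feature field (sketch)."""

    def __init__(self, num_fields: int):
        super().__init__()
        # One learnable logit per field, optimized end-to-end with the
        # network parameters; random initialization keeps ensemble
        # members diverse (an assumption for illustration).
        self.logits = nn.Parameter(0.5 + 0.1 * torch.randn(num_fields))

    def gate_values(self) -> torch.Tensor:
        soft = torch.sigmoid(self.logits)   # relaxed gates in (0, 1)
        hard = (soft > 0.5).float()         # binary keep/drop decision
        # Straight-through estimator: the forward pass uses the hard
        # gates, while gradients flow through the sigmoid relaxation.
        return hard + soft - soft.detach()

    def forward(self, field_emb: torch.Tensor) -> torch.Tensor:
        return field_emb * self.gate_values().view(1, -1, 1)

class EnsembleGating(nn.Module):
    """Ensemble of independently initialized gating layers (sketch)."""

    def __init__(self, num_fields: int, ensemble_size: int = 4):
        super().__init__()
        self.members = nn.ModuleList(
            [GatingLayer(num_fields) for _ in range(ensemble_size)]
        )

    def forward(self, field_emb: torch.Tensor) -> torch.Tensor:
        # Average the members' gate vectors and apply them once: the
        # extra cost is a single vector average, in the spirit of the
        # abstract's "no computational overhead" claim. The averaging
        # rule is a guess; the paper may aggregate gates differently.
        gates = torch.stack([m.gate_values() for m in self.members])
        return field_emb * gates.mean(dim=0).view(1, -1, 1)

In a deep CTR model, such a module would sit between the embedding layer and
the feature-interaction layers; after training, fields whose gates settle at
zero are treated as dropped from the selected feature subset.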
Related papers
- GASE: Graph Attention Sampling with Edges Fusion for Solving Vehicle Routing Problems [6.084414764415137]
We propose an adaptive Graph Attention Sampling with Edges Fusion (GASE) framework to solve vehicle routing problems.
Our proposed model outperforms the existing methods by 2.08%-6.23% and shows stronger generalization ability.
arXiv Detail & Related papers (2024-05-21T03:33:07Z)
- Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units [4.807347156077897]
Iterative approximation methods using backpropagation enable the optimization of neural networks, but they remain computationally expensive when used at scale.
This paper presents an efficient alternative for optimizing neural networks that reduces the costs of scaling neural networks and provides high-efficiency optimizations for low-resource applications.
arXiv Detail & Related papers (2023-11-13T17:55:07Z)
- Accelerating Inverse Learning via Intelligent Localization with Exploratory Sampling [1.5976506570992293]
Solving inverse problems is a longstanding challenge in materials and drug discovery.
Deep generative models have recently been proposed to solve inverse problems.
We propose a novel approach (called iPage) to accelerate the inverse learning process.
arXiv Detail & Related papers (2022-12-02T08:00:04Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Designing the Topology of Graph Neural Networks: A Novel Feature Fusion Perspective [12.363386808994079]
We learn to design the topology of GNNs from a novel feature fusion perspective, dubbed F$^2$GNN.
We develop a neural architecture search method on top of the unified framework which contains a set of selection and fusion operations.
The performance gains on eight real-world datasets demonstrate the effectiveness of F$^2$GNN.
arXiv Detail & Related papers (2021-12-29T13:06:12Z)
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
- Non-Gradient Manifold Neural Network [79.44066256794187]
Deep neural networks (DNNs) generally take thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- VINNAS: Variational Inference-based Neural Network Architecture Search [2.685668802278155]
We present a differentiable variational inference-based NAS method for searching sparse convolutional neural networks.
Our method finds diverse network cells, while showing state-of-the-art accuracy with up to almost 2 times fewer non-zero parameters.
arXiv Detail & Related papers (2020-07-12T21:47:35Z)
- DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weight as random variables, modeled by Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimizers.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
- Learning to Optimize Non-Rigid Tracking [54.94145312763044]
We employ learnable optimizations to improve robustness and speed up solver convergence.
First, we upgrade the tracking objective by integrating an alignment data term on deep features learned end-to-end through a CNN.
Second, we bridge the gap between the preconditioning technique and learning method by introducing a ConditionNet which is trained to generate a preconditioner.
arXiv Detail & Related papers (2020-03-27T04:40:57Z)