Distilled Neural Networks for Efficient Learning to Rank
- URL: http://arxiv.org/abs/2202.10728v1
- Date: Tue, 22 Feb 2022 08:40:18 GMT
- Title: Distilled Neural Networks for Efficient Learning to Rank
- Authors: F.M. Nardini, C. Rulli, S. Trani, R.Venturini
- Abstract summary: We propose an approach for speeding up neural scoring time by combining distillation, pruning, and fast matrix multiplication.
Comprehensive experiments on two public learning-to-rank datasets show that neural networks produced with our novel approach are competitive at any point of the effectiveness-efficiency trade-off.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies in Learning to Rank have shown that a neural network
can be effectively distilled from an ensemble of regression trees. This result
makes neural networks a natural competitor to tree-based ensembles on the
ranking task. Nevertheless, ensembles of regression trees still outperform
neural models in both efficiency and effectiveness, particularly when scoring
on CPU. In this paper, we propose an approach for speeding up neural scoring
time by combining distillation, pruning, and fast matrix multiplication. We
employ knowledge distillation to learn shallow neural networks from an ensemble
of regression trees. Then, we exploit an efficiency-oriented pruning technique
that sparsifies the most computationally intensive layers of the neural
network, which is then scored with optimized sparse matrix multiplication.
Moreover, by studying both dense and sparse high-performance matrix
multiplication, we develop a scoring-time prediction model that helps in
devising neural network architectures that match the desired efficiency
requirements. Comprehensive experiments on two public learning-to-rank datasets
show that neural networks produced with our novel approach are competitive at
any point of the effectiveness-efficiency trade-off when compared with
tree-based ensembles, providing up to a 4x scoring-time speed-up without
affecting ranking quality.
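The three stages the abstract combines (distill, prune, score sparsely) can be illustrated with a minimal NumPy sketch. It rests on loud assumptions and is not the authors' implementation. The teacher (hypothetical `make_teacher`) is a toy sum of decision stumps standing in for a real tree ensemble. The student is a one-hidden-layer ReLU network fit to the teacher's scores with mean-squared error. Plain magnitude pruning (hypothetical `magnitude_prune`) substitutes for the paper's efficiency-oriented pruning, and no retraining is done after pruning. The pruned first layer is stored in CSR form and scored by touching only its non-zero weights.

```python
import numpy as np


def make_teacher(n_features, n_stumps, rng):
    """Hypothetical toy teacher: a sum of depth-1 regression trees (stumps),
    standing in for a real learning-to-rank tree ensemble."""
    feats = rng.integers(0, n_features, size=n_stumps)
    thresholds = rng.normal(size=n_stumps)
    left = rng.normal(scale=0.3, size=n_stumps)
    right = rng.normal(scale=0.3, size=n_stumps)

    def predict(X):
        # Each stump sends a document left or right on one feature; leaf values are summed.
        go_left = X[:, feats] <= thresholds
        return np.where(go_left, left, right).sum(axis=1)

    return predict


def train_student(X, y, hidden, rng, epochs=300, lr=0.05):
    """Distillation step (sketch): fit a one-hidden-layer ReLU net to teacher scores."""
    n, d = X.shape
    W1 = rng.normal(scale=1 / np.sqrt(d), size=(hidden, d))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=1 / np.sqrt(hidden), size=hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        Z = X @ W1.T + b1                  # (n, hidden) pre-activations
        H = np.maximum(Z, 0.0)             # ReLU
        err = H @ w2 + b2 - y              # student score minus teacher score
        losses.append(0.5 * np.mean(err ** 2))
        # Gradients of mean(0.5 * err**2), computed before any parameter changes.
        dH = np.outer(err, w2) * (Z > 0)
        W1 -= lr * (dH.T @ X) / n
        b1 -= lr * dH.mean(axis=0)
        w2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()
    return W1, b1, w2, b2, losses


def magnitude_prune(W, sparsity):
    """Plain magnitude pruning (assumed stand-in): zero the smallest `sparsity` fraction."""
    k = int(round(sparsity * W.size))
    if k == 0:
        return W.copy()
    threshold = np.sort(np.abs(W), axis=None)[k - 1]
    return np.where(np.abs(W) > threshold, W, 0.0)


def to_csr(W):
    """Store W in compressed sparse row form: (values, column indices, row pointers)."""
    rows, cols = np.nonzero(W)             # row-major order, so rows are already sorted
    indptr = np.zeros(W.shape[0] + 1, dtype=np.int64)
    np.add.at(indptr, rows + 1, 1)         # count non-zeros per row
    return W[rows, cols], cols, np.cumsum(indptr)


def csr_dense_product(values, cols, indptr, X):
    """Compute X @ W.T while touching only the stored non-zeros of W."""
    out = np.zeros((X.shape[0], len(indptr) - 1))
    for r in range(len(indptr) - 1):
        lo, hi = indptr[r], indptr[r + 1]
        out[:, r] = X[:, cols[lo:hi]] @ values[lo:hi]
    return out


def score(X, W1_csr, b1, w2, b2):
    """Score a batch of documents with the pruned first layer stored as CSR."""
    H = np.maximum(csr_dense_product(*W1_csr, X) + b1, 0.0)
    return H @ w2 + b2


rng = np.random.default_rng(0)
d, hidden = 10, 32
X_train = rng.normal(size=(2000, d))
teacher = make_teacher(d, n_stumps=20, rng=rng)
y_train = teacher(X_train)                 # distillation targets: teacher scores

W1, b1, w2, b2, losses = train_student(X_train, y_train, hidden, rng)
W1_pruned = magnitude_prune(W1, sparsity=0.8)   # keep the largest 20% of weights
X_test = rng.normal(size=(5, d))
sparse_scores = score(X_test, to_csr(W1_pruned), b1, w2, b2)
dense_scores = np.maximum(X_test @ W1_pruned.T + b1, 0.0) @ w2 + b2
```

The sparse and dense paths produce the same scores; the sparse one performs work proportional to the surviving weights only. A practical scorer would rely on an optimized sparse kernel rather than this Python loop, and would typically choose the sparsity level against an efficiency target instead of a fixed 80%.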
Related papers
- Hybrid deep additive neural networks [0.0]
We introduce novel deep neural networks that incorporate the idea of additive regression.
Our neural networks share architectural similarities with Kolmogorov-Arnold networks but are based on simpler yet flexible activation and basis functions.
We derive their universal approximation properties and demonstrate their effectiveness through simulation studies and a real-data application.
Date: 2024-11-14T04:26:47Z
- Transductive Spiking Graph Neural Networks for Loihi [0.8684584813982095]
We present a fully neuromorphic implementation of spiking graph neural networks designed for Loihi 2.
We showcase the performance benefits of combining neuromorphic Bayesian optimization with our approach for citation graph classification using fixed-precision spiking neurons.
Date: 2024-04-25T21:15:15Z
- Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference [2.0822643340897273]
We show that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model.
We achieve up to a $20\times$ reduction of computation while maintaining perplexities below $60$ on the Penn Treebank language modeling task.
Date: 2023-11-13T08:18:44Z
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
Date: 2023-03-06T18:59:13Z
- Can pruning improve certified robustness of neural networks? [106.03070538582222]
We show that neural network pruning can improve empirical robustness of deep neural networks (NNs).
Our experiments show that by appropriately pruning an NN, its certified accuracy can be boosted up to 8.2% under standard training.
We additionally observe the existence of certified lottery tickets that can match both standard and certified robust accuracies of the original dense models.
Date: 2022-06-15T05:48:51Z
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth-2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
Date: 2021-12-04T18:07:47Z
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
Date: 2021-10-12T01:11:07Z
- Analytically Tractable Inference in Deep Neural Networks [0.0]
The Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks.
We demonstrate how TAGI matches or exceeds the performance of backpropagation for training classic deep neural network architectures.
Date: 2021-03-09T14:51:34Z
- Truly Sparse Neural Networks at Scale [2.2860412844991655]
We train the largest neural network ever trained in terms of representational power, reaching the size of a bat brain.
Our approach has state-of-the-art performance while opening the path for an environmentally friendly artificial intelligence era.
Date: 2021-02-02T20:06:47Z
- Generalized Leverage Score Sampling for Neural Networks [82.95180314408205]
Leverage score sampling is a powerful technique that originates from theoretical computer science.
In this work, we generalize the results in [Avron, Kapralov, Musco, Musco, Velingker and Zandieh 17] to a broader class of kernels.
Date: 2020-09-21T14:46:01Z
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
Date: 2020-03-25T10:49:22Z
This list is automatically generated from the titles and abstracts of the papers in this site.