On the Compression of Natural Language Models
- URL: http://arxiv.org/abs/2112.11480v1
- Date: Mon, 13 Dec 2021 08:14:21 GMT
- Title: On the Compression of Natural Language Models
- Authors: Saeed Damadi
- Abstract summary: The goal of this work is to assess whether a trainable sparse subnetwork, as posited by the lottery ticket hypothesis, exists for natural language models (NLMs).
To that end, we review state-of-the-art compression techniques such as quantization, knowledge distillation, and pruning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks are effective feature extractors, but they are prohibitively large for many deployment scenarios. Because of the huge number of parameters, interpreting the parameters in different layers is not straightforward, which is why neural networks are sometimes considered black boxes. Although simpler models are easier to explain, finding them is not easy. If found, a sparse network that can be trained from scratch to fit the data would help in interpreting the parameters of a neural network. To this end, the lottery ticket hypothesis states that a typical dense neural network contains a small sparse sub-network that can be trained in isolation to reach a similar test accuracy in a comparable number of steps. The goal of this work is to assess whether such a trainable subnetwork exists for natural language models (NLMs). To achieve this goal, we review state-of-the-art compression techniques such as quantization, knowledge distillation, and pruning.
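To make the pruning side of this review concrete, below is a minimal sketch of global magnitude pruning combined with a lottery-ticket-style rewind to the initial weights. It is an illustrative example only: the tiny MLP, the 80% sparsity level, the synthetic data, and the training loop are assumptions for the sketch, not the experimental setup of this paper.

```python
# Minimal sketch: global magnitude pruning + lottery-ticket rewind (PyTorch).
# The tiny MLP, 80% sparsity, and synthetic data are illustrative assumptions.
import copy
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 2))

model = make_model()
init_state = copy.deepcopy(model.state_dict())   # keep theta_0 for the rewind

# ... train `model` densely here (standard loop, omitted for brevity) ...

# 1) Build binary masks that keep the largest-magnitude 20% of weights globally.
sparsity = 0.8
weights = [p for n, p in model.named_parameters() if n.endswith("weight")]
all_scores = torch.cat([w.detach().abs().flatten() for w in weights])
threshold = torch.quantile(all_scores, sparsity)
masks = [(w.detach().abs() > threshold).float() for w in weights]

# 2) Rewind surviving weights to their initial values (the candidate "winning ticket").
model.load_state_dict(init_state)
with torch.no_grad():
    for w, m in zip(weights, masks):
        w.mul_(m)

# 3) Retrain the sparse subnetwork, re-applying masks so pruned weights stay zero.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(256, 100), torch.randint(0, 2, (256,))
for _ in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        for w, m in zip(weights, masks):
            w.mul_(m)
```

Note that the original lottery-ticket experiments typically find the mask by iterative magnitude pruning over several prune-retrain rounds; the one-shot version above is used only to keep the sketch short.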
Related papers
- Residual Random Neural Networks [0.0]
A single-layer feedforward neural network with random weights is a recurring motif in the neural network literature.
We show that one can obtain good classification results even if the number of hidden neurons has the same order of magnitude as the dimensionality of the data samples.
arXiv Detail & Related papers (2024-10-25T22:00:11Z) - No Free Prune: Information-Theoretic Barriers to Pruning at Initialization [8.125999058340998]
We show that the Law of Robustness of arXiv:2105.12806 extends to sparse networks, with the usual parameter count replaced by $p_\text{eff}$.
Experiments on neural networks confirm that information gained during training may indeed affect model capacity.
arXiv Detail & Related papers (2024-02-02T01:13:16Z) - Sampling weights of deep neural networks [1.2370077627846041]
We introduce a probability distribution, combined with an efficient sampling algorithm, for weights and biases of fully-connected neural networks.
In a supervised learning context, no iterative optimization or gradient computations of internal network parameters are needed.
We prove that sampled networks are universal approximators.
arXiv Detail & Related papers (2023-06-29T10:13:36Z) - The smooth output assumption, and why deep networks are better than wide
ones [0.0]
We propose a new measure that predicts how well a model will generalize.
It is based on the fact that, in reality, boundaries between concepts are generally unsharp.
arXiv Detail & Related papers (2022-11-25T19:05:44Z) - Locally Sparse Networks for Interpretable Predictions [7.362415721170984]
We propose a framework for training locally sparse neural networks where the local sparsity is learned via a sample-specific gating mechanism.
The sample-specific sparsity is predicted via a gating network, which is trained in tandem with the prediction network.
We demonstrate that our method outperforms state-of-the-art models when predicting the target function with far fewer features per instance.
arXiv Detail & Related papers (2021-06-11T15:46:50Z) - Leveraging Sparse Linear Layers for Debuggable Deep Networks [86.94586860037049]
We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks.
The resulting sparse explanations can help identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks (a minimal sketch of the general sparse-probe idea appears after this list).
arXiv Detail & Related papers (2021-05-11T08:15:25Z) - Artificial Neural Networks generated by Low Discrepancy Sequences [59.51653996175648]
We generate artificial neural networks as random walks on a dense network graph.
Such networks can be trained sparse from scratch, avoiding the expensive procedure of training a dense network and compressing it afterwards.
We demonstrate that the artificial neural networks generated by low discrepancy sequences can achieve an accuracy within reach of their dense counterparts at a much lower computational complexity.
arXiv Detail & Related papers (2021-03-05T08:45:43Z) - The Connection Between Approximation, Depth Separation and Learnability
in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z) - ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z) - Towards Understanding Hierarchical Learning: Benefits of Neural
Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z) - On the distance between two neural networks and the stability of
learning [59.62047284234815]
This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions.
The analysis leads to a new distance function called deep relative trust and a descent lemma for neural networks.
arXiv Detail & Related papers (2020-02-09T19:18:39Z)
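As a companion to the "Leveraging Sparse Linear Layers for Debuggable Deep Networks" entry above, here is a minimal sketch of the general sparse-linear-probe idea: freeze a feature extractor and fit an L1-regularized linear classifier on top of its features. The random stand-in features, the regularization strength, and the scikit-learn classifier are assumptions for this sketch, not that paper's implementation.

```python
# Minimal sketch: a sparse (L1-regularized) linear probe over frozen deep features.
# The random "backbone" features and labels below are stand-ins for real
# representations and data; the regularization strength C is an assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = rng.standard_normal((1000, 512))              # pretend: frozen-backbone features
labels = (features[:, :3].sum(axis=1) > 0).astype(int)   # pretend: binary task

probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.05, max_iter=1000)
probe.fit(features, labels)

# Most coefficients are driven to exactly zero; the few surviving features are
# the ones a human can inspect when debugging the model's predictions.
nonzero = np.flatnonzero(probe.coef_[0])
print(f"{nonzero.size} of {features.shape[1]} features used:", nonzero[:10])
```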