Learning Discrete Weights and Activations Using the Local
Reparameterization Trick
- URL: http://arxiv.org/abs/2307.01683v1
- Date: Tue, 4 Jul 2023 12:27:10 GMT
- Title: Learning Discrete Weights and Activations Using the Local
Reparameterization Trick
- Authors: Guy Berger, Aviv Navon, Ethan Fetaya
- Abstract summary: In computer vision and machine learning, a crucial challenge is to lower the computation and memory demands for neural network inference.
By binarizing the network weights and activations, one can significantly reduce computational complexity.
This leads to more efficient neural network inference that can be deployed on low-resource devices.
- Score: 21.563618480463067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In computer vision and machine learning, a crucial challenge is to lower the
computation and memory demands for neural network inference. A commonplace
solution to address this challenge is through the use of binarization. By
binarizing the network weights and activations, one can significantly reduce
computational complexity by substituting the computationally expensive
floating-point operations with faster bitwise operations. This leads to more
efficient neural network inference that can be deployed on low-resource
devices. In this
work, we extend previous approaches that trained networks with discrete weights
using the local reparameterization trick to also allow for discrete
activations. The original approach optimizes a distribution over the discrete
weights and uses the central limit theorem to approximate the pre-activation
with a continuous Gaussian distribution. Here we show that this probabilistic
modeling also allows effective training of networks with discrete activations,
which further reduces runtime and memory footprint at inference time and
achieves state-of-the-art results for networks with binary activations.
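The mechanism described in the abstract, a distribution over binary weights whose pre-activation is approximated as a Gaussian via the central limit theorem, can be illustrated with a short sketch. The PyTorch code below is not the authors' implementation; the names `BinaryLocalReparamLinear` and `sample_binary_activation` are hypothetical, and the straight-through sampling of the binary activation is an illustrative assumption whose exact form may differ from the paper's estimator.

```python
import torch
import torch.nn as nn


class BinaryLocalReparamLinear(nn.Module):
    """Linear layer whose weights are {-1, +1} random variables (illustrative)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # theta parameterizes P(w = +1) = sigmoid(theta) for each weight.
        self.theta = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        p = torch.sigmoid(self.theta)      # P(w = +1)
        w_mean = 2.0 * p - 1.0             # E[w] in [-1, 1]
        w_var = 1.0 - w_mean ** 2          # Var[w] = 4 p (1 - p)

        # CLT: the pre-activation z = sum_i w_i x_i is approximately Gaussian.
        mu = x @ w_mean.t()                # E[z]
        var = (x ** 2) @ w_var.t()         # Var[z]
        std = torch.sqrt(var + 1e-8)

        # Local reparameterization: sample z directly from its Gaussian.
        z = mu + std * torch.randn_like(mu)
        return z, mu, std


def sample_binary_activation(mu, std):
    """Hypothetical binary-activation step: P(sign(z) = +1) = Phi(mu / std)."""
    p_plus = torch.distributions.Normal(0.0, 1.0).cdf(mu / std)
    soft = 2.0 * p_plus - 1.0              # E[sign(z)]
    hard = torch.where(torch.rand_like(p_plus) < p_plus,
                       torch.ones_like(soft), -torch.ones_like(soft))
    # Straight-through-style estimator: the forward pass uses the {-1, +1}
    # sample, the backward pass uses the gradient of the expectation.
    return hard + soft - soft.detach()


# Usage: one stochastic forward pass on dummy data.
layer = BinaryLocalReparamLinear(16, 8)
x = torch.randn(4, 16)
z, mu, std = layer(x)
a = sample_binary_activation(mu, std)      # binary activations in {-1, +1}
```

In this sketch, each weight takes values in {-1, +1} with a learnable probability, the pre-activation is sampled from its CLT Gaussian (the local reparameterization trick), and a binary activation is drawn from the Bernoulli distribution induced by the Gaussian CDF.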
Related papers
- Beyond IID weights: sparse and low-rank deep Neural Networks are also Gaussian Processes [3.686808512438363]
We extend the proof of Matthews et al. to a larger class of initial weight distributions.
We show that fully-connected and convolutional networks with PSEUDO-IID distributions are all effectively equivalent up to their variance.
Using our results, one can identify the Edge-of-Chaos for a broader class of neural networks and tune them at criticality in order to enhance their training.
arXiv Detail & Related papers (2023-10-25T12:38:36Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two
Quantization [27.231327287238102]
We propose the DenseShift network, which significantly improves the accuracy of Shift networks.
Our experiments on various computer vision and speech tasks demonstrate that DenseShift outperforms existing low-bit multiplication-free networks.
arXiv Detail & Related papers (2022-08-20T15:17:40Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that it can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - Training Certifiably Robust Neural Networks with Efficient Local
Lipschitz Bounds [99.23098204458336]
Certified robustness is a desirable property for deep neural networks in safety-critical applications.
We show that our method consistently outperforms state-of-the-art methods on the MNIST and TinyImageNet datasets.
arXiv Detail & Related papers (2021-11-02T06:44:10Z) - PAC-Bayesian Learning of Aggregated Binary Activated Neural Networks
with Probabilities over Representations [2.047424180164312]
We study the expectation of a probabilistic neural network as a predictor by itself, focusing on the aggregation of binary activated neural networks with normal distributions over real-valued weights.
We show that the exact computation remains tractable for deep but narrow neural networks, thanks to a dynamic programming approach.
arXiv Detail & Related papers (2021-10-28T14:11:07Z) - FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training
with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel, efficient ensemble methods with dynamic sparsity, which yield many diverse and accurate tickets "for free" in one shot during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z) - Artificial Neural Networks generated by Low Discrepancy Sequences [59.51653996175648]
We generate artificial neural networks as random walks on a dense network graph.
Such networks can be trained sparse from scratch, avoiding the expensive procedure of training a dense network and compressing it afterwards.
We demonstrate that the artificial neural networks generated by low discrepancy sequences can achieve an accuracy within reach of their dense counterparts at a much lower computational complexity.
arXiv Detail & Related papers (2021-03-05T08:45:43Z) - Attentive Gaussian processes for probabilistic time-series generation [4.94950858749529]
We propose a computationally efficient attention-based network combined with Gaussian process regression to generate real-valued sequences.
We develop a block-wise training algorithm to allow mini-batch training of the network while the GP is trained using full-batch.
The algorithm is proven to converge and yields solutions of comparable, if not better, quality.
arXiv Detail & Related papers (2021-02-10T01:19:15Z) - ItNet: iterative neural networks with small graphs for accurate and
efficient anytime prediction [1.52292571922932]
In this study, we introduce a class of network models that have a small memory footprint in terms of their computational graphs.
We show state-of-the-art results for semantic segmentation on the CamVid and Cityscapes datasets.
arXiv Detail & Related papers (2021-01-21T15:56:29Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.