RPR: Random Partition Relaxation for Training Binary and Ternary Weight
Neural Networks
- URL: http://arxiv.org/abs/2001.01091v1
- Date: Sat, 4 Jan 2020 15:56:10 GMT
- Title: RPR: Random Partition Relaxation for Training Binary and Ternary Weight
Neural Networks
- Authors: Lukas Cavigelli, Luca Benini
- Abstract summary: We present Random Partition Relaxation (RPR), a method for strong quantization of neural network weights to binary (+1/-1) and ternary (+1/0/-1) values.
We demonstrate binary and ternary-weight networks with accuracies beyond the state-of-the-art for GoogLeNet and competitive performance for ResNet-18 and ResNet-50.
- Score: 23.45606380793965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Random Partition Relaxation (RPR), a method for strong
quantization of neural network weights to binary (+1/-1) and ternary (+1/0/-1)
values. Starting from a pre-trained model, we quantize the weights and then
relax random partitions of them to their continuous values for retraining
before re-quantizing them and switching to another weight partition for further
adaptation. We demonstrate binary and ternary-weight networks with accuracies
beyond the state-of-the-art for GoogLeNet and competitive performance for
ResNet-18 and ResNet-50 using an SGD-based training method that can easily be
integrated into existing frameworks.
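The abstract describes a cyclic procedure: quantize all weights, relax a randomly chosen partition of them back to their continuous values, retrain that partition, then re-quantize and switch to a new partition. Below is a minimal PyTorch-style sketch of one such cycle, assuming a hypothetical quantize_ternary helper and relax_fraction parameter; it only illustrates the idea under these assumptions and is not the authors' released implementation.

```python
import torch

def quantize_ternary(w, threshold=0.05):
    """Map continuous weights to {-1, 0, +1} (illustrative thresholding rule)."""
    q = torch.zeros_like(w)
    q[w > threshold] = 1.0
    q[w < -threshold] = -1.0
    return q

def rpr_cycle(model, loss_fn, data_loader, optimizer, relax_fraction=0.1):
    """One RPR cycle: re-quantize the weights, relax a random partition of them
    to continuous values, and retrain only that relaxed partition.
    Assumes plain SGD without momentum or weight decay, so that zeroed
    gradients leave the quantized partition in place."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:          # keep biases / norm parameters in full precision
            continue
        continuous = p.detach().clone()
        quantized = quantize_ternary(continuous)
        # random partition: a fraction of the weights stays continuous (trainable)
        relax_mask = torch.rand_like(p) < relax_fraction
        with torch.no_grad():
            p.copy_(torch.where(relax_mask, continuous, quantized))
        masks[name] = relax_mask

    model.train()
    for x, y in data_loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        # freeze the quantized partition by zeroing its gradients
        for name, p in model.named_parameters():
            if name in masks and p.grad is not None:
                p.grad.mul_(masks[name].float())
        optimizer.step()
```

Per the abstract, this procedure is repeated: the network is re-quantized and a new random partition is relaxed for further adaptation; the sketch shows a single cycle only.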
Related papers
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z) - Random Weights Networks Work as Loss Prior Constraint for Image
Restoration [50.80507007507757]
We present our belief that "Random Weights Networks can act as a Loss Prior Constraint for Image Restoration".
Our belief can be directly inserted into existing networks without any training and testing computational cost.
To emphasize, our main focus is to spark interest in the design of loss functions and to rescue them from their currently neglected status.
arXiv Detail & Related papers (2023-03-29T03:43:51Z) - Resilient Binary Neural Network [26.63280603795981]
We introduce a Resilient Binary Neural Network (ReBNN) to mitigate the frequent weight oscillation that hampers BNN training.
Our ReBNN achieves 66.9% Top-1 accuracy with ResNet-18 backbone on the ImageNet dataset.
arXiv Detail & Related papers (2023-02-02T08:51:07Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Neural Capacitance: A New Perspective of Neural Network Selection via
Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs, since candidate models must be trained before their performance can be predicted.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z) - Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural
Networks by Pruning A Randomly Weighted Network [13.193734014710582]
We propose an algorithm for finding multi-prize tickets (MPTs) and test it by performing a series of experiments on CIFAR-10 and ImageNet datasets.
Our MPTs-1/32 not only set new binary weight network state-of-the-art (SOTA) Top-1 accuracy -- 94.8% on CIFAR-10 and 74.03% on ImageNet -- but also outperform their full-precision counterparts by 1.78% and 0.76%, respectively.
arXiv Detail & Related papers (2021-03-17T00:31:24Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Training Sparse Neural Networks using Compressed Sensing [13.84396596420605]
We develop and test a novel method based on compressed sensing which combines pruning and training into a single step.
Specifically, we utilize an adaptively weighted $\ell_1$ penalty on the weights during training, which we combine with a generalization of the regularized dual averaging (RDA) algorithm in order to train sparse neural networks (a schematic sketch of such a weighted penalty appears after this list).
arXiv Detail & Related papers (2020-08-21T19:35:54Z) - Comparing Rewinding and Fine-tuning in Neural Network Pruning [28.663299059376897]
We compare fine-tuning and learning rate rewinding as retraining techniques after neural network pruning.
Both rewinding techniques form the basis of a network-agnostic algorithm that matches the accuracy and compression ratios of several more network-specific state-of-the-art techniques.
arXiv Detail & Related papers (2020-03-05T00:53:18Z) - Energy-efficient and Robust Cumulative Training with Net2Net
Transformation [2.4283778735260686]
We propose a cumulative training strategy that achieves computational efficiency in training without incurring a large accuracy loss.
We achieve this by first training a small network on a small subset of the original dataset, and then gradually expanding the network.
Experiments demonstrate that, compared with training from scratch, cumulative training yields a 2x reduction in computational complexity.
arXiv Detail & Related papers (2020-03-02T21:44:47Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
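The compressed-sensing entry above mentions an adaptively weighted $\ell_1$ penalty combined with a regularized dual averaging (RDA) algorithm. The snippet below sketches only the reweighted penalty term in PyTorch; the inverse-magnitude weighting rule, the eps and lambda_l1 names, and the omission of the RDA step are illustrative assumptions, not that paper's exact formulation.

```python
import torch

def weighted_l1_penalty(model, eps=1e-3):
    """Adaptively weighted L1 penalty: weights with small current magnitude
    receive a larger penalty factor, pushing them further toward zero.
    The reweighting scheme here is an illustrative assumption."""
    penalty = 0.0
    for p in model.parameters():
        if p.dim() < 2:                           # skip biases / norm parameters
            continue
        scale = 1.0 / (p.detach().abs() + eps)    # per-weight adaptive factors
        penalty = penalty + (scale * p.abs()).sum()
    return penalty

# Usage inside a training step (lambda_l1 is a hypothetical hyperparameter):
#   loss = criterion(model(x), y) + lambda_l1 * weighted_l1_penalty(model)
#   loss.backward(); optimizer.step()
```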
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.