A Directed-Evolution Method for Sparsification and Compression of Neural
Networks with Application to Object Identification and Segmentation and
considerations of optimal quantization using small number of bits
- URL: http://arxiv.org/abs/2206.05859v1
- Date: Sun, 12 Jun 2022 23:49:08 GMT
- Title: A Directed-Evolution Method for Sparsification and Compression of Neural
Networks with Application to Object Identification and Segmentation and
considerations of optimal quantization using small number of bits
- Authors: Luiz M Franca-Neto
- Abstract summary: This work introduces the Directed-Evolution (DE) method for the sparsification of neural networks.
The relevance of each parameter to network accuracy is assessed directly.
The parameters that produce the least effect on accuracy when tentatively zeroed are permanently zeroed.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work introduces the Directed-Evolution (DE) method for the
sparsification of neural networks, in which the relevance of parameters to
network accuracy is assessed directly and the parameters that produce the
least effect on accuracy when tentatively zeroed are permanently zeroed. The
DE method avoids a potential combinatorial explosion over all candidate sets
of parameters to be zeroed in large networks by mimicking evolution in the
natural world. DE operates in a distillation context [5]: the original
network is the teacher, and DE evolves the student network toward the
sparsification goal while maintaining minimal divergence between teacher and
student. After DE reaches the desired sparsification level in each layer of
the network, a variety of quantization alternatives are applied to the
surviving parameters to find the lowest number of bits for their
representation with an acceptable loss of accuracy. A procedure to find the
optimal distribution of quantization levels in each sparsified layer is
presented. A suitable lossless encoding of the surviving quantized parameters
is then used for the final parameter representation. DE was applied to a
sample of representative, progressively larger networks trained on the MNIST,
FashionMNIST and COCO data sets. An 80-class YOLOv3 network with more than 60
million parameters trained on the COCO dataset reached 90% sparsification
and, using 4-bit parameter quantization, correctly identifies and segments
all objects identified by the original network with more than 80% confidence.
The resulting compression is between 40x and 80x. It has not escaped the
authors' notice that techniques from different methods can be nested: once
the best parameter set for sparsification is identified in a cycle of DE, a
decision to zero only a subset of those parameters can be made using a
combination of criteria such as parameter magnitude and Hessian
approximations.
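The directed-evolution cycle can be made concrete with a small sketch. The
following is a minimal, illustrative implementation of one DE cycle on a
single layer in a teacher-student (distillation) setting, assuming PyTorch;
the random candidate sampling, the KL-divergence measure, and names such as
`de_cycle` and `divergence` are assumptions for illustration, not the paper's
exact procedure.

```python
# Minimal sketch of one directed-evolution (DE) sparsification cycle.
# Assumes PyTorch and classification-style outputs; details are illustrative.
import torch
import torch.nn.functional as F

def divergence(teacher, student, batch):
    """Teacher-student divergence (here: KL between softmax outputs)."""
    with torch.no_grad():
        t_logits = teacher(batch)
    s_logits = student(batch)
    return F.kl_div(F.log_softmax(s_logits, dim=-1),
                    F.softmax(t_logits, dim=-1),
                    reduction="batchmean")

def de_cycle(teacher, student, layer_weight, batch,
             n_candidates=8, zero_fraction=0.01):
    """Tentatively zero several candidate parameter sets in one layer and
    permanently zero the candidate that least disturbs teacher-student
    agreement (the selection step of the 'evolution')."""
    numel = layer_weight.numel()
    k = max(1, int(zero_fraction * numel))
    original = layer_weight.detach().clone()
    best_idx, best_div = None, float("inf")

    for _ in range(n_candidates):
        idx = torch.randperm(numel)[:k]           # candidate set to zero
        with torch.no_grad():
            layer_weight.view(-1)[idx] = 0.0      # tentative zeroing
            d = divergence(teacher, student, batch).item()
            layer_weight.copy_(original)          # undo before next candidate
        if d < best_div:
            best_idx, best_div = idx, d

    with torch.no_grad():
        layer_weight.view(-1)[best_idx] = 0.0     # commit the fittest candidate
    return best_idx, best_div
```

Repeating such cycles layer by layer until each layer reaches its
sparsification target matches the high-level description in the abstract;
the fitness criterion and the way candidates are generated are where the
actual method would differ from this sketch.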
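The per-layer search for the lowest acceptable bit width can be sketched in
the same spirit. Uniform quantization over the nonzero (surviving) weights is
used below as a stand-in; the paper's procedure for placing the quantization
levels, and the helpers `quantize_uniform` and `eval_fn`, are assumptions for
illustration.

```python
# Minimal sketch of a per-layer bit-width search on the surviving parameters.
import torch

def quantize_uniform(weight, n_bits):
    """Uniformly quantize the nonzero entries of a sparsified weight tensor."""
    q = weight.clone()
    mask = weight != 0
    if mask.any():
        w = weight[mask]
        lo, hi = w.min(), w.max()
        step = (hi - lo) / (2 ** n_bits - 1)
        if step > 0:
            q[mask] = torch.round((w - lo) / step) * step + lo
    return q

def lowest_acceptable_bits(weight, eval_fn, baseline_acc, tol=0.01,
                           candidate_bits=(8, 6, 4, 3, 2)):
    """Return the smallest bit width whose accuracy loss stays within `tol`.
    `eval_fn(quantized_weight)` is assumed to load the quantized layer,
    run validation, and return accuracy."""
    best = None
    for b in sorted(candidate_bits, reverse=True):
        acc = eval_fn(quantize_uniform(weight, b))
        if baseline_acc - acc <= tol:
            best = b          # still acceptable; try fewer bits
        else:
            break             # accuracy loss too large at this width
    return best
```

The surviving quantized values would then be passed through a lossless
encoder (for example, an entropy coder such as Huffman coding) to obtain the
final parameter representation; the specific encoding used in the paper is
not reproduced here.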
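The closing remark about nesting criteria can also be sketched: after a DE
cycle identifies its best candidate set, one might zero only the
lowest-saliency subset of it, ranked by a mix of parameter magnitude and a
diagonal Hessian proxy. The squared-gradient (empirical Fisher) proxy and the
`select_subset_to_zero` helper below are assumptions, not the paper's exact
criteria.

```python
# Illustrative subset selection combining parameter magnitude and a diagonal
# Hessian proxy (squared gradients); both criteria are normalized before
# mixing so they contribute on comparable scales.
import torch

def select_subset_to_zero(weight, grad, candidate_idx,
                          keep_fraction=0.5, alpha=0.5):
    """Return the lowest-saliency subset of `candidate_idx` to zero."""
    w = weight.view(-1)[candidate_idx].abs()
    h = grad.view(-1)[candidate_idx] ** 2            # diagonal Hessian proxy
    score = alpha * w / (w.max() + 1e-12) + (1 - alpha) * h / (h.max() + 1e-12)
    k = max(1, int(keep_fraction * candidate_idx.numel()))
    order = torch.argsort(score)                     # least salient first
    return candidate_idx[order[:k]]
```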
Related papers
- NIDS Neural Networks Using Sliding Time Window Data Processing with Trainable Activations and its Generalization Capability [0.0]
This paper presents neural networks for network intrusion detection systems (NIDS) that operate on flow data preprocessed with a time window.
It requires only eleven features that do not rely on deep packet inspection, can be found in most NIDS datasets, and are easily obtained from conventional flow collectors.
The reported training accuracy exceeds 99% for the proposed method with as few as twenty neural network input features.
arXiv Detail & Related papers (2024-10-24T11:36:19Z) - Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions reachable via our training procedure, including gradient descent and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - Towards Generalized Entropic Sparsification for Convolutional Neural Networks [0.0]
Convolutional neural networks (CNNs) are reported to be overparametrized.
Here, we introduce a layer-by-layer data-driven pruning method based on a mathematical idea aimed at a computationally scalable entropic relaxation of the pruning problem.
The sparse subnetwork is found from the pre-trained (full) CNN using the network entropy minimization as a sparsity constraint.
arXiv Detail & Related papers (2024-04-06T21:33:39Z) - Adaptive Multilevel Neural Networks for Parametric PDEs with Error Estimation [0.0]
A neural network architecture is presented to solve high-dimensional parameter-dependent partial differential equations (pPDEs).
It is constructed to map parameters of the model data to corresponding finite element solutions.
It outputs a coarse grid solution and a series of corrections, as produced in an adaptive finite element method (AFEM).
arXiv Detail & Related papers (2024-03-19T11:34:40Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z) - Efficient Sparse Artificial Neural Networks [11.945854832533232]
The brain, as the source of inspiration for Artificial Neural Networks (ANN), is based on a sparse structure.
This sparse structure helps the brain to consume less energy, learn more easily, and generalize patterns better than any ANN.
In this paper, two evolutionary methods for adopting sparsity to ANNs are proposed.
arXiv Detail & Related papers (2021-03-13T10:03:41Z) - Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and utilize a differential method to search them accurately.
arXiv Detail & Related papers (2020-09-18T09:13:26Z) - Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Ensembled sparse-input hierarchical networks for high-dimensional datasets [8.629912408966145]
We show that dense neural networks can be a practical data analysis tool in settings with small sample sizes.
The proposed method appropriately prunes the network structure by tuning only two L1-penalty parameters.
On a collection of real-world datasets with different sizes, EASIER-net selected network architectures in a data-adaptive manner and achieved higher prediction accuracy than off-the-shelf methods on average.
arXiv Detail & Related papers (2020-05-11T02:08:53Z)