A2P-MANN: Adaptive Attention Inference Hops Pruned Memory-Augmented
Neural Networks
- URL: http://arxiv.org/abs/2101.09693v1
- Date: Sun, 24 Jan 2021 12:02:12 GMT
- Title: A2P-MANN: Adaptive Attention Inference Hops Pruned Memory-Augmented
Neural Networks
- Authors: Mohsen Ahmadzadeh, Mehdi Kamal, Ali Afzali-Kusha, Massoud Pedram
- Abstract summary: We propose an online adaptive approach called A2P-MANN to limit the number of required attention inference hops in memory-augmented neural networks.
The technique eliminates a large number of unnecessary computations when extracting the correct answer.
The efficacy of the technique is assessed using the twenty question-answering (QA) tasks of the bAbI dataset.
- Score: 3.682712058535653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, to limit the number of attention inference hops
required by memory-augmented neural networks, we propose an online adaptive
approach called A2P-MANN. A small neural network classifier determines an
adequate number of attention inference hops for each input query. The
technique eliminates a large number of unnecessary computations in extracting
the correct answer. In addition, to further reduce the computations of
A2P-MANN, we suggest pruning the weights of the final FC (fully-connected)
layers. To this end, two pruning approaches are developed: one with negligible
accuracy loss and the other with controllable loss on the final accuracy. The
efficacy of the technique is assessed using the twenty question-answering (QA)
tasks of the bAbI dataset. The analytical assessment reveals, on average, more
than 42% fewer computations compared to the baseline MANN at the cost of less
than 1% accuracy loss. In addition, when used along with the previously
published zero-skipping technique, a computation count reduction of up to 68%
is achieved. Finally, when the proposed approach (without zero-skipping) is
implemented on CPU and GPU platforms, a runtime reduction of up to 43% is
achieved.
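As a rough sketch of the adaptive-hop idea (module names and sizes are our own
illustrative choices, not the paper's), a small classifier inspects the query
embedding and selects how many attention inference hops to run:

    import torch
    import torch.nn as nn

    class AdaptiveHopMANN(nn.Module):
        def __init__(self, dim, max_hops=3):
            super().__init__()
            # Small classifier that predicts how many hops this query needs.
            self.hop_classifier = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(),
                                                nn.Linear(32, max_hops))
            self.out = nn.Linear(dim, dim)  # final FC layer (the paper's pruning target)

        def forward(self, query, memory):
            # query: (dim,), memory: (num_slots, dim)
            n_hops = int(self.hop_classifier(query).argmax()) + 1
            u = query
            for _ in range(n_hops):  # the remaining hops are skipped entirely
                attn = torch.softmax(memory @ u, dim=0)  # attention over memory slots
                u = u + attn @ memory                    # one inference hop
            return self.out(u)

    model = AdaptiveHopMANN(dim=64)
    answer = model(torch.randn(64), torch.randn(10, 64))

Skipping hops removes entire attention and projection computations per query,
which is where the claimed savings come from.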
Related papers
- Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective.
We show how to compute this efficiently for tractable circuits.
We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
arXiv Detail & Related papers (2023-02-28T00:04:22Z)
- Dual Accuracy-Quality-Driven Neural Network for Prediction Interval Generation [0.0]
We present a method to learn prediction intervals for regression-based neural networks automatically.
Our main contribution is the design of a novel loss function for the PI-generation network.
Experiments using a synthetic dataset, eight benchmark datasets, and a real-world crop yield prediction dataset showed that our method maintained the nominal probability coverage.
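The loss itself is not given in this summary; as a hedged illustration of a
coverage-plus-width objective that a two-headed PI network could be trained
with (all names hypothetical, not the paper's formulation):

    import torch

    def pi_loss(y, lower, upper, alpha=0.05, lam=10.0, soft=50.0):
        # Soft indicator of y falling inside [lower, upper].
        covered = torch.sigmoid(soft * (y - lower)) * torch.sigmoid(soft * (upper - y))
        picp = covered.mean()            # prediction-interval coverage probability
        mpiw = (upper - lower).mean()    # mean prediction-interval width
        # Penalize coverage below the nominal level (1 - alpha); prefer narrow intervals.
        return mpiw + lam * torch.clamp((1 - alpha) - picp, min=0.0) ** 2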
arXiv Detail & Related papers (2022-12-13T05:03:16Z)
- Fast Exploration of the Impact of Precision Reduction on Spiking Neural Networks [63.614519238823206]
Spiking Neural Networks (SNNs) are a practical choice when the target hardware is an edge-computing device.
We employ an Interval Arithmetic (IA) model to develop an exploration methodology that takes advantage of the capability of such a model to propagate the approximation error.
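Interval arithmetic carries a lower and upper bound through every operation,
so a quantization error attached to a weight propagates to a guaranteed bound
on the output; a minimal sketch of the idea (independent of the paper's
actual tooling):

    from dataclasses import dataclass

    @dataclass
    class Interval:
        lo: float
        hi: float

        def __add__(self, other):
            return Interval(self.lo + other.lo, self.hi + other.hi)

        def __mul__(self, other):
            p = [self.lo * other.lo, self.lo * other.hi,
                 self.hi * other.lo, self.hi * other.hi]
            return Interval(min(p), max(p))

    # A weight stored at reduced precision is modeled as value +/- quantization error.
    w = Interval(0.49, 0.51)   # nominal 0.5 with +/-0.01 error
    x = Interval(0.9, 1.1)     # input with its own uncertainty
    print(w * x)               # Interval(lo=0.441, hi=0.561) bounds the true product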
arXiv Detail & Related papers (2022-11-22T15:08:05Z)
- Training Neural Networks in Single vs Double Precision [8.036150169408241]
Conjugate Gradient and RMSprop algorithms are optimized for mean square error.
Experiments show that single-precision can keep up with double-precision if line search finds an improvement.
For strongly nonlinear tasks, both algorithm classes find only fairly poor solutions in terms of mean square error.
arXiv Detail & Related papers (2022-09-15T11:20:53Z)
- Spike time displacement based error backpropagation in convolutional spiking neural networks [0.6193838300896449]
In this paper, we extend the STiDi-BP algorithm to employ it in deeper and convolutional architectures.
The evaluation results on image classification with two popular benchmarks, MNIST and Fashion-MNIST, confirm that the algorithm is applicable to deep SNNs.
We consider a convolutional SNN with two sets of weights: real-valued weights, updated in the backward pass, and their signs (binary weights), employed in the feedforward process.
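As a rough illustration of this two-set weight scheme (not the paper's code),
the forward pass can use only the signs of the weights while gradients flow to
the real-valued copies, straight-through style:

    import torch

    class SignForward(torch.autograd.Function):
        @staticmethod
        def forward(ctx, w):
            return torch.sign(w)   # binary weights used in the feedforward process

        @staticmethod
        def backward(ctx, grad_out):
            return grad_out        # straight-through: gradient reaches the real weights

    w_real = torch.randn(16, 8, requires_grad=True)  # updated in the backward pass
    x = torch.randn(4, 8)
    y = x @ SignForward.apply(w_real).t()            # forward sees only the signs
    y.sum().backward()                               # populates w_real.grad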
arXiv Detail & Related papers (2021-08-31T05:18:59Z)
- n-hot: Efficient bit-level sparsity for powers-of-two neural network quantization [0.0]
Powers-of-two (PoT) quantization reduces the number of bit operations of deep neural networks on resource-constrained hardware.
PoT quantization triggers a severe accuracy drop because of its limited representation ability.
We propose an efficient PoT quantization scheme that balances accuracy and costs in a memory-efficient way.
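As a generic illustration of PoT quantization (the paper's n-hot scheme
refines this; the helper below is our own), each weight is snapped to the
nearest signed power of two, turning multiplications into bit shifts:

    import torch

    def pot_quantize(w, min_exp=-6, max_exp=0):
        # Round each weight to the nearest signed power of two in [2^min_exp, 2^max_exp].
        sign = torch.sign(w)
        exp = torch.round(torch.log2(w.abs().clamp(min=2.0 ** min_exp)))
        return sign * (2.0 ** exp.clamp(min_exp, max_exp))

    w = torch.tensor([0.30, -0.07, 0.90])
    print(pot_quantize(w))  # tensor([ 0.2500, -0.0625,  1.0000])

The small set of representable magnitudes is exactly what causes the accuracy
drop the summary mentions.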
arXiv Detail & Related papers (2021-03-22T10:13:12Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- APQ: Joint Search for Network Architecture, Pruning and Quantization Policy [49.3037538647714]
We present APQ for efficient deep learning inference on resource-constrained hardware.
Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner.
With the same accuracy, APQ reduces the latency/energy by 2x/1.3x over MobileNetV2+HAQ.
arXiv Detail & Related papers (2020-06-15T16:09:17Z)
- Second-Order Provable Defenses against Adversarial Attacks [63.34032156196848]
We show that if the eigenvalues of the Hessian of the network are bounded, we can compute a certificate in the $l_2$ norm efficiently using convex optimization.
We achieve certified accuracies of 5.78%, 44.96%, and 43.19%, compared with IBP-based methods.
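As a hedged reconstruction of how a bounded-eigenvalue certificate typically
works (our notation, not necessarily the paper's): if the classification
margin g satisfies g(x) > 0 and the Hessian eigenvalues are bounded by K, a
second-order Taylor bound yields a certified $l_2$ radius:

    % With \|\nabla^2 g\|_2 \le K, a Taylor bound gives
    % g(x+\delta) \ge g(x) - \|\nabla g(x)\|_2 \|\delta\|_2 - \tfrac{K}{2}\|\delta\|_2^2,
    % so the label cannot flip for any \|\delta\|_2 < r, where
    r = \frac{-\|\nabla g(x)\|_2 + \sqrt{\|\nabla g(x)\|_2^2 + 2K\,g(x)}}{K}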
arXiv Detail & Related papers (2020-06-01T05:55:18Z)
- Dithered backprop: A sparse and quantized backpropagation algorithm for more efficient deep neural network training [18.27946970159625]
We propose a method for reducing the computational cost of backprop, which we name dithered backprop.
We show that our method is fully compatible with state-of-the-art training methods that reduce the bit-precision of training down to 8 bits.
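The core trick in dithered quantization is adding random noise before
rounding, which makes the quantizer unbiased in expectation and sends most
small gradient entries to exact zero; a generic sketch (not the paper's exact
scheme):

    import torch

    def dithered_quantize(grad, step=1e-3):
        # Uniform dither in [-step/2, step/2) before rounding to the grid:
        # unbiased in expectation, and |g| << step rounds to exactly 0 most of the time.
        noise = (torch.rand_like(grad) - 0.5) * step
        return torch.round((grad + noise) / step) * step

    g = torch.randn(6) * 1e-4
    print(dithered_quantize(g))   # mostly zeros -> sparse backward pass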
arXiv Detail & Related papers (2020-04-09T17:59:26Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of full-precision networks.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.