EuclidNets: An Alternative Operation for Efficient Inference of Deep
Learning Models
- URL: http://arxiv.org/abs/2212.11803v1
- Date: Thu, 22 Dec 2022 15:35:42 GMT
- Title: EuclidNets: An Alternative Operation for Efficient Inference of Deep
Learning Models
- Authors: Xinlin Li, Mariana Parazeres, Adam Oberman, Alireza Ghaffari, Masoud
Asgharian, Vahid Partovi Nia
- Abstract summary: EuclidNet is a compression method designed for hardware implementation that replaces multiplication with the squared Euclidean distance.
We show that EuclidNet is aligned with matrix multiplication and can be used as a measure of similarity in the case of convolutional layers.
- Score: 2.715806580963474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the advent of deep learning applications on edge devices, researchers
actively try to optimize their deployment on low-power and memory-restricted
devices. There are established compression methods such as quantization,
pruning, and architecture search that leverage commodity hardware. Apart from
conventional compression algorithms, one may redesign the operations of deep
learning models to obtain a more efficient implementation. To this end, we
propose EuclidNet, a compression method designed for hardware implementation,
which replaces multiplication, $xw$, with the squared Euclidean distance
$(x-w)^2$. We show that EuclidNet is aligned with matrix multiplication and
can be used as a measure of similarity in the case of convolutional layers.
Furthermore, we show that under various transformations and noise scenarios,
EuclidNet exhibits the same performance as deep learning models designed with
multiplication operations.
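To make the claimed alignment with matrix multiplication concrete: since $-(x-w)^2 = 2xw - x^2 - w^2$, the Euclidean response of a layer equals twice the multiplicative response up to input- and weight-norm terms. Below is a minimal numpy sketch of this identity, an illustration of the operation named in the abstract rather than the authors' implementation (the hardware benefit comes from replacing multipliers with subtract-and-square units):

```python
import numpy as np

def euclid_layer(x, W):
    # EuclidNet-style response: negative squared Euclidean distance
    # between the input and each weight vector (row of W).
    return -((x[None, :] - W) ** 2).sum(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=8)           # input vector
W = rng.normal(size=(4, 8))      # 4 output units

# Identity: -(x - w)^2 = 2*x.w - ||x||^2 - ||w||^2, so the Euclidean
# score matches the multiplicative score up to norm terms.
euclid = euclid_layer(x, W)
mult = 2 * (W @ x) - (x ** 2).sum() - (W ** 2).sum(axis=1)
assert np.allclose(euclid, mult)
```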
Related papers
- Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
Motivated by the success of convolutional networks in the image domain, we identify more efficient alternatives to dense matrices.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose a novel matrix family containing Monarch matrices, the Block-Train, which we show performs better than dense matrices for the same compute on multiple tasks.
arXiv Detail & Related papers (2024-06-10T13:25:43Z)
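The Block-Train construction above is specific to the paper; as a generic illustration of why structured matrices save parameters and compute, here is a sketch (an assumed example, not the paper's code) of a block-diagonal layer, which replaces an $n \times n$ dense multiply with $k$ independent $b \times b$ multiplies:

```python
import numpy as np

def block_diag_matvec(blocks, x):
    # y = B @ x for block-diagonal B: k blocks of size b x b store
    # k*b^2 = n*b parameters instead of n^2 for a dense matrix.
    k, b, _ = blocks.shape
    return np.einsum("kij,kj->ki", blocks, x.reshape(k, b)).reshape(-1)

rng = np.random.default_rng(0)
k, b = 4, 3                              # n = k * b = 12
blocks = rng.normal(size=(k, b, b))
x = rng.normal(size=k * b)

# Check against the equivalent dense matrix, built explicitly.
dense = np.zeros((k * b, k * b))
for i in range(k):
    dense[i * b:(i + 1) * b, i * b:(i + 1) * b] = blocks[i]
assert np.allclose(block_diag_matvec(blocks, x), dense @ x)
```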
- Ada-QPacknet -- adaptive pruning with bit width reduction as an efficient continual learning method without forgetting [0.8681331155356999]
In this work, a new architecture-based approach, Ada-QPacknet, is described.
It incorporates pruning to extract a sub-network for each task.
Results show that the proposed approach outperforms most CL strategies in task- and class-incremental scenarios.
arXiv Detail & Related papers (2023-08-14T12:17:11Z)
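Ada-QPacknet's exact pruning and bit-width schedule is in the paper; as a hedged sketch of the "extract a sub-network per task" step alone, the following keeps only the largest-magnitude weights as the current task's mask:

```python
import numpy as np

def task_mask(W, keep_frac=0.1):
    # Magnitude pruning: keep the top keep_frac fraction of |W| as the
    # sub-network assigned to the current task.
    k = max(1, int(keep_frac * W.size))
    threshold = np.partition(np.abs(W).ravel(), -k)[-k]
    return np.abs(W) >= threshold

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
mask = task_mask(W, keep_frac=0.1)
W_task = W * mask    # weights frozen for this task; the rest stay free
```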
- Automated Sizing and Training of Efficient Deep Autoencoders using Second Order Algorithms [0.46040036610482665]
We propose a multi-step training method for generalized linear classifiers.
Validation error is minimized by pruning unnecessary inputs.
Desired outputs are improved via a method similar to the Ho-Kashyap rule.
arXiv Detail & Related papers (2023-08-11T16:48:31Z)
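For reference, the classical Ho-Kashyap rule that the summary alludes to jointly adapts the weights and the target margins; a textbook numpy version is sketched below (the paper's variant for improving desired outputs may differ):

```python
import numpy as np

def ho_kashyap(Y, iters=100, rho=0.5):
    # Classical Ho-Kashyap: seek w and margins b > 0 with Y @ w ~ b.
    # Rows of Y are training samples, sign-normalized by their labels.
    n, _ = Y.shape
    b = np.ones(n)
    Y_pinv = np.linalg.pinv(Y)
    w = Y_pinv @ b
    for _ in range(iters):
        e = Y @ w - b
        b = b + rho * (e + np.abs(e))   # raise targets only where e > 0
        w = Y_pinv @ b                  # least-squares refit of weights
    return w, b

# Toy usage: two separable classes; rows are label * [x, 1].
rng = np.random.default_rng(0)
X1 = rng.normal(+2, 1, size=(20, 2))    # class +1
X2 = rng.normal(-2, 1, size=(20, 2))    # class -1
Y = np.vstack([np.hstack([X1, np.ones((20, 1))]),
               -np.hstack([X2, np.ones((20, 1))])])
w, b = ho_kashyap(Y, iters=200)
print((Y @ w > 0).mean())               # fraction correctly separated
```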
- On a class of geodesically convex optimization problems solved via Euclidean MM methods [50.428784381385164]
We show how a class of geodesically convex problems can be written as a difference of Euclidean convex functions, covering several types of problems in statistics and machine learning.
Ultimately, we hope this broadens the reach of such methods.
arXiv Detail & Related papers (2022-06-22T23:57:40Z)
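The "Euclidean MM" in the title above refers to majorize-minimize schemes; the generic MM descent guarantee (standard material, not the paper's specific surrogate construction) reads:

```latex
g(x; x_t) \ge f(x) \;\;\forall x, \qquad g(x_t; x_t) = f(x_t),
\qquad x_{t+1} = \arg\min_x \, g(x; x_t)
\;\Rightarrow\;
f(x_{t+1}) \le g(x_{t+1}; x_t) \le g(x_t; x_t) = f(x_t).
```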
- Geometric Optimisation on Manifolds with Applications to Deep Learning [6.85316573653194]
We design and implement a Python library to help non-experts use all these powerful tools.
The algorithms implemented in this library have been designed with usability and GPU efficiency in mind.
arXiv Detail & Related papers (2022-03-09T15:20:07Z)
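As a taste of the manifold-optimization primitives such a library provides, here is a self-contained sketch (not the library's actual API) of one Riemannian gradient step on the unit sphere: project the Euclidean gradient onto the tangent space, step, then retract back onto the manifold:

```python
import numpy as np

def sphere_step(x, egrad, lr=0.05):
    rgrad = egrad - (egrad @ x) * x    # project onto tangent space at x
    y = x - lr * rgrad                 # step in the tangent direction
    return y / np.linalg.norm(y)       # retract back onto the sphere

# Example: top eigenvector of a symmetric A by minimizing -x^T A x.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)); A = A + A.T
x = rng.normal(size=5); x /= np.linalg.norm(x)
for _ in range(500):
    x = sphere_step(x, egrad=-2 * A @ x)
# x now approximates the eigenvector of A with the largest eigenvalue.
```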
- DCT-Former: Efficient Self-Attention with Discrete Cosine Transform [4.622165486890318]
An intrinsic limitation of Transformer architectures arises from the computation of the dot-product attention.
Our idea takes inspiration from the world of lossy data compression (such as the JPEG algorithm) to derive an approximation of the attention module.
An extensive set of experiments shows that our method takes up less memory for the same performance, while also drastically reducing inference time.
arXiv Detail & Related papers (2022-03-02T15:25:27Z)
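The JPEG analogy can be made concrete: transform the key/value sequence with a DCT, keep only the low-frequency coefficients, and attend over the shorter compressed sequence. The placement of the transform below is an assumption for illustration, not the paper's exact architecture:

```python
import numpy as np
from scipy.fft import dct

def dct_truncate(X, keep):
    # Lossy compression along the sequence axis: keep the first `keep`
    # (lowest-frequency) DCT coefficients, JPEG-style.
    return dct(X, axis=0, norm="ortho")[:keep]

def attention(q, K, V):
    s = K @ q / np.sqrt(q.size)
    w = np.exp(s - s.max()); w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
n, d, keep = 256, 16, 64
K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d))
q = rng.normal(size=d)

out = attention(q, dct_truncate(K, keep), dct_truncate(V, keep))
# Attention now costs O(keep * d) per query instead of O(n * d).
```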
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
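The RDP family itself is defined in the paper; a toy version of the idea, offered here as an assumption, is to replace each exact sum over latent states in a DP recursion with a rescaled sum over a random subset, as in this HMM-style forward pass:

```python
import numpy as np

def randomized_forward(T, E, obs, n_samples=32, seed=0):
    # alpha[s] ~ forward probabilities; each step sums over a random
    # subset of states and rescales, a Monte Carlo estimate of the
    # exact O(S^2) dynamic-programming recursion.
    rng = np.random.default_rng(seed)
    S = T.shape[0]
    alpha = np.full(S, 1.0 / S) * E[:, obs[0]]
    for o in obs[1:]:
        idx = rng.choice(S, size=min(n_samples, S), replace=False)
        alpha = (S / idx.size) * (alpha[idx] @ T[idx, :]) * E[:, o]
    return alpha.sum()

S = 1000
T = np.full((S, S), 1.0 / S)        # uniform transitions (toy model)
E = np.random.default_rng(1).uniform(size=(S, 3))
print(randomized_forward(T, E, obs=[0, 1, 2, 1]))
```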
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
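The solver in the paper (GPU kernels, preconditioned conjugate gradient) goes well beyond this, but the Nyström estimator such solvers accelerate fits in a few lines; a hedged sketch:

```python
import numpy as np

def gauss_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_krr(X, y, m=100, lam=1e-3, seed=0):
    # Restrict the kernel ridge solution to m inducing points, turning
    # an n x n linear solve into an m x m one.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)]
    Knm = gauss_kernel(X, centers)              # (n, m)
    Kmm = gauss_kernel(centers, centers)        # (m, m)
    beta = np.linalg.solve(Knm.T @ Knm + lam * len(X) * Kmm, Knm.T @ y)
    return centers, beta

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 3))
y = np.sin(X.sum(axis=1)) + 0.1 * rng.normal(size=len(X))
centers, beta = nystrom_krr(X, y)
preds = gauss_kernel(X[:5], centers) @ beta     # predict 5 points
```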
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high-performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that a hybrid approach of compiler-generated code plus minimal library use results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
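A representative transformation such compilers derive is loop tiling for data reuse; written out by hand (purely as an illustration, not PolyDL output), it looks like this:

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    # Tile the i/j/k loops so each A and B tile is reused across a whole
    # tile of C while it is still resident in cache.
    n, kdim = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for k in range(0, kdim, tile):
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
                )
    return C

rng = np.random.default_rng(0)
A, B = rng.normal(size=(96, 80)), rng.normal(size=(80, 64))
assert np.allclose(tiled_matmul(A, B), A @ B)
```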
- Learning to map between ferns with differentiable binary embedding networks [4.827284036182784]
We present a novel concept that enables the application of differentiable random ferns in end-to-end networks.
It can then be used as a multiplication-free alternative to convolutional layers in deep network architectures.
arXiv Detail & Related papers (2020-05-26T08:13:23Z)
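A random fern evaluates a handful of binary feature tests and uses the resulting bit pattern to index a learned table, so inference needs no multiplications; the sketch below shows that forward path only (the paper's contribution, making the binary embedding differentiable end to end, is omitted here):

```python
import numpy as np

def fern_forward(x, test_idx, thresholds, table):
    # d binary tests -> a d-bit code -> one row of a 2^d x out table.
    bits = (x[test_idx] > thresholds).astype(int)
    code = bits @ (1 << np.arange(len(bits)))   # bits to integer index
    return table[code]

rng = np.random.default_rng(0)
d, n_feat, n_out = 4, 32, 8
x = rng.normal(size=n_feat)
test_idx = rng.choice(n_feat, size=d, replace=False)
thresholds = rng.normal(size=d)
table = rng.normal(size=(2 ** d, n_out))        # learned lookup table
y = fern_forward(x, test_idx, thresholds, table)
```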
- Neural Network Compression Framework for fast model inference [59.65531492759006]
We present a new framework for neural network compression with fine-tuning, which we call the Neural Network Compression Framework (NNCF).
It leverages recent advances in various network compression methods and implements some of them, such as sparsity, quantization, and binarization.
The framework can be used with the training samples supplied with it, or as a standalone package that can be seamlessly integrated into existing training code.
arXiv Detail & Related papers (2020-02-20T11:24:01Z)
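NNCF's actual API is documented with the framework; as a hedged illustration of the core operation that quantization-aware compression inserts during fine-tuning, here is a standalone uniform fake-quantizer:

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    # Quantize to signed integer levels and dequantize back, so the
    # network sees the quantization error during fine-tuning.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 16))
w8 = fake_quantize(w, num_bits=8)
print(np.abs(w - w8).max())      # error bounded by about scale / 2
```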
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.