Splitting Convolutional Neural Network Structures for Efficient
Inference
- URL: http://arxiv.org/abs/2002.03302v1
- Date: Sun, 9 Feb 2020 06:53:18 GMT
- Title: Splitting Convolutional Neural Network Structures for Efficient
Inference
- Authors: Emad MalekHosseini, Mohsen Hajabdollahi, Nader Karimi, Shadrokh
Samavi, Shahram Shirani
- Abstract summary: A new technique is proposed to split the network structure into small parts that consume less memory than the original network.
The splitting approach has been tested on two well-known network structures, VGG16 and ResNet18, for the classification of CIFAR10 images.
- Score: 11.031841470875571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For convolutional neural networks (CNNs) that process a large volume of input
data, memory management becomes a major concern. Reducing the memory cost is an
effective way to deal with this problem and can be realized through different
techniques such as feature map pruning and input data splitting. Among the
various methods in this area of research, splitting the network structure is an
interesting direction in which only a few works have been done. In this study,
the problem of reducing memory utilization by splitting the network structure
is addressed. A new technique is proposed that splits the network structure
into small parts, each of which consumes less memory than the original network.
The split parts can be processed almost independently, which is essential for
better memory management. The splitting approach has been tested on two
well-known network structures, VGG16 and ResNet18, for the classification of
CIFAR10 images. Simulation results show that the splitting method reduces both
the number of computational operations and the amount of memory consumption.
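The abstract does not spell out the exact splitting rule, so the following is only a
minimal illustrative sketch of the general idea, assuming a channel-wise split of one
convolutional block into parts that can be executed one at a time; the SplitConvBlock
class, the layer sizes, and the two-way split are hypothetical choices, not the
authors' method.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: split one conv block channel-wise into parts that
# can be run sequentially, so that only one part's weights and intermediate
# activations need to be live at a time. The split rule here is an assumption;
# the paper's actual splitting of VGG16/ResNet18 is not given in the abstract.

class SplitConvBlock(nn.Module):
    def __init__(self, in_ch=64, out_ch=128, parts=2):
        super().__init__()
        assert out_ch % parts == 0
        # Each part produces its own slice of the output channels.
        self.parts = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch // parts, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch // parts),
                nn.ReLU(inplace=True),
            )
            for _ in range(parts)
        )

    def forward(self, x):
        # Run the split parts one after another and concatenate their outputs.
        outputs = [part(x) for part in self.parts]
        return torch.cat(outputs, dim=1)

if __name__ == "__main__":
    block = SplitConvBlock()
    x = torch.randn(1, 64, 32, 32)   # CIFAR10-sized spatial resolution
    y = block(x)
    print(y.shape)                   # torch.Size([1, 128, 32, 32])
```

In this sketch each part runs as its own small forward pass; a memory-constrained
deployment could additionally load each part's weights on demand and release its
intermediate activations before moving on, which is the kind of behavior the paper's
splitting aims to enable.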
Related papers
- An Efficient Procedure for Computing Bayesian Network Structure Learning [0.9208007322096532]
We propose a globally optimal Bayesian network structure discovery algorithm based on a progressively leveled scoring approach.
Experimental results indicate that our method, when using only memory, not only reduces peak memory usage but also improves computational efficiency.
arXiv Detail & Related papers (2024-07-24T07:59:18Z) - Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors [4.95475852994362]
We propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks.
We employ the approach to both fully-connected and convolutional layers, which make up the breadth of space in most neural architectures.
arXiv Detail & Related papers (2024-07-16T15:55:38Z) - Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques.
PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
arXiv Detail & Related papers (2024-01-24T03:03:17Z) - Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without huge computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z) - I-SPLIT: Deep Network Interpretability for Split Computing [11.652957867167098]
This work makes a substantial step in the field of split computing, i.e., how to split a deep neural network to host its early part on an embedded device and the rest on a server.
We show that not only does the architecture of the layers matter, but also the importance of the neurons contained therein.
arXiv Detail & Related papers (2022-09-23T14:26:56Z) - MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory (a minimal illustrative sketch follows this list).
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z) - Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structures including those with coupled channels.
arXiv Detail & Related papers (2021-08-02T08:21:44Z) - MAFAT: Memory-Aware Fusing and Tiling of Neural Networks for Accelerated
Edge Inference [1.7894377200944507]
Machine learning networks can easily exceed available memory, increasing latency due to excessive OS swapping.
We propose a memory usage predictor coupled with a search algorithm to provide optimized fusing and tiling configurations.
Results show that our approach can run in less than half the memory, and with a speedup of up to 2.78 under severe memory constraints.
arXiv Detail & Related papers (2021-07-14T19:45:49Z) - Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z) - Decoupled and Memory-Reinforced Networks: Towards Effective Feature
Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z) - On Power Laws in Deep Ensembles [12.739425443572202]
We show that one large network may perform worse than an ensemble of several medium-size networks with the same total number of parameters.
Using the detected power law-like dependencies, we can predict the possible gain from ensembling networks with a given structure.
arXiv Detail & Related papers (2020-07-16T17:35:32Z)
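For the MCUNetV2 entry above, patch-by-patch inference can be illustrated with a small
sketch: the first, activation-heavy convolution is evaluated on spatial tiles (with a
one-pixel halo for its 3x3 kernel), so that only one tile of its output is computed at a
time. The patch size, layer shapes, and the patchwise_first_stage helper are assumptions
made for illustration, not the paper's actual scheduler.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch of patch-by-patch inference for the first stage of a CNN.
conv = nn.Conv2d(3, 16, kernel_size=3)  # zero padding is applied explicitly below
head = nn.Sequential(nn.MaxPool2d(2), nn.Flatten(), nn.Linear(16 * 16 * 16, 10))

@torch.no_grad()
def patchwise_first_stage(x, patch=16):
    """Compute conv(x) with padding=1 by producing (patch x patch) output tiles."""
    n, _, h, w = x.shape
    padded = F.pad(x, (1, 1, 1, 1))          # halo for the 3x3 kernel
    out = torch.empty(n, 16, h, w)
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tile = padded[:, :, r:r + patch + 2, c:c + patch + 2]
            out[:, :, r:r + patch, c:c + patch] = conv(tile)
    return out

if __name__ == "__main__":
    x = torch.randn(1, 3, 32, 32)
    y_patch = patchwise_first_stage(x)
    y_full = F.conv2d(x, conv.weight, conv.bias, padding=1)
    print(torch.allclose(y_patch, y_full, atol=1e-6))  # True: same result, tile by tile
    print(head(y_patch).shape)                         # torch.Size([1, 10])
```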
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.