Pairwise Neural Networks (PairNets) with Low Memory for Fast On-Device
Applications
- URL: http://arxiv.org/abs/2002.04458v1
- Date: Mon, 10 Feb 2020 02:12:59 GMT
- Title: Pairwise Neural Networks (PairNets) with Low Memory for Fast On-Device
Applications
- Authors: Luna M. Zhang
- Abstract summary: A traditional artificial neural network (ANN) is normally trained slowly by a gradient descent algorithm, such as the backpropagation algorithm.
We created a novel wide and shallow 4-layer ANN called "Pairwise Neural Network" ("PairNet") with high-speed non-gradient-descent hyperparameter optimization.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A traditional artificial neural network (ANN) is normally trained slowly by a
gradient descent algorithm, such as the backpropagation algorithm, since a
large number of hyperparameters of the ANN need to be fine-tuned with many
training epochs. Since a large number of hyperparameters of a deep neural
network, such as a convolutional neural network, occupy much memory, a
memory-inefficient deep learning model is not ideal for real-time Internet of
Things (IoT) applications on various devices, such as mobile phones. Thus, it
is necessary to develop fast and memory-efficient Artificial Intelligence of
Things (AIoT) systems for real-time on-device applications. We created a novel
wide and shallow 4-layer ANN called "Pairwise Neural Network" ("PairNet") with
high-speed non-gradient-descent hyperparameter optimization. The PairNet is
trained quickly in a single epoch because its hyperparameters are optimized
once by solving a system of linear equations with the multivariate
least-squares fitting method (a minimal sketch of this idea follows the
abstract). In addition, an n-input space is
partitioned into many n-input data subspaces, and a local PairNet is built in a
local n-input subspace. This divide-and-conquer approach can train the local
PairNet using specific local features to improve model performance. Simulation
results indicate that the three PairNets with incremental learning have smaller
average prediction mean squared errors, and achieve much higher speeds than
traditional ANNs. An important future work is to develop better and faster
non-gradient-descent hyperparameter optimization algorithms to generate
effective, fast, and memory-efficient PairNets with incremental learning on
optimal subspaces for real-time AIoT on-device applications.
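To make the closed-form, non-gradient-descent training idea concrete, below is a minimal Python sketch. It is not the paper's PairNet architecture (the abstract does not spell out the pairwise layer structure); it only illustrates fitting the output weights of a wide, shallow network in one pass with multivariate least squares (np.linalg.lstsq), so no gradient descent or multi-epoch training is needed.

```python
# Minimal sketch (NOT the paper's exact PairNet): a wide, shallow network whose
# last-layer weights are fit in closed form with multivariate least squares,
# so "training" is a single pass with no gradient descent.
import numpy as np

rng = np.random.default_rng(0)

def train_shallow_ls(X, y, width=256):
    """Fix a random hidden layer, then solve the output weights by least squares."""
    n_features = X.shape[1]
    W_hidden = rng.normal(size=(n_features, width))   # untrained hidden weights
    b_hidden = rng.normal(size=width)
    H = np.maximum(X @ W_hidden + b_hidden, 0.0)      # hidden activations (ReLU)
    H = np.hstack([H, np.ones((H.shape[0], 1))])      # bias column for the output layer
    # One-shot fit: solve min ||H w - y||^2 (multivariate least squares).
    w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W_hidden, b_hidden, w_out

def predict(model, X):
    W_hidden, b_hidden, w_out = model
    H = np.maximum(X @ W_hidden + b_hidden, 0.0)
    H = np.hstack([H, np.ones((H.shape[0], 1))])
    return H @ w_out

# Toy regression problem: y = sum(sin(x)) + noise.
X = rng.uniform(-3, 3, size=(1000, 4))
y = np.sin(X).sum(axis=1) + 0.01 * rng.normal(size=1000)
model = train_shallow_ls(X, y)
mse = np.mean((predict(model, X) - y) ** 2)
print(f"training MSE after one closed-form fit: {mse:.4f}")
```

The single lstsq call plays the role of the "system of linear equations" step in the abstract; the random hidden layer and ReLU are illustrative stand-ins, not the PairNet's actual layers.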
Related papers
- Scaling Studies for Efficient Parameter Search and Parallelism for Large
Language Model Pre-training [2.875838666718042]
We focus on parallel and distributed machine learning algorithm development, specifically for optimizing the data processing and pre-training of a set of 5 encoder-decoder LLMs.
We performed a fine-grained study to quantify the relationships among three ML parallelization methods, specifically exploring the stages of Microsoft DeepSpeed ZeRO (Zero Redundancy Optimizer).
arXiv Detail & Related papers (2023-10-09T02:22:00Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Hyperparameter Optimization with Neural Network Pruning [6.193231258199234]
We propose a proxy model for a neural network (N_B) to be used for hyperparameter optimization.
The proposed framework can reduce the amount of time by up to 37%.
arXiv Detail & Related papers (2022-05-18T02:51:47Z) - FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, the computation adapts well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z) - NL-CNN: A Resources-Constrained Deep Learning Model based on Nonlinear
Convolution [0.0]
A novel convolutional neural network model, abbreviated NL-CNN, is proposed, in which nonlinear convolution is emulated by a cascade of convolution and nonlinearity layers.
Performance evaluation for several widely known datasets is provided, showing several relevant features.
arXiv Detail & Related papers (2021-01-30T13:38:42Z) - Optimizing Memory Placement using Evolutionary Graph Reinforcement
Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z) - TASO: Time and Space Optimization for Memory-Constrained DNN Inference [5.023660118588569]
Convolutional neural networks (CNNs) are used in many embedded applications, from industrial robotics and automation systems to biometric identification on mobile devices.
We propose an approach for ahead-of-time, domain-specific optimization of CNN models, based on integer linear programming (ILP) for selecting primitive operations to implement convolutional layers.
arXiv Detail & Related papers (2020-05-21T15:08:06Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization where the predictive model is a deep neural network.
Our method requires far fewer communication rounds in theory.
Our experiments on several datasets demonstrate the method's effectiveness and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - PairNets: Novel Fast Shallow Artificial Neural Networks on Partitioned
Subspaces [0.0]
We create a novel shallow 4-layer ANN called "Pairwise Neural Network" ("PairNet").
The value range of each input is partitioned into multiple intervals, and then the n-dimensional input space is partitioned into M n-dimensional subspaces.
M local PairNets are built in the M partitioned local n-dimensional subspaces (a minimal sketch of this partitioning follows this entry).
arXiv Detail & Related papers (2020-01-24T05:23:47Z)
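The sketch below illustrates only the divide-and-conquer partitioning described in this entry: each input's range is split into K intervals, samples are routed to the resulting hyper-rectangular subspaces, and one local model is fit per non-empty subspace. The local model here is a plain linear least-squares fit, a hypothetical stand-in for the local PairNet, whose internals the summary does not specify.

```python
# Minimal sketch of the divide-and-conquer partitioning: split each input
# dimension into K intervals (K**n subspaces), route samples to subspaces,
# and fit a small local model per non-empty subspace. The local linear
# least-squares fit is an illustrative stand-in for a local PairNet.
import numpy as np

rng = np.random.default_rng(1)

def subspace_index(X, lo, hi, K):
    """Map each row of X to the flat index of its hyper-rectangular subspace."""
    # Which of the K intervals each coordinate falls into (0..K-1).
    bins = np.clip(((X - lo) / (hi - lo) * K).astype(int), 0, K - 1)
    # Flatten the n per-dimension bin indices into one subspace id.
    return np.ravel_multi_index(bins.T, dims=(K,) * X.shape[1])

def fit_local_models(X, y, K=3):
    lo, hi = X.min(axis=0), X.max(axis=0)
    ids = subspace_index(X, lo, hi, K)
    models = {}
    for s in np.unique(ids):
        mask = ids == s
        A = np.hstack([X[mask], np.ones((mask.sum(), 1))])  # design matrix with bias
        w, *_ = np.linalg.lstsq(A, y[mask], rcond=None)      # one-shot local fit
        models[s] = w
    return models, lo, hi, K

def predict_local(models, lo, hi, K, X):
    ids = subspace_index(X, lo, hi, K)
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    default = np.mean([w for w in models.values()], axis=0)  # fallback for empty cells
    return np.array([A[i] @ models.get(s, default) for i, s in enumerate(ids)])

# Toy data: 2 inputs, K=3 intervals each -> up to 9 local models.
X = rng.uniform(-1, 1, size=(2000, 2))
y = np.sin(3 * X[:, 0]) * np.cos(3 * X[:, 1])
models, lo, hi, K = fit_local_models(X, y, K=3)
pred = predict_local(models, lo, hi, K, X)
print(f"{len(models)} local models, training MSE = {np.mean((pred - y) ** 2):.4f}")
```

Because each local model only sees samples from its own subspace, it can specialize to local features, which is the motivation for the divide-and-conquer design in both PairNet papers.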